Performance and Scalability improvement progress report #1

This is the first in a series in which I'll discuss the issues and challenges involved in building a world-class real-time search engine, and keep all of you better informed about what's going on under the covers at Technorati.

The blogosphere has been growing at an explosive rate - Technorati is now indexing over 14 million blogs, with about 80,000 new blogs created every day. That's about one new blog every second! And there are about 900,000 new posts every day, which means we're indexing about 37,500 posts per hour.
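As a quick sanity check on those rates, here is a minimal Python sketch that redoes the arithmetic. The two daily totals are the figures quoted above. The variable names are just illustrative, and the per-second post rate is an extra derived number that the post itself does not state.

```python
# Daily totals quoted in the post.
NEW_BLOGS_PER_DAY = 80_000
NEW_POSTS_PER_DAY = 900_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
HOURS_PER_DAY = 24

# Convert the daily totals into the rates mentioned in the text.
blogs_per_second = NEW_BLOGS_PER_DAY / SECONDS_PER_DAY
posts_per_hour = NEW_POSTS_PER_DAY / HOURS_PER_DAY
# Not stated in the post; derived here for a sense of indexing load.
posts_per_second = NEW_POSTS_PER_DAY / SECONDS_PER_DAY

print(f"new blogs per second: {blogs_per_second:.2f}")
print(f"new posts per hour:   {posts_per_hour:,.0f}")
print(f"new posts per second: {posts_per_second:.1f}")
```

So "a new blog every second" is a slight round-up (it is closer to one every 1.08 seconds), while the 37,500 posts per hour figure is exact for 900,000 posts a day.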

So, we have been working really hard on performance and scalability improvements for the service. The size of the blogosphere has been growing by leaps and bounds, and our traffic has been growing even faster. We just had another 40%+ growth in traffic this month - which makes this the fourth month in a row with this kind of traffic jump. Basically, that means that we are now serving more traffic in a week than we did in a month just 4 months ago. So, we've been racking and stacking servers - over 200 now in our data center, with more coming each week - and we've been fixing bugs and making performance enhancements on the web site as well. Our median time from post to index is now under 5 minutes. That means that a typical blog post is in our index less than 5 minutes after you publish it to the web. All you have to do is make sure that your blog software sends us a ping.

Most already do - with just a few exceptions. So, if you're a blogger and you're not finding your posts in our index, you should check out our publisher's guide and send an email to your blog software developer or hosting provider to ask why they aren't including you in the Technorati High Priority indexer, which we've built especially for getting you indexed quickly. And if you are a developer, we've got a wealth of material and sample code that you can use to make the process of integrating Technorati features a snap.
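For developers wondering what a ping looks like on the wire, here is a minimal sketch in Python. It assumes the widely used weblogUpdates.ping XML-RPC convention (blog name plus blog URL); this post does not spell out the protocol, so treat the method name as an assumption and check the publisher's guide for the real details. The endpoint URL and the function names build_ping_payload and send_ping are hypothetical placeholders.

```python
import xmlrpc.client

# ASSUMPTION: placeholder endpoint; the real ping URL is not given in this post.
PING_ENDPOINT = "http://rpc.example.com/rpc/ping"
# ASSUMPTION: the common weblogUpdates.ping convention, not confirmed by this post.
PING_METHOD = "weblogUpdates.ping"


def build_ping_payload(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a ping, without touching the network."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname=PING_METHOD)


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Send the ping and return whatever the server responds with."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    # Attribute access on the proxy resolves to the dotted XML-RPC method name.
    return proxy.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Print the request body only; call send_ping() once the endpoint is real.
    print(build_ping_payload("My Example Blog", "http://blog.example.com/"))
```

Blog software would typically call something like send_ping right after a post is published, so the indexer hears about new content immediately instead of waiting for a crawl.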

As for performance speedups, you can see the results of the work we've undertaken in the past few weeks: search result times have gotten more consistent, and consistently faster. The bump in response time that shows up last night in the graph linked above was due to the rollout of our language filtering service (see this post); capacity was temporarily reduced as we rolled it out across multiple servers in our farm. Note, however, that in early July, our average search response times were 5-7 seconds. Now they are between 1 and 3 seconds. Our goal is sub-1-second response on all these queries.

There's still more to do, especially around Cosmos search, ensuring regularly updated link counts, dealing with spam, and making sure that everyone's tags are indexed properly. We're working on that as a top priority. We're going to keep working our butts off to keep providing you with the best search and discovery experience in the world, and I'll keep you informed with regular updates to let you know what's going on, both the good and the bad (but hopefully there'll be more good than bad!). Our mantra around here is to Be Of Service. Thanks for putting up with us during these tough months while we continue to grow to meet all of the demand.
