Performance and Scalability improvement progress report #1

This is the first in a series in which I’ll discuss the issues and challenges involved in building a world-class real-time search engine, and keep all of you more informed about what’s going on under the covers at Technorati.

The blogosphere has been growing at an explosive rate: Technorati is now indexing over 14 million blogs, with about 80,000 new blogs created every day. That’s about one new blog every second! And there are about 900,000 new posts every day, which means we’re indexing about 37,500 posts per hour.
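For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. The variable names are purely illustrative; the only inputs are the two daily figures quoted above.

```python
# Daily figures quoted in this post.
NEW_BLOGS_PER_DAY = 80_000
NEW_POSTS_PER_DAY = 900_000

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
HOURS_PER_DAY = 24

# Convert the daily totals into per-second and per-hour rates.
blogs_per_second = NEW_BLOGS_PER_DAY / SECONDS_PER_DAY
posts_per_hour = NEW_POSTS_PER_DAY / HOURS_PER_DAY
posts_per_second = NEW_POSTS_PER_DAY / SECONDS_PER_DAY

print(f"New blogs per second: {blogs_per_second:.2f}")  # 0.93
print(f"New posts per hour:   {posts_per_hour:,.0f}")   # 37,500
print(f"New posts per second: {posts_per_second:.1f}")  # 10.4
```

So "a new blog every second" is a slight round-up (it is closer to one every 1.08 seconds), and the posting rate works out to roughly ten posts every second.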

So, we have been working really hard on performance and scalability improvements for the service. The size of the blogosphere has been growing by leaps and bounds, and our traffic has been growing even faster. We just had another 40%+ growth in traffic this month, which makes this the fourth month in a row of these kinds of traffic jumps. Basically, that means that we are now serving more traffic in a week than we did in a month just 4 months ago. So, we’ve been racking and stacking servers (over 200 now in our data center, with more coming each week), and we’ve been fixing bugs and making performance enhancements on the web site as well. Our median time from post to index is now under 5 minutes. That means that we index at least half of all blog posts in under 5 minutes from when they are posted to the web. All you have to do is make sure that your blog software sends us a ping.

Most already do – with just a few exceptions. So, if you’re a blogger and you’re not finding your posts in our index, you should check out our publisher’s guide and send an email to your blog software developer or hosting provider to ask why they aren’t including you in the Technorati High Priority indexer, which we’ve built especially for getting you indexed quickly. And if you are a developer, we’ve got a wealth of material and sample code that you can use to make the process of integrating Technorati features a snap.
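For developers wondering what a ping looks like on the wire, here is a minimal sketch in Python, not our production code. It assumes the conventional `weblogUpdates.ping` XML-RPC call, which takes the blog's name and URL; the function name `build_ping_request` and the example blog are hypothetical, and the sketch only builds the request body without sending anything.

```python
import xmlrpc.client


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    Assumption: the receiving service accepts the conventional
    two-parameter form (blog name, blog URL).
    """
    return xmlrpc.client.dumps(
        (blog_name, blog_url),
        methodname="weblogUpdates.ping",
    )


# Hypothetical blog used purely for illustration.
payload = build_ping_request("Example Blog", "http://example.com/blog/")
print(payload)

# Actually sending the ping would look roughly like this, where PING_URL
# is a placeholder for the endpoint given in the publisher's guide:
#   server = xmlrpc.client.ServerProxy(PING_URL)
#   server.weblogUpdates.ping("Example Blog", "http://example.com/blog/")
```

Most blog software does the equivalent of this automatically each time you publish, so a sketch like this mostly matters if you are writing or debugging a publishing tool.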

As for performance speedups, you can see the results of the work we’ve undertaken in the past few weeks: search result times have gotten more consistent, and consistently faster. The bump in response time that you see for last night in the graph linked above was due to the rollout of our language filtering service (see this post); capacity was temporarily reduced as we rolled it out across multiple servers in our farm. Note, however, that in early July our average search response times were 5-7 seconds. Now they are between 1 and 3 seconds. Our goal is sub-1-second response on all these queries.

There’s still more to do, especially around Cosmos search, ensuring regularly updated link counts, dealing with spam, and making sure that everyone’s tags are indexed properly. We’re working on that as a top priority. We’re going to keep working our butts off to keep providing you with the best search and discovery experience in the world, and I’ll keep you informed with regular updates to let you know what’s going on, both the good and the bad (but hopefully there’ll be more good than bad!). Our mantra around here is to Be Of Service. Thanks for putting up with us during these tough months while we continue to grow to meet all of the demand.


Sifry's Alerts
About The Author
I'm founder of a number of companies, including Offbeat Guides and Technorati. I was the cofounder and CTO of Sputnik and Linuxcare, founding board member of Linux International, and a WEF Technology Pioneer. I've been around the block a few times. Some might call me a serial entrepreneur. You can contact me at david-blog@sifry.com.

5 Comments:


  • By Stephen Pierzchala 28 Jul 2005

    Dave:
    Thanks for the link love for WebPerformance.org
    Readers may also be interested in the New York Search Results
    NYC Search:
    http://www.webperformance.org/grabperf/graph_month_hourly_site.php?test=2
    smp

  • By Jeremy Wright 28 Jul 2005

    Dave, I’d like to interview you for my podcast about this next week if you’ve got time. Let me know, and job well done.

  • By The Newest Industry 28 Jul 2005

    Ok, maybe the drama queen act doesn’t suit me…

    You have to wonder about the resiliency of the human mind sometimes; apparently as quickly as one dives into a deep funk, you get to bounce off the floor….
    I’m Baaaaaaacccckkk!
    If only to handle the flood from Dave Sifry’s post on Technorati’s p…

  • By Fred 28 Jul 2005

    Hello Mr. Sifry,
    You made it. Since your post during the first bombing in London, the performance of Technorati is much, much better (from the point of view of a user). People had doubts, I had many, but you took the time to see what they had to say and you made sure that everything would be answered. Now we can see the results. It is not finished, but it makes me really happy to see one of the Blogosphere’s founders regaining his vitality.
    All these statistics are unbelievable; it is like going to the moon: it is hard to believe, but it is happening.
    Keep the good works going Mr. Sifry,
    Salutations,
    Frédérick.

  • By Seth Anderson 03 Aug 2005

    Dave, you may remember leaving a comment on my site regarding spidering issues. That night, Technorati indexed my page, but hasn’t indexed it again since, despite numerous pings. In fact, the search that Kevin Marks tried (successfully) and left in comments no longer finds anything either.
    From your posting, Technorati has scaled beautifully, and works as described for 99% of sites, just not mine. :(
    Seth
