Performance and Scalability improvement progress report #2
It’s been a long and busy month, and I wanted to give y’all an update on the infrastructure, performance and scalability progress over at Technorati. There’s been a lot going on, as I described earlier in the year, and we’ve made some progress, but there are important things that are still broken and are being fixed this month.
The situation as of a couple of months ago
The blogosphere has been growing at an explosive rate: Technorati is now indexing over 16 million blogs, with about 100,000 new blogs created every day. There are over 1.4 million new posts every day, and about 22% of those posts are from spam or fake blogs, which means that even after we pull the spam and fake blogs out of the indexes, we are dealing with about 1.2 million posts each day.
We just weren’t expecting that kind of sudden growth, on both the posting side and the search side, and frankly we didn’t plan well enough to handle the load. We’ve been adding new machines to our datacenter (over 400 now, with more coming each week), and we’ve been fixing bugs and making performance enhancements on the web site as well.
We also made some pretty significant performance improvements to keyword search – most now returning in 1–2 seconds; you can see some details on those statistics and also a month view.
However, Cosmos search (or URL search) is still being worked on, and is often timing out under the increased load. Unfortunately, this is also one of the searches that bloggers find most compelling, as it lets you know who is linking to your blog, and it is the very first type of search that Technorati made available, so it is near and dear to our hearts. Everyone here also uses it every day, so it really sucks when it isn’t working right.
As search traffic has grown, we’ve also seen an increase in support and feedback requests. It’s my goal to make sure that we respond to all support requests within 24 hours of getting the request. Right now, we’re not meeting that goal, and some people haven’t had a human response in over a week from when they sent in their request.
What we’re doing
Since we got our keyword search infrastructure back on track, our infrastructure team has been working 100% on fixing Cosmos search. Our current plan is to have Cosmos search back up and running by the end of September, with incremental improvement throughout the coming month. I’ll keep you informed on the progress of this critical project, and you’ll be able to see it for yourself: you should get fewer and fewer error messages when you do a URL search as September progresses.
We’re busy expanding our support capabilities, and also putting together tools to make it easier for users to answer their own questions before a Technorati support staffer has to get involved. We’ve already made a bunch of fixes and feature enhancements to address the most common support requests, like fixes in our blog claiming code.
What about new stuff?
While we work on these core infrastructure issues, we’re not resting on our laurels in our dedication to provide great tools and services for bloggers and for people who want to keep track of what’s happening on the web right now. There’ll be more to announce in the coming days and weeks, so stay tuned…
Thanks for your support
I am consistently humbled and amazed at how great our users are. You guys have stood by us as the service has grown and has gone through growing pains. We take this trust very seriously, and are working very very hard to live up to your expectations. Thanks.
Dave, any comments on BL Ochman’s “hot tip” a few weeks ago that Technorati was being sold to … Yahoo, News Corp, Google …?
They’re fixing Technorati
After many days at a standstill, Technorati’s source and link data are finally being updated again, and David Sifry explains what they are doing to get things working properly again in Per…
David, talking about performance problems only is a huge understatement. I don’t think NOT getting data out of Technorati is the worst; getting the WRONG info is far, far worse.
You have major problems parsing blog main pages (standard templates, major blog platforms). The result is entries where:
- the body of a post appears with the title of another one (mostly, but not always, the previous one)
- the body of a post is associated with tags of another one.
I always wondered, if parsing the main page is so difficult, why you don’t use the permalink view or, even better, the RSS feed instead of the main page where you “get lost” – perhaps THAT is a performance issue?
In any case, search problems are the tip of the iceberg; your problem is building the wrong index. From a blogger’s point of view, this makes us look like complete fools – meaningless posts.
Dave
Can you tell us what OS/language(s)/DB(s) you guys are using?
Technorati strikes back
We all had some troubles with Technorati in the past weeks. Now Sifry has this post about the upgrade and newly found balance….
I would like to add in my 2 pennies (2 cents (2 yen etc)) here: I sent a support request in close to three weeks ago now and have still not received any reply.
I know that you (Technorati) are providing a great service and it is “Free” at the point of use, but to have your site claim that within 3 business days a reply will be sent, and then not to have a reply; it is a bit silly and I am sure I am not the only one.
This isn’t hate mail or anything, I love Technorati; I regularly use Technorati; Technorati has helped the readership of my blog increase massively for which I thank you no end, but you still need to get a handle on your support issues.
I will still stick with you though!!
http://www.kinlan.co.uk/2005/09/technoratiboooooo.html
When you talk about removing spam blogs, is this an entirely automated process? I ask simply because I’m half given the impression that spam removal is a manual process, which would almost certainly become overwhelmed.
Start.com out of beta…feedback coming in…
Start.com blog spills the beans…
“primarily driven by your feedback, here are some of the highlights:…
GNC-2005-09-02 #96
I spend some time talking about the Hurricane Katrina aftermath, and give you some links to some tech sites that…
GrabPERF: Technorati Uses GrabPERF to Track Improvements
Dave Sifry released a state of the search post yesterday. Between re-building my laptop and getting a client project out, I missed the original post.[here]
Thanks Dave! And continued success.
Technorati: Technorati, Web Performance, Capacity Pl…
Technorati responds to criticism: Too many blogs!
Technorati, as we’ve mentioned before, is a venture-backed San Francisco company that lets you search things written by blogs, and is a favorite among bloggers because it also lets them check who has linked to them. Recently, there has been some harsh …
technorati blog finder
O processo n
Revisiting Technorati’s Blog Finder & Listing Issues
Gary Price wrote earlier of Technorati’s new Technorati Blog Finder, along with some issues with the new beta service. For our Search Engine Watch members, the Revisiting Technorati’s Blog Finder & Listing Issues article I’ve now posted take a long…
We have complete faith in you, David, and understand that your search engine is at its beginning stage of uncovering new technology. What matters is that you have the vision and will become successful in implementing your ideas.
Mike
P.S.: We’ve met at the SES for a brief second.