It's been a long and busy month, and I wanted to give y'all an update on the infrastructure, performance and scalability progress over at Technorati. There's been a lot going on, as I described earlier in the year, and we've made some progress, but there are important things that are still broken and are being fixed this month.
The situation as of a couple of months ago
The blogosphere has been growing at an explosive rate: Technorati is now indexing over 16 million blogs, with about 100,000 new blogs created every day. There are over 1.4 million new posts every day, and about 22% of those posts are from spam or fake blogs, which means that even after we pull the spam and fake blogs out of the indexes, we are dealing with about 1.2 million posts each day.
We just weren't expecting that kind of sudden growth, both on the posting side and on the search side, and frankly we didn't plan well enough to handle the load. We've been adding new machines to our datacenter (over 400 now, with more coming each week), and we've been fixing bugs and making performance enhancements on the web site as well.
We also made some pretty significant performance improvements to keyword search, with most searches now returning in 1-2 seconds; you can see some details on those statistics and also a month view.
However, Cosmos search (or URL search) is still being worked on, and is often timing out under the increased load. Unfortunately this is also one of the searches that bloggers find most compelling, as it helps you all know who is linking to your blog, and it is the very first type of search that Technorati made available, so it is near and dear to our hearts. Everyone here also uses it every day, so it really sucks when it isn't working right.
As search traffic has grown, we've also seen an increase in support and feedback requests. It's my goal to make sure that we respond to all support requests within 24 hours of getting the request. Right now, we're not meeting that goal, and some people haven't had a human response in over a week from when they sent in their request.
What we're doing
Since we got our keyword search infrastructure back on track, our infrastructure team has been working 100% on fixing Cosmos search. Our current plan is to have Cosmos search back up and running by the end of September. You'll be able to see the progress for yourself: as September goes on, you'll see fewer and fewer error messages when you do a URL search. I'll keep you informed on the progress of this critical project.
We're busy expanding our support capabilities, and also putting together tools to make it easier for users to answer their own questions before a Technorati support staffer has to get involved. We've already made a bunch of fixes and feature enhancements to address the most common support requests, like fixes in our blog claiming code.
What about new stuff?
While we work on these core infrastructure issues, we're not resting on our laurels in our dedication to provide great tools and services for bloggers and for people who want to keep track of what's happening on the web right now. There'll be more to announce in the coming days and weeks, so stay tuned...
Thanks for your support
I am consistently humbled and amazed at how great our users are. You guys have stood by us as the service has grown and has gone through growing pains. We take this trust very seriously, and are working very very hard to live up to your expectations. Thanks.
Posted by dsifry at September 1, 2005 1:58 AM
Dave, any comments on BL Ochman's "hot tip" a few weeks ago that Technorati was being sold to ... Yahoo, News Corp, Google ...?
Posted by: John (SYNTAGMA) at September 1, 2005 3:09 AM
David, talking about performance problems only is a huge understatement. I don't think NOT getting data out of Technorati is the worst; getting the WRONG info is far far worse.
You have major problems parsing blog main pages (standard templates, major blog platforms); the result is entries where:
- the body of a post appears with the title of another one (mostly, but not always, the previous one)
- the body of a post is associated with tags of another one.
I always wondered: if parsing the main page is so difficult, why don't you use the permalink view, or even better, the RSS feed instead of the main page where you "get lost" - perhaps THAT is a performance issue?
In any case, search problems are the tip of the iceberg; your problem is building the wrong index. From a blogger's point of view, this makes us look like complete fools - meaningless posts.
Posted by: Zoli Erdos at September 1, 2005 7:05 AM
Dave
Can you tell us what OS/language(s)/DB(s) you guys are using?
I would like to add in my 2 pennies (2 cents (2 yen etc)) here: I sent a support request in close to three weeks ago now and have still not received any reply.
I know that you (Technorati) are providing a great service and it is "Free" at the point of use, but to have your site claim that within 3 business days a reply will be sent, and then not to have a reply; it is a bit silly and I am sure I am not the only one.
This isn't hate mail or anything, I love Technorati; I regularly use Technorati; Technorati has helped the readership of my blog increase massively for which I thank you no end, but you still need to get a handle on your support issues.
I will still stick with you though!!
http://www.kinlan.co.uk/2005/09/technoratiboooooo.html
Posted by: Paul Kinlan at September 1, 2005 1:01 PM
When you talk about removing spam blogs, is this an entirely automated process? Simply because I'm half given the impression that spam removal is a hand process, which would almost certainly become overwhelmed.
Posted by: Brian Turner at September 1, 2005 1:35 PM
We have complete faith in you David and understand that your search engine is at its beginning stage of uncovering new technology. What matters is that you have the vision and will become successful in implementing your ideas :)
Mike
P.S.: We've met at the SES for a brief second.
Posted by: Mike from Blog Party at September 7, 2005 8:16 AM