Lilia Efimova at Mathemagenic asked an interesting question about Technorati on her weblog today, and I popped by (thanks to my watchlist) and answered her questions. Given the interest, I thought I'd republish my response here, along with a few elaborations.
Does anyone know how Technorati works? Do they process blog homepages only? Or only items in RSS feeds? Or only things "not older than ..."?
I wonder because I usually observe some fluctuations in numbers of inbound blogs and inbound links. E.g. yesterday I had 100+ inbound blogs and today it's 80+. It would be interesting to know why these things change. I tried Technorati site and weblog of David Sifry with no luck.
I guess this is a quite typical question that user has about systems that digest information: what are the criteria that are used?
Some basics about Technorati
1) We spider weblogs and correlate each weblog's outbound links to any page on your blog/site.
2) Technorati works on any URL - not just URLs for weblogs. For example, you can see what people are saying about an interesting article or favorite company, and get an instant read on the conversations going on around that article or site.
3) The simplest way to get your weblog included in the Technorati index is to ping us whenever you update your weblog. That puts you in the high-priority queue for indexing. You can save the page as a bookmark, or you can program your weblog software to do it automatically.
4) To calculate the inbound blog list, we use the outbound links from the blog homepage, not from the archives.
5) We do process RSS feeds and other metadata, but that doesn't affect your inbound blog stats. As long as you produce HTML, you're OK.
6) Nightly, we go through the database and re-calculate the number of inbound blogs and links to every weblog we track, which helps us double-check our work and also allows us to create the interesting newcomers list, the interesting recent blogs list, etc.
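To make points 4 and 6 concrete, here is a rough sketch of how inbound blogs and inbound links could be recounted from homepage links alone. It is not Technorati's actual code: the function names, the use of the hostname as a stand-in for "the site", and the choice to skip self-links are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    """Return the lowercase host of a URL (an assumed stand-in for 'the site')."""
    return urlparse(url).netloc.lower()


def inbound_stats(homepage_links):
    """Recount inbound blogs and inbound links for every linked site.

    homepage_links maps a blog's homepage URL to the outbound links found
    on that homepage (archives are deliberately not included).
    Returns {target_site: (inbound_blogs, inbound_links)}.
    """
    blogs = defaultdict(set)  # target site -> distinct linking blogs
    links = defaultdict(int)  # target site -> total links pointing at it
    for source, outbound in homepage_links.items():
        source_site = site_of(source)
        for url in outbound:
            target = site_of(url)
            # Assumption: malformed URLs and self-links are not counted.
            if not target or target == source_site:
                continue
            blogs[target].add(source_site)
            links[target] += 1
    return {target: (len(blogs[target]), links[target]) for target in blogs}


# Hypothetical data: two homepages linking to the same site.
homepages = {
    "http://alice.example/": [
        "http://blog.example/post/1",
        "http://blog.example/post/2",
    ],
    "http://bob.example/": [
        "http://blog.example/",
        "http://bob.example/about",
    ],
}
stats = inbound_stats(homepages)
```

With this sample data, blog.example ends up with two inbound blogs and three inbound links. Because only homepage links are counted, a link that scrolls off someone's front page into their archives stops counting at the next recalculation, which is one ordinary reason the numbers move from day to day.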
We strive to be accurate all the time. Sometimes things slip through. For example, one reason your inbound blog count may be smaller today is that we did maintenance on the database last night to remove duplicate blogs. Radio Userland, for instance, has an obnoxious habit of sending pings to www.weblogs.com for each weblog "category" if you use multiple categories on your blog. Same information, same author, just link spam, basically. So, last night we cleaned out a bunch of that stuff. If you were linked from a bunch of people's blog categories, then you lost those inbound blogs. Then again, so did everyone else. :-)
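The effect of that cleanup on an inbound blog count can be sketched as follows. This is only an illustration: the "/categories/<name>/" URL pattern and the helper names are assumptions, not the rule the maintenance job actually used.

```python
import re

# Hypothetical pattern: a URL ending in "/categories/<name>/" is treated
# as a category view of its parent blog rather than as a separate blog.
CATEGORY_SUFFIX = re.compile(r"/categories/[^/]+/?$")


def canonical_blog(url):
    """Collapse a category URL onto its parent blog URL."""
    return CATEGORY_SUFFIX.sub("/", url)


def count_inbound_blogs(linking_blogs):
    """Count distinct linking blogs after collapsing category duplicates."""
    return len({canonical_blog(url) for url in linking_blogs})


# Hypothetical list of "blogs" linking to you before the cleanup.
linking = [
    "http://radio.example/0100/",
    "http://radio.example/0100/categories/tech/",
    "http://radio.example/0100/categories/life/",
    "http://other.example/",
]
before = len(set(linking))
after = count_inbound_blogs(linking)
```

Here four apparent inbound blogs collapse to two once the category pages are folded into their parent blog, which is the kind of overnight drop described above.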
The last thing to remember is that while we strive for accuracy and completeness, we still do have bugs and have to fix things. If you notice something strange, please don't hesitate to send us feedback (firstname.lastname@example.org) and let us know.