It is that time of the year again, and I've got some new information on the continued growth of the blogosphere. I made this presentation as part of my 10-minute talk at Web 2.0 on October 6, 2005. You can download the entire presentation, complete with the underlying data, for research use or to include in other presentations. All I ask is that you keep attribution and the Technorati logo in a prominent place wherever the data is used.
So, What's New?
Well, first, the basics. The chart below shows the continued growth of the blogosphere. Technorati is now tracking 19.6 million weblogs, and the total number of weblogs tracked continues to double about every 5 months. This trend has held for at least the last 36 months. In other words, the blogosphere has doubled at least 5 times in the last 3 years; put another way, it is now over 30 times as big as it was 3 years ago:
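The doubling arithmetic can be checked with a short Python sketch. It assumes a constant 5-month doubling period across the full 36 months, which is a simplification, since real growth is never perfectly exponential. Under that assumption, 36 months contain about 7.2 doublings, or roughly 147 times the starting size, so "at least 5 doublings" and "over 30 times" (2^5 = 32) are conservative lower bounds.

```python
def doublings(months: float, doubling_period_months: float) -> float:
    """Number of doublings that fit in `months` at a fixed doubling period."""
    return months / doubling_period_months


def growth_factor(months: float, doubling_period_months: float) -> float:
    """Size multiplier after `months`, assuming constant exponential growth."""
    return 2 ** doublings(months, doubling_period_months)


# Assumption: a constant 5-month doubling period across the whole 36 months.
print(doublings(36, 5))             # 7.2 doublings
print(round(growth_factor(36, 5)))  # about 147x
# The conservative figure: 5 doublings gives 2**5 = 32x, i.e. "over 30 times".
print(2 ** 5)
```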
The next chart shows the number of new blogs tracked each day by Technorati. About 70,000 new weblogs are tracked every day, which works out to roughly one new weblog created somewhere in the world every second. Blogging also appears to be taking off around the world, not just in English. Some of the significant increases we've seen over the past 3 months have been due to a proliferation of Chinese-language weblogs, both on MSN Spaces and on Chinese sites like blogcn.com.
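A quick Python check of the "about one per second" figure, assuming the 70,000 daily new weblogs are spread evenly over a 24-hour day (real sign-ups cluster by time zone): it works out to roughly 0.81 new weblogs per second, or one about every 1.2 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average per-second rate, assuming arrivals are spread evenly over the day."""
    return per_day / SECONDS_PER_DAY


rate = per_second(70_000)
print(f"{rate:.2f} new weblogs per second")            # 0.81
print(f"one new weblog every {1 / rate:.1f} seconds")  # 1.2
```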
Now that we've been tracking spam and fake blogs, we've included the daily tracking statistics for spam and fake blogs starting June 1, 2005. We currently find that about 2% to 8% of new weblogs are fake or spam weblogs. They are represented as the red spikes above the legitimate (human-created and updated) blogs, which are shown in blue below.
In the last couple of days, there's been a lot of talk about a set of spam blogs that have been set up to do keyword stuffing using a lot of popular phrases, including many popular bloggers' names. Lots of people have discussed this, including Tim Bray, Dave Winer, Ed Cone, Robert Scoble, Chris Pirillo, Jeff Jarvis, and others. To analyze this properly, I updated the chart to include the blog data we've tracked all the way up to yesterday, October 16, 2005.
In the past 2 weeks, 805,000 new weblogs were created. Technorati also tracked an additional 39,000 new fake and spam weblogs, which means that about 4.6% of the total weblogs tracked were fake or spam.
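The 4.6% figure uses all newly tracked weblogs (legitimate plus spam) as the denominator. This small Python sketch reproduces it from the two counts in the paragraph; dividing by the legitimate count alone instead gives about 4.8%, which is simply another way of stating the same ratio.

```python
def spam_share(legit: int, spam: int) -> float:
    """Fraction of all newly tracked weblogs that are spam or fake."""
    return spam / (legit + spam)


def spam_over_legit(legit: int, spam: int) -> float:
    """Spam expressed as an extra fraction on top of the legitimate count."""
    return spam / legit


legit, spam = 805_000, 39_000  # two-week counts from the text
print(f"{spam_share(legit, spam):.1%} of all new weblogs")         # 4.6%
print(f"{spam_over_legit(legit, spam):.1%} over legitimate ones")  # 4.8%
```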
One remarkable thing the data shows is that while spam and fake blogs are a problem, they are not an overwhelming one; in fact, we've experienced much worse spam attacks in the past. The key difference in this weekend's attack is that the spammers' posts included many popular search terms, among them popular bloggers' names, which are common ego searches on engines like Technorati. That made this particular attack much more visible to a number of high-profile bloggers than earlier attacks were.
A look at the posting volume over the last year is illustrative as well, and is included in the chart below:
You can see from the post statistics that, on average, between 700,000 and 1.3 million posts are made each day. That's about 33,000 posts per hour. Spam and fake posts are reported here as well: on average, an additional 5.8% of posts (about 50,000 posts per day) are spam or fake. This figure changes daily as we track spam attacks, and it has reached as high as an additional 18% over the regular daily volume.
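As a rough consistency check, this Python sketch converts the daily figures to hourly rates and backs out the regular volume implied by the spam numbers. It assumes posting is spread evenly over 24 hours, which real traffic is not, and treats "an additional 5.8%" as a fraction of the regular (non-spam) volume. Under those assumptions, the 700,000 to 1.3 million range works out to roughly 29,000 to 54,000 posts per hour; 33,000 posts per hour corresponds to about 792,000 posts per day; and 50,000 spam posts at 5.8% implies about 862,000 regular posts per day. Both of the last two fall within the stated daily range.

```python
HOURS_PER_DAY = 24


def per_hour(posts_per_day: float) -> float:
    """Convert a daily posting volume into an average hourly rate."""
    return posts_per_day / HOURS_PER_DAY


def implied_base_volume(extra_posts: float, extra_fraction: float) -> float:
    """Regular daily volume implied when spam is reported as an extra
    fraction on top of that volume (e.g. 5.8% over the regular posts)."""
    return extra_posts / extra_fraction


for daily in (700_000, 1_300_000):
    print(f"{daily:,} posts/day -> {per_hour(daily):,.0f} posts/hour")

# 33,000 posts/hour expressed back as a daily figure.
print(f"33,000 posts/hour -> {33_000 * HOURS_PER_DAY:,} posts/day")

# About 50,000 spam posts/day at an extra 5.8% implies this many regular posts/day.
print(f"{implied_base_volume(50_000, 0.058):,.0f} regular posts/day")
```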
One may argue that the numbers I'm reporting are far too low, or that Technorati isn't finding all of the spam weblogs out there. That's a legitimate argument, and I am by no means asserting that Technorati captures all of the spam and fake weblogs in existence. We know we're not getting them all, and every day we're working on improving our algorithms and data quality. However, that work means we can still provide comprehensive, timely results without having to do anything drastic, like removing a major hosting provider with millions of legitimate blogs from our indexes.
We're also working closely with the other players in the industry to close the gaps. In October, we helped organize the second Web 2.0 Spam summit, and representatives from Google, Yahoo, Microsoft, AOL, Six Apart, Tucows, WordPress, Feedster, and many more companies and organizations participated. The summit was quite successful, and I expect that there will be many more to come.
Of course, one important question rears its head: how do you make sense of this monstrous onrush of conversation and get just what you want, the best information from the most authoritative or influential people, in the most timely manner?
More on that in my next two posts, covering the growth of tags and of context in search and discovery.
Technorati Tags: blogosphere, blogs, blogsearch, fakeblog, postingvolume, posts, postvolume, scaling, search, search engine, sotb, sotb2005, spam, spamblog, statistics, stats, web2con, weblog, weblogs