October 17, 2005

State of the Blogosphere, October 2005 Part 1: On Blogosphere Growth

It is that time of the year again, and I've got some new information on the continued growth of the blogosphere. I gave this presentation as part of my 10-minute talk at Web 2.0 on October 6, 2005. You can download the entire presentation, complete with the underlying data, for research use or to include in other presentations. All I ask is that you keep attribution and the Technorati logo in a prominent place wherever the data is used.

Earlier State of the Blogosphere reports are also available, from July 2005, March 2005, and October 2004.

So, What's New?

Well, first, the basics. The chart below shows the continued growth of the blogosphere. Technorati is now tracking 19.6 Million weblogs, and the total number of weblogs tracked continues to double about every 5 months. This trend has been consistent for at least the last 36 months. In other words, the blogosphere has doubled at least 5 times in the last 3 years. Another way of looking at it is that the blogosphere is now over 30 times as big as it was 3 years ago:

[Chart: Growth of the blogosphere, weblogs tracked by Technorati]
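The doubling arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming a constant doubling period; the function names are illustrative and not part of any Technorati tool. Five doublings give 2^5 = 32, which matches "over 30 times as big"; a strict 5-month doubling period held for the full 36 months would imply 7.2 doublings, so the "at least" and "over" figures read as lower bounds.

```python
def growth_factor(doublings: float) -> float:
    """Size multiplier after the given number of doublings (2 ** doublings)."""
    return 2.0 ** doublings


def doublings_in(months: float, doubling_period_months: float) -> float:
    """Number of doublings in a span, assuming a constant doubling period."""
    return months / doubling_period_months


# Five doublings, the lower bound quoted in the post.
print(growth_factor(5))  # 32.0

# A constant 5-month doubling period sustained over 36 months.
n = doublings_in(36, 5)
print(round(n, 1))              # 7.2
print(round(growth_factor(n)))  # 147
```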

The next chart shows the number of new blogs tracked each day by Technorati. About 70,000 new weblogs are tracked every day, or roughly one new weblog created each second somewhere in the world. Blogging also appears to be taking off around the world, and not just in English. Some of the significant increases we've seen over the past 3 months have been due to a proliferation of Chinese-language weblogs, both on MSN Spaces and on Chinese sites like blogcn.com.
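The "one new weblog each second" figure is a rounded conversion of the daily count. A quick sketch, assuming new blogs arrive evenly across the 86,400 seconds in a day (the names are illustrative): 70,000 per day works out to about 0.81 per second, or one roughly every 1.2 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


new_blogs_per_day = 70_000
rate = per_second(new_blogs_per_day)
print(f"{rate:.2f} new weblogs per second")            # 0.81 new weblogs per second
print(f"one every {1 / rate:.1f} seconds on average")  # one every 1.2 seconds on average
```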

Because we now track spam and fake blogs, the chart below also includes daily tracking statistics for them, starting June 1, 2005. We currently find that about 2% to 8% of new weblogs are fake or spam weblogs. They appear as the red spikes above the legitimate (human-created and updated) blogs shown in blue below.

[Chart: New weblogs tracked per day, with spam and fake blogs shown in red]

Spam Attacks

In the last couple of days, there's been a lot of talk about a set of spam blogs set up to do keyword stuffing with a lot of popular phrases, including many popular bloggers' names. Lots of people have discussed this, including Tim Bray, Dave Winer, Ed Cone, Robert Scoble, Chris Pirillo, Jeff Jarvis, and others. To analyze this properly, I updated the chart to include the blog data we've tracked all the way up to yesterday, October 16, 2005.

In the past 2 weeks, 805,000 new weblogs were created. In addition, Technorati tracked 39,000 new fake and spam weblogs, which means that about 4.6% of the total weblogs tracked were fake or spam.
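The 4.6% figure uses all tracked weblogs (legitimate plus spam) as the denominator. A minimal sketch of that calculation, with illustrative names, also shows how the ratio shifts if spam is compared with legitimate blogs alone:

```python
def spam_share(spam: int, legitimate: int) -> float:
    """Fraction of all tracked weblogs (legitimate + spam) that are spam."""
    return spam / (legitimate + spam)


legit_new = 805_000  # new weblogs in the two weeks to October 16, 2005
spam_new = 39_000    # additional fake and spam weblogs in the same period

print(f"{spam_share(spam_new, legit_new):.1%}")  # 4.6% of the total
print(f"{spam_new / legit_new:.1%}")             # 4.8% relative to legitimate blogs only
```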

One of the remarkable things that comes out of looking at the data is that while spam and fake blogs are a problem, they are not an overwhelming one. In fact, we've experienced much worse spam attacks in the past. The key difference in this weekend's attack is that the spam posts included many popular search terms, including popular bloggers' names, which are common ego searches on engines like Technorati. This made this particular attack much more visible to a number of high-profile bloggers than past attacks were.

A look at the posting volume over the last year is illustrative as well, and is included in the chart below:

[Chart: Daily posting volume over the last year, including spam and fake posts]

You can see from the post statistics that, on average, between 700,000 and 1.3 Million posts are made each day. That's about 33,000 posts per hour. Spam and fake posts are reported here as well: on average, an additional 5.8% of posts (or about 50,000 posts/day) seen each day are spam or fake. This number changes daily as we track spam attacks, and it has reached as high as an additional 18% over the regular daily volume.
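The hourly and per-second rates in the summary are straight unit conversions. A small sketch, assuming posts are spread evenly over the day (variable names are illustrative): 33,000 posts per hour is about 9.2 posts per second, which corresponds to roughly 792,000 posts per day, near the lower end of the daily range quoted above.

```python
SECONDS_PER_HOUR = 60 * 60
HOURS_PER_DAY = 24

posts_per_hour = 33_000

# Average rates, assuming an even spread across the hour and the day.
posts_per_second = posts_per_hour / SECONDS_PER_HOUR
posts_per_day = posts_per_hour * HOURS_PER_DAY

print(f"{posts_per_second:.1f} posts per second")  # 9.2 posts per second
print(f"{posts_per_day:,} posts per day")          # 792,000 posts per day
```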

One may argue that the numbers I'm reporting are way too low, or that Technorati isn't finding all of the spam weblogs out there. That's a legitimate argument, and by no means am I asserting that Technorati is capturing all of the spam and fake weblogs in existence. We know we're not getting them all, and every day we're working on improving our algorithms and data quality. However, our hard work means that we can still provide you with comprehensive, timely results without having to do anything drastic, like removing a major hosting provider with millions of legitimate blogs from our indexes.

We're also working closely with the other players in the industry in order to close the gaps. In October, we helped organize the second Web 2.0 Spam summit, and representatives from Google, Yahoo, Microsoft, AOL, Six Apart, Tucows, WordPress, Feedster, and many more companies and organizations participated. The summit was quite successful, and I expect that there will be many more to come.

To summarize:

  • As of October 2005, Technorati is now tracking 19.6 Million weblogs
  • The total number of weblogs tracked continues to double about every 5 months
  • The blogosphere is now over 30 times as big as it was 3 years ago, with no signs of letup in growth
  • About 70,000 new weblogs are created every day
  • Roughly one new weblog is created each second
  • 2% - 8% of new weblogs per day are fake or spam weblogs
  • Between 700,000 and 1.3 Million posts are made each day
  • About 33,000 posts are created per hour, or 9.2 posts per second
  • An additional 5.8% of posts (or about 50,000 posts/day) seen each day are from spam or fake blogs, on average

What's Next?

Of course, one important question rears its head - how to make sense out of this monstrous onrush of conversation, and just get what you want - the best information from the most authoritative or influential people, in the most timely manner.

More on that in my next two posts, covering the growth of tags and of context in search and discovery.


Posted by dsifry at October 17, 2005 2:32 AM
Comments

It's actually reassuring to see the numbers as low as you report 'em. It may very well be that the most commonly searched for terms are also the ones that are most commonly spammed, leading to this weekend's surge and subsequent widespread emotional breakdown. ;) I certainly appreciate the efforts of Technorati, et al - but I believe it's ultimately the responsibility of the hosted blog platform vendor to diminish the ability for its service to be used for spamming. I think what shocked "us" the most this weekend is that a lot of it has been coming from Google (for some time now) - a company that isn't short of any kind of resource these days.

Posted by: Chris Pirillo at October 17, 2005 3:11 AM

What I would like to hear more about is his take on the APIs, and how Technorati plans to lead in this area. My personal opinion is that Technorati has made a great start, but there is so much more they could do to add value to their services with regard to APIs.

One of these things is to let the community know that they are taking note of feature requests! It appears that on their Wiki, they don't even confirm whether certain suggestions are helpful or unhelpful, or whether a certain feature is a really cool idea.

Paul Kinlan
http://www.kinlan.co.uk/2005/10/re-state-of-blogosphere-october-2005.html

Posted by: Paul Kinlan at October 17, 2005 4:29 AM

I appreciate these blogosphere updates -- keep them coming! One question though. Technorati currently tracks 19+ million blogs, but the Xanga and MySpace networks alone have over 30 million members between them. Presumably each member would have a blog, or at least the vast majority of them would. Does Technorati not consider MySpace pages blogs? Just curious...

Posted by: David Gong at October 17, 2005 6:53 AM

Nice to see the spam blogs haven't taken over completely. I'd like to know what percentage of spam blogs are hosted at BlogSpot.

Posted by: Andy Wibbels at October 17, 2005 8:00 AM

Very interesting data as usual, but could we someday get an idea of geolocation: how many blogs per country, a classification of the blogosphere by country? That would be great!

Posted by: Guillaume du Gardier at October 17, 2005 9:35 AM

Guillaume, yes, that would be nice, and Technorati does have country data for each registered user. Furthermore, as I suggested six months ago in response to a prior Technorati State of the Blogosphere report, an online census could be devised that would specify an XML schema for each blog/RSS stream to survey users more accurately: http://civilities.net/OnlineCensus

Posted by: Jon Garfunkel at October 17, 2005 9:51 PM

What about the blog advertising market: How many blogs make money and what is their model.

(Full disclosure: I am a journalist interested in seeing how to drive revenue to mainstream media companies like the one I work for.)

Posted by: Thomas Crampton at October 18, 2005 3:03 AM

I wonder whether I've been counted, and whether my website about really angry men who happen to be white has been counted.

Posted by: ericatruth at October 18, 2005 4:14 PM

Using tools like the trendwatcher, you can see that, at least over the last year, mentions of 'blog' on the internet have grown exponentially, along with other terms too.

Posted by: leigh at October 18, 2005 7:41 PM

I do surf around the blogs of the internet marketing circles, and there are lots of people talking about the ease of starting a blog just to create an AdSense revenue stream. Although the number of spam blogs does not seem high from these numbers, there is software that lets people create very spammy blogs that do not look much like spam, so I am not sure Technorati's numbers are quite right; if anything, the high end of 8% may be closer to correct.

We have to wonder how long blogs can continue to double in number every 5 months when there is only a finite number of people interested in starting or continuing to blog. But I believe we are seeing a real change over the last year: people's journals and blog news sites no longer compete with each other, but are instead finding their own niches and running with them.

Even a few years into the world of blogging, there are exciting times behind us, but even more exciting times ahead of us.

Posted by: Bill Nadraszky at October 19, 2005 7:40 AM
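The question raised above about a finite pool of bloggers can be made concrete. A minimal sketch, assuming the 5-month doubling period from the post holds and using a purely hypothetical ceiling of 1 billion blogs: the time to reach any ceiling is the doubling period times log2(ceiling / current count).

```python
import math

START_BLOGS = 19.6e6          # weblogs tracked as of October 2005 (from the post)
DOUBLING_PERIOD_MONTHS = 5.0  # doubling period quoted in the post


def months_until(cap: float) -> float:
    """Months until the tracked count reaches `cap` at a constant doubling rate."""
    return DOUBLING_PERIOD_MONTHS * math.log2(cap / START_BLOGS)


# Hypothetical ceiling of 1 billion blogs, chosen only for illustration.
print(f"{months_until(1e9):.0f} months")  # 28 months
```

Whatever the real ceiling is, the logarithm means even a much larger pool buys only a few extra doubling periods.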

Ah long gone are the days of 500,000 blogs!

Posted by: Gavin at October 19, 2005 10:02 AM

Thanks for the updates. Very useful information. Regarding spam blogs, what I want to know is what steps are being taken in conjunction with the monetizers of this scheme. Spam blogging would be greatly reduced if Google Ads, Yahoo Ads, and similar ad publishers would suspend the ads running on known spam blogs. Have you and other blog search engines offered to share your blacklists with the ad publishers so that they can suspend the ads running on those sites?

Posted by: John Frost at October 20, 2005 6:26 AM

The slide comparing blogs and MSM in your SOB presentation: what does the y-axis show (downloads or site visits)?

Posted by: peter wells-thorpe at October 21, 2005 4:41 AM

A question:

Does Technorati have any information about blogs by country, specifically in Latin America and the other countries of the world?
How many blog posts are made daily in my country (I live in Chile, South America)?
What about the number of Chilean blogs? Do you have it, or do you know where I can get it?

Posted by: Benjamin Perez Carrillo at October 23, 2005 6:43 PM

Interesting collection of statistics, and a great summary of the highlights!

Since the blogosphere is as much about conversations -- as represented by comments and trackbacks -- and community -- as represented by blogrolls and RSS feeds -- I'm eager to learn more about what Technorati has gleaned about these important dimensions of the space.

Posted by: Joe at October 24, 2005 11:26 PM

I pay respect to you and your post. Very interesting information! Thanks!

Posted by: John Beale at November 3, 2005 10:35 AM

Have a read of Technorati's post: the figures for blog growth are truly staggering!

Posted by: Rose at November 5, 2005 4:22 AM

Of course, one important question rears its head - how to make sense out of this monstrous onrush of conversation, and just get what you want - the best information from the most authoritative or influential people, in the most timely manner.

I'll be interested in reading this. I do have some comments on your blog ranking by category. Right now, a blog that self-categorizes using a tag is ranked as "most authoritative" for that tag based on its total number of links. But that doesn't really work. For example, look here:

http://www.technorati.com/blogs/Knitting

Kitta.net and green LA girl are not really about knitting. Yes, they have occasionally blogged about knitting, so they didn't lie when they self-categorized, but I doubt more than one of their links comes from knitting blogs.

Likewise, *my* blog (The knitting fiend) appears here:
http://www.technorati.com/blogs/Intelligent%20Design

I *have* blogged about intelligent design, but none of my links come from intelligent design blogs. My links are from knitting sites.

One quick fix to this problem would be to limit bloggers to claiming 5 or fewer categories. A slower fix would be to actually count how often they post in a specific category. An even slower, but ultimately better, fix would be to count how many of their links come from blogs in a category.

Posted by: lucia at November 6, 2005 6:58 AM
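The link-based fix lucia proposes, scoring a blog within a category only by the links it receives from other blogs that claim the same category, can be sketched as follows. This is an illustration of the idea, not how Technorati ranks blogs; apart from "The knitting fiend", the blog names and all links are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample data: the categories each blog has claimed for itself.
categories: dict[str, set[str]] = {
    "knitting_fiend": {"Knitting", "Intelligent Design"},
    "yarn_blog_a": {"Knitting"},
    "yarn_blog_b": {"Knitting"},
    "design_blog": {"Intelligent Design"},
}

# Hypothetical inbound links as (source_blog, target_blog) pairs.
links: list[tuple[str, str]] = [
    ("yarn_blog_a", "knitting_fiend"),
    ("yarn_blog_b", "knitting_fiend"),
    ("yarn_blog_a", "yarn_blog_b"),
    ("design_blog", "yarn_blog_a"),
]


def raw_inbound(blog: str) -> int:
    """Total inbound links, regardless of where they come from."""
    return sum(1 for _, target in links if target == blog)


def category_authority(category: str) -> dict[str, int]:
    """For each blog claiming `category`, count only the inbound links whose
    source blog also claims `category`."""
    scores: defaultdict[str, int] = defaultdict(int)
    for source, target in links:
        if category in categories[source] and category in categories[target]:
            scores[target] += 1
    return {blog: scores[blog] for blog, cats in categories.items() if category in cats}


# By raw link count, knitting_fiend outranks design_blog under Intelligent Design;
# counting only in-category links gives it no credit there.
print(raw_inbound("knitting_fiend"))             # 2
print(category_authority("Intelligent Design"))  # {'knitting_fiend': 0, 'design_blog': 0}
print(category_authority("Knitting"))            # {'knitting_fiend': 2, 'yarn_blog_a': 0, 'yarn_blog_b': 1}
```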

David,

Great post, and useful information on the spam blogs, though I think for certain keywords, spam blogs tend to dominate. I've heard of alternative data from blog metrics companies that backs that up.

I was wondering whether you have updated estimates of the number of corporate blogs since your last post on the topic in 2004: http://www.sifry.com/alerts/archives/000390.html

John

Posted by: john cass at November 9, 2005 9:20 AM