March 14, 2005

State of The Blogosphere, March 2005, Part 1: Growth of Blogs

It's been 5 months since my first presentation on the State of the Blogosphere at the Web 2.0 conference, which I later posted in parts. A lot has happened, and it's time for an update on what's going on in the world of weblogs and a look at the numbers.

I'll be posting this in a number of parts, as there's a lot of information to cover. Today, I'll be focusing on the macro growth of the blogosphere, both in the aggregate number of bloggers out there and in the number of new blogs created per day. Here's the chart of the aggregate growth of the blogosphere from March 2003 to February 2005 (compare this chart with the one from October 2004):

[Chart: aggregate growth of the blogosphere, March 2003 to February 2005]

Technorati is now tracking over 7.8 million weblogs, and 937 million links. That's just about double the number of weblogs tracked in October 2004. In fact, the blogosphere is doubling in size about once every 5 months. It has already done so at this pace four times, which means that in the last 20 months, the blogosphere has increased in size by over 16 times.
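For anyone who wants to check the arithmetic, here is a minimal sketch of the doubling math. It assumes a perfectly constant 5-month doubling period, which real growth certainly doesn't follow exactly, and the function names are just for illustration. The implied starting size is what the constant-doubling assumption gives, not a measured figure.

```python
def growth_factor(months: float, doubling_months: float = 5.0) -> float:
    """Multiplicative growth over `months`, assuming a constant doubling period."""
    return 2 ** (months / doubling_months)


def implied_start_size(current: float, months: float, doubling_months: float = 5.0) -> float:
    """Size `months` ago that would lead to `current` under constant doubling."""
    return current / growth_factor(months, doubling_months)


# Four doublings in 20 months
print(growth_factor(20))                          # 16.0
# Size implied 20 months ago if 7.8 million blogs are tracked today (assumption-based)
print(round(implied_start_size(7_800_000, 20)))   # 487500
```

Four doublings give a factor of 2 x 2 x 2 x 2 = 16, which is where the "16 times" figure comes from.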

Things don't appear to be letting up either. With the launch of MSN Spaces, the continued significant growth of popular blogging and journaling tools like Google's Blogger, SixApart's LiveJournal, and AOL Journals, and the proliferation of software like WordPress and Movable Type, the number of people out there blogging has jumped in the past few months. The chart below shows the significant jump in the number of new blogs created per day (compare with the chart from October 2004):

[Chart: new blogs created per day]

We are currently seeing about 30,000 to 40,000 new weblogs created each day, depending on the day. That is at least double the rate in October, when about 15,000 new weblogs were created each day. The remarkable growth over the past 3 months can be attributed to the arrival of new, mainstream services such as MSN Spaces, and to increased use of services like Blogger, AOL Journals, and LiveJournal. In addition, services outside the United States have been taking off, including a number of media sites promoting blogging, such as Le Monde in France.

There is a dark underbelly to these numbers, however: Part of the growth of new weblogs created each day is due to an increase in spam blogs - fake blogs that are created by robots in order to foster link farms, attempted search engine optimization, or drive traffic through to advertising or affiliate sites. We have been battling the spam situation in a significant way for about 2 months - prior to January, spam wasn't much of an issue. All of these charts reflect Technorati's databases after spam blogs have been removed, and we feel that we've been able to capture and identify most of the spam out there, but one should note that there is definitely blog spam that we don't catch (tell us if you see spam in the index!). I'd estimate that we currently catch about 90% of spam and remove it from the index, and notify the blog hosting operators. Most of this fake blog spam comes from hosted services or from specific IP addresses. One of the results of the extremely productive Spam Squashing Summit of a few weeks ago is the increased collaboration between services in order to report and combat this spam. Right now, about 20% of the aggregate pings Technorati receives are from spam blogs, so you won't see that in these numbers - these statistics show only "cleaned" data.
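As a back-of-the-envelope illustration of what a 90% catch rate means, here is a small sketch. It assumes, purely for illustration, that the spam share of new blogs matches the roughly 20% spam share of pings (which may well not hold) and that no legitimate blogs are removed by mistake. The function name is hypothetical.

```python
def residual_spam_share(spam_share: float, catch_rate: float) -> float:
    """Fraction of the cleaned index that is still spam.

    spam_share: fraction of incoming items that are spam (assumed, e.g. 0.20)
    catch_rate: fraction of spam that is detected and removed (e.g. 0.90)
    Assumes every legitimate item is kept (no false positives).
    """
    missed = spam_share * (1 - catch_rate)  # spam that slips through
    legit = 1 - spam_share                  # real blogs, all kept
    return missed / (legit + missed)


share = residual_spam_share(0.20, 0.90)
print(f"{share:.1%}")  # 2.4%
```

Under those assumptions, a few percent of what remains in the cleaned data would still be spam, which is why reports of missed spam are so useful.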

Tomorrow, I'll discuss some statistics around posting volume, which is a more accurate indicator of how much blogging is becoming a habit for people. While the dramatic increase in the number of aggregate weblogs out there is quite interesting, it is far more telling to look at the number of posts per day, which shows the size and quantity of conversation that is going on. More on that tomorrow, so stay tuned!

Posted by dsifry at March 14, 2005 12:54 AM
Comments

Fascinating stuff, David. Do you have any sense of geographical trends: which countries or continents are driving this growth?

Posted by: Rob Cottingham at March 14, 2005 6:32 AM

Very, very interesting. Do you know if there are any stats out there that discuss how many of these 40,000 new blogs are actually still being updated, say, 3 or 6 months down the road?

Posted by: Michael Choate at March 14, 2005 6:49 AM

Same things come to mind: how many of those 40,000 blogs last more than 2 weeks, and what's going on around the world (in numbers). I can personally tell that in Chile the blog revolution is happening, and the only ones not catching on are the big newspapers (they don't even have RSS feeds!).

Posted by: leo prieto at March 14, 2005 7:05 AM

Hi, Dave. How do you define "spam" exactly?

I'd love to know. We get a large number of spam submissions as well at Findory and we're constantly working to defeat it.

The spam has become increasingly sophisticated over the last several months. For example, we often see blogs that appear to be real with several articles posted. Careful inspection shows the articles are copied from elsewhere with shill links inserted to some website that's trying to increase its PageRank.

What do you consider to be spam? What's your criteria for deciding that a blog is spam?

Posted by: Greg Linden at March 14, 2005 8:20 AM

Great post. I'm throwing out an idea. I run across quite a few link farms as I search the web for relevant data. If there were some way for me to report these sites, I would. But is that even feasible, given the time-consuming nature of policing the internet?

Posted by: Alex at March 14, 2005 8:28 AM

I think it's important to keep in mind that the number of weblogs added to the Technorati database and the number of weblogs being created are not the same, nor are the trends necessarily the same. In particular, I'm very skeptical about the huge jump in blogs created per day that happened in the last two months of 2004. It seems likely to me that could be more an artifact of how Technorati is finding blogs (found a previously undiscovered vein of blogs or something) or some other explanation.

Posted by: jkottke at March 14, 2005 8:32 AM

Is it possible to say that blogging as we know it now has an effect on national security of states? Foreign Policy magazine published an article in December 2004 that indicated it is in fact the case. What do others think?

Posted by: Yevgeny at March 14, 2005 9:01 AM

I hear that blogging has cured the common cold. Are there other self-congratulatory remarks that someone would like to make?

Posted by: Cranky Skeptic at March 14, 2005 1:28 PM

I must confess that I use Technorati data in my presentations, but I am still baffled by the huge difference between these figures and those that point towards a blogosphere exceeding 50 million. Does Technorati take into account any of the reported 10 million Korean blogs, for instance?
I would really like to know more about this.

Posted by: lsantos at March 14, 2005 1:39 PM

To those of us who have been following e-mail spam for some years, these trends (and even the graphs) look disturbingly familiar.

The primary difference here, though, is that the free blog hosting sites do have control over who signs up. But you need to exert that control now instead of waiting for the problem to get worse, 'cause once it really gets going it'll be near-impossible to stop.

Posted by: dr.jd at March 14, 2005 2:43 PM

I tend to run into a linkfarm hosted on blogspot about once every other day. I'm not sure whether I should report these or not; do they actually violate the TOS, or are they "merely" a problem for search engines and aggregators? I can't see that it's any skin off of Blogger's nose to host a linkfarm, even if parent Google doesn't want to index it, because the direct cost to them is minimal.

I do suspect that the growth spike is mostly an artifact. But I've also noticed that sites like del.icio.us and technorati are getting an increasing amount of non-English content -- another interesting problem, although one that's less "evil" in nature. I can foresee such services having to filter by language in order to remain usable in English -- or in Chinese (but clearly such "minority" language speakers are able to overcome the same obstacle with ease), unless native duplicates arise. But getting back to the point at hand, this non-English growth is just starting and is probably some portion of the spike. Are there any figures broken out by language, or country, that could tell us if the major growth is happening in just a few places right now?

Posted by: dan hartung at March 14, 2005 3:21 PM

http://www.technorati.com/cosmos/search.html?rank=&url=vioxx

Wow. Lots of amazing real blogs there. If you'd clean out the spam from your directory, you might accidentally have correct numbers to report in your articles.

Posted by: Sideshow Bob at March 14, 2005 3:33 PM

To make these figures really useful for academics and policy types like myself, what we need is detailed info on how you gather blogs (aside from those people who visit and register themselves). Do you spider for them? Does your spider target or favour certain domains and/or countries? Are there blog companies you partner with whose new blogs are automatically registered and others you don't? As lsantos says, there are some who suggest there are 10m blogs in Korea alone.

I'm with Jason in feeling uncomfortable in using your figures when I can't tell if a big jump is because of the number of bloggers increasing or because your means of finding bloggers improved. Or have you published all the necessary details somewhere and I've just missed it?

Posted by: David Brake at March 14, 2005 3:34 PM

I do find it intriguing that you seem less than concerned about spam listings in Technorati. I've conducted a few quick keyword searches to see what popped up, and lo and behold, it was packed with spam blogs, host 404s and TONs of spam blogs added TODAY. Take a look at this one:

http://www.technorati.com/cosmos/search.html?rank=&url=vioxx

Maybe the padding of blog numbers looks good on paper, but it makes the integrity of Technorati's search less than stellar. Do you plan any means of taming the spam-blog issue, which seems very much responsible for this huge surge in new blogs?

Posted by: Dave N at March 14, 2005 3:35 PM

I'd like to second Greg Linden's request:

"What do you consider to be spam? What's your criteria for deciding that a blog is spam?"

Because the question is fairly important if we're gonna have a discussion about the spam on Technorati and within other aggregators.

Also, I think the numeric percentage of spam that is quarantined or filtered is quite possibly irrelevant.

Nobody's spamming 'emergent semantics'. Or 'SXSW'. Try either, now. So if I am looking in Technorati for that sort of subject matter, I'm going to find that the signal is strong, relative to the background noise.

Now let's say I want to see what the blogosphere has to say about the Sony DSC-V3 camera I'm considering buying. Try it; query technorati for "Sony DSC-V3".

Woah. I'd call that about 95% spam.

Yeah, most of them are blogspot linkfarms. And the question of whether or not it should be up to Google to take care of that sordid bit of business is a good one.

Here's another one: in the meantime, would it be fair to apply a huge negative penalty to ranks for all blogspot blogs? This is extreme, unfair, and just plain inefficient, but I wonder if it might be temporarily necessary to kick the relevant folks into action over there.

Anyway, I have a difficult time believing we're looking at data here that is thoroughly "cleaned" of spam activity.

Otherwise, nice press release. ;P

Posted by: benjamin at March 14, 2005 3:42 PM

You guys need to sit across the table from people who create these types of blogs and ping your server. (This includes pingomatic, weblogs.com, feedburner, etc.)

This is not so much a thing of link farms. The majority of these blogs are simply using a technique called "blogging and pinging." It is to get large sites indexed faster in the search engines.

Now, true, most of these sites are auto generated with tools. However, I am finding it interesting that many of the guys running aggregators and "ping pages" are not in tune with what marketers and others are doing.

Get talking.

Posted by: Mayosan at March 14, 2005 6:30 PM

I agree 100% about blog spam. One of the biggest contributors to this relatively new phenomenon is the "blog & ping" method people are using in an attempt to get their main sites spidered quicker and get the inbound links. Many of their 'blog entries' consist of nothing but a single (anchor text) link, or sometimes the link is accompanied by some meaningless garble text.

I just started venting my anger at these "blogs", which use up server space and bandwidth on Google's free blogspot accounts (with atom.xml feed):
blogdetective.blogspot.com

Thanks for airing this problem.

Posted by: Steve at March 14, 2005 6:52 PM

David
I'd like to reflect a little on some of these comments. Technorati's tracking of the blogosphere is excellent, yet nonetheless I feel that it is imprudent for you to refer to Technorati's tracking as being representative of the entire "blogosphere", which you do in this post, when it simply is not. As one of the commenters mentions, you track mainly English-language blogs, or more accurately blogs that use the Roman alphabet. Talking about the number of blogs in the "blogosphere" doubling, etc., when only reflecting Technorati stats ignores the fact that non-English markets using non-Roman text are arguably larger than the Roman-alphabet market; "us and them", if you like, collectively make up the blogosphere.

Posted by: Duncan Riley at March 14, 2005 9:07 PM

David-- I've written a proposal for an Online Personality Census to figure out who comprises all these blogs. Thought you might be able to help spread the word to others who may help with this.
see http://civilities.net/OnlineCensus

-- Jon

Posted by: Jon Garfunkel at March 14, 2005 9:39 PM

I put the number of Korean language bloggers (gotta include the Korean diaspora) around 11.9 million as of January 10, 2005. This is after adjusting down the 29 million reported by three major hosts.

http://dijest.com/bc/2005/01/119-million-korean-bloggers.html

More than spam recognition, I'm really interested in getting consensus on blog recognition. What are the criteria for a blog?
- Do they include link logs with headlines only? Moblogs and other picture blogs?
- Vlogs?
- Blogs without RSS?
- Handcrafted html blogs, as was common in Japan through 2001?
- Is it a blog if there's just one post on the front page? But it changes regularly?
- What about columns written by one author, as part of a magazine?
- Blogs without links?
- Blogs without archives?
- Do you include conference blogs that are "seasonal", coming back to life every few quarters?
- Do they include group blogs?
- How many authors push a group blog into the "online community" category?
- Do you count a blog's categories, with their own home pages, archives, and RSS feeds, as separate blogs in your count?
- You're looking at the posts on blog home pages. Do you count posts that stay on the home page but that are never updated?
- etc.

In blog recognition, I'd like accuracy numbers: the number of false positives and false negatives. When an educated human observer takes a random sample of sites your software recognizes as a weblog, what percent of the time does the human agree with the software's judgement? Track and report that over time.
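A minimal sketch of the kind of report I mean, assuming each sampled site gets two labels: what the software said and what the human said. The sample below is made up purely for illustration, and the function name is hypothetical. Note that sampling only from sites the software accepts surfaces false positives; finding false negatives also requires sampling sites it rejected.

```python
def recognition_report(sample):
    """Summarize agreement between software and human judgements.

    sample: list of (software_says_blog, human_says_blog) boolean pairs.
    """
    tp = sum(1 for s, h in sample if s and h)          # both say blog
    fp = sum(1 for s, h in sample if s and not h)      # software wrongly says blog
    fn = sum(1 for s, h in sample if not s and h)      # software misses a real blog
    tn = sum(1 for s, h in sample if not s and not h)  # both say not a blog
    return {
        "false_positives": fp,
        "false_negatives": fn,
        # Share of software-recognized blogs that the human agrees are blogs
        "precision": tp / (tp + fp) if (tp + fp) else None,
        # Share of human-recognized blogs that the software found
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "agreement": (tp + tn) / len(sample) if sample else None,
    }


# Hypothetical labels for 20 sampled sites (not real data)
sample = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 1 + [(False, False)] * 9
report = recognition_report(sample)
print(report)
```

Publishing those few numbers each month would show whether recognition is getting better or worse over time.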

About the non-Western blogs, that's something you'll learn this year as you roll out to the Pacific Rim.

The dark blogosphere isn't included in T's numbers. Large percentages of people are blogging behind passwords or firewalls, or choose not to ping or otherwise get noticed.

Keep up the number crunching.

- Phil

Posted by: Phil Wolff at March 14, 2005 11:38 PM

With the ease and accessibility of newer blogging software, it's no wonder we see a spike. I think it's also fair to say the media has really grabbed on to the word "Blog," which in my opinion has led to this increase. I've seen everyone from Fox News to Jeopardy talk about Blogs - you've got to wonder what sort of effect this sort of promotion has on the sphere.

Posted by: Donnie Jeter at March 15, 2005 12:39 AM

Actually, it gets worse than vioxx. There is plenty of other Techno-spamming out there on company names. Try performing research on companies that hold affiliate marketing practices... such as ValueClick or DoubleClick's Performics.

I encourage Technorati to please view my detailed comments on your problem here:

http://www.revenews.com/jeffmolander/archives/000430.html

Posted by: Jeff Molander at March 15, 2005 6:41 AM

Hi,

Thank you for the info. I suspect the curve will continue up for new blogs, but, as I think someone already mentioned, most will not post after two weeks.
http://www.advancinginsights.com/mybiz/?q=blog

Posted by: jim wilde at March 15, 2005 7:12 PM

This is fantastic news. I didn't think they were growing at this pace though!

Posted by: Monish at March 15, 2005 11:53 PM

Nice work.

I like the fact that you have highlighted the growing spam problem.

Now is anybody from Blogger listening? Because if Blogger and other free hosting services don't do something to police the spammers, it will destroy it for everyone, including Blogger's parent, Google. Imagine how gunked up Google is getting.


Best,
Anita Campbell

Posted by: Anita Campbell at March 16, 2005 5:51 AM

Gee, looks like 10 Mil is around the corner. Let's hope that most new blogs stay for the long term ...

How to blog by Tony Pierce, 110

1. write every day.
2. if you think youre a good writer, write twice a day.

Read Tony and Understand

Posted by: Jozef Imrich at March 16, 2005 6:17 AM

Link to the highly recommended article and a winner of 2005 Bloggies:
http://www.tonypierce.com/blog/2004/06/how-to-blog-by-tony-pierce-110-1.htm

Posted by: Jozef Imrich at March 16, 2005 6:19 AM

As to the discussion about who is blogging -

I'm a journalism student at Florida International University and I'm taking an online writing course; the class requires that we create and maintain our own blogs.

I think it's relevant to consider how blogs are being used in our own educational system, as tools to teach and for students to communicate ideas, not only on a college level.

Posted by: Amanda at March 16, 2005 10:33 PM

Nice work.

Buen trabajo.

Posted by: topgun at March 17, 2005 2:14 AM

This is very interesting. But it still doesn't answer the question: how to get the people to stick around? I mean, if they go, say, to
http://new-art.blogspot.com
they will certainly like it - but will they come back?

Posted by: Vvoitek at March 17, 2005 1:17 PM

8 million blogs...I suppose this brings the fundamental issue to the fore: Information vs. Knowledge.

So, the question is, how long (in seconds) do you give the average blogger you are reading before you decide to bookmark them? What are the criteria you use?

Mark
http://MakingLoveEasy.com

Posted by: Mark Michael Lewis at March 18, 2005 1:35 AM

This is definitely good to hear. Hopefully it will help whip the MSM into shape eventually.

Posted by: Lee at March 20, 2005 6:56 PM