August 9, 2005

State of the Blogosphere August 2005 Part 4: Spam and Fake Blogs

Today I will write about some of the darker sides of the blogosphere, including the increase in spam and fake blogs, and in comment and trackback spam. Along with the growth in the blogosphere (as reported in parts 1, 2 and 3 last week), Technorati has also been tracking an increase in the number of people who are trying to manipulate the blogosphere. First off, some definitions:

Spam Blogs

Spam blogs are blogs that are created in order to influence results on a search engine by filling the results with spam or fake postings. Sometimes it is done to influence page rank-type algorithms, which monitor the number of pages (in this case blog postings) that link to a page or a site. In the more general web sense, these are called "Link Farms". Sometimes it is to push higher rankings of those posts and blogs for certain keywords, also known as "keyword stuffing". Quite a bit has already been written about link farms and keyword stuffing; they are pretty well-known techniques used by some people to influence search ranking. They are also pretty easy to catch, and most search engines actively penalize or exclude these sites from their index. Here are some example spam blogs.
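
To make keyword stuffing a bit more concrete, here is a minimal Python sketch of one crude signal: the share of a post taken up by its single most frequent word. The function names, the 20% threshold and the 10-word minimum are all assumptions picked for illustration. This is not how Technorati or any search engine actually detects spam.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    total = len(words)
    return [(word, count / total) for word, count in Counter(words).most_common(top_n)]


def looks_stuffed(text, threshold=0.2, min_words=10):
    """Flag text whose most frequent word exceeds `threshold` of all words.

    Both cutoffs are hypothetical values chosen for this example.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        return False  # too short to judge either way
    top = keyword_density(text, top_n=1)
    return top[0][1] > threshold


if __name__ == "__main__":
    sample = "cheap pills cheap pills buy cheap pills cheap cheap pills online"
    print(keyword_density(sample))
    print(looks_stuffed(sample))
```

A single-word ratio like this would be easy to evade and would need, at the very least, stop-word handling and comparison against typical posts before it could be trusted, which is part of why catching these blogs takes more than one simple check.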

Fake Blogs

Fake Blogs are blogs that appear "blog-like" on the surface: They have numerous posts, usually around a particular area or subject, and at first glance look as if they were created by a person. However, these blogs are actually automated creatures created by programs, usually in order to get highly targeted AdSense advertising, or in some cases built to become a portal for affiliate systems like the Amazon Associates program. They are created in order to perpetrate click fraud or sometimes as part of a "make money fast" scam on the internet, again by taking advantage of traffic brought to them by search engines and web rings. Here are some example fake blogs.

I should note that some fake blogs may very well contain interesting and relevant content, which opens a debate about how useful or valuable they are. This is why I don't include fake blogs in with spam blogs (as defined above): it is arguable that these systems are actually providing readers some value.

Comment and Trackback Spam

Modern blogging systems allow for comments and trackbacks as ways of allowing readers or other bloggers to easily add their thoughts and comments to a post. Unfortunately, some spammers have been abusing these systems as well. Many hosting providers and tool makers have incorporated authentication mechanisms and captchas to make it more difficult to automate the tasks. They have also added moderation capabilities, and many vendors have turned these moderation systems on by default on new blogs. Early this year, a number of search engines including Technorati adopted the rel="nofollow" microformat. This latest set of salvos has worked quite well in many cases, but there are thunderclouds on the horizon, as research into defeating captcha systems has been effective, and my expectation is that this will continue to be an ongoing battleground in the future.

So what's being done about it?

The people who build spam and fake blogs think that they can get some kind of advantage by building these systems - usually additional search engine rankings or affiliate income. In essence, they believe that there is an economic incentive that spurs them on - and at Technorati, we've been working together with leading players to eliminate that economic incentive. We're working with the folks who run web advertising systems and major affiliate programs to alert them to spammers as quickly as possible. We've been building real-time systems to identify spammers and fake blogs and sharing that information with other web search engines so that link farms and keyword stuffers see no increases in search rankings.

Now, that doesn't mean that some of these blogs won't slip through - it requires a lot of algorithms, deep thinking, and human intervention to build and monitor systems that deal with these problems. It is also an ongoing issue that needs time, care and attention as spammers come up with new and innovative ways to game search engines and affiliate networks. It would be disingenuous of me to proclaim that the folks at Technorati have got it all solved. We don't. But we've been putting a lot of time and effort into building those systems, and we're going to continue to innovate as well.

Technorati doesn't index comments or trackback content or links, and we also support the nofollow tag (you'll note I used it when linking to the example spam and fake blogs above) to give greater control to bloggers who want to point to spam or fake blogs without implicitly endorsing the site.
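
As an illustration of what honoring nofollow could look like on the indexing side, the following Python sketch splits the links in a page into those that would count and those marked rel="nofollow". The class and function names are hypothetical and this is not Technorati's implementation; it relies only on the standard library's html.parser module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> targets, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attribute names arrive lowercased
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list; a value-less attribute is None
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) lists of hrefs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


if __name__ == "__main__":
    page = (
        '<a href="http://example.com/good">a real blog</a> '
        '<a rel="nofollow" href="http://example.com/spam">a spam blog</a>'
    )
    print(split_links(page))
```

Under this sketch, only the links in the first list would contribute to a blog's inbound link count, so pointing at a spam blog with rel="nofollow" gives it no boost.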

We've also been working on a number of social methods to help filter through the blogosphere so that bloggers and readers can help to separate the wheat from the chaff. Expect to see more from us on this in the coming months.

Web 2.0 Spam Squashing Summit

In February 2005, the first Web 2.0 Spam Squashing Summit was held in Silicon Valley. Key industry players such as AOL, Google, MSN, Six Apart and Yahoo were all in attendance at the standing-room-only event, and it engendered a lot of industry cooperation and communication.

Working together with the same group of folks, we will hold the second Web 2.0 Spam Squashing Summit in the second half of September, again in Silicon Valley. Final details are still being arranged, but representatives from Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Tucows, and Wordpress have all confirmed their plans to attend the event.

More to come, including an open invitation to others in the industry, in the next few weeks. Watch this space.

Summary:

  • Along with the explosive growth in the blogosphere, there has also been a growth in spam blogs and fake blogs
  • These blogs are almost always created by automated programs, not by people
  • They are usually created with an economic incentive - to get better search engine rankings, or to create affiliate or advertising revenue
  • Technorati has been working closely with major toolmakers, search engines, and hosting providers to quickly identify and stamp out spam and fake blogs
  • The key to reducing blog spam is to eliminate economic incentives, and we are working with major advertising and affiliate programs to create roadblocks for spammers and creators of fake blogs
  • Industry players including Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Technorati, Tucows, Wordpress and others are getting together in the second half of September for the second Web 2.0 Spam Squashing Summit.

Coming next: Blogs and the Mainstream Media.

Posted by dsifry at August 9, 2005 9:54 AM
Comments

Excellent post. Can I guess at the hidden meaning of almost all the SPAM blog links going to blogspot?

Posted by: Randy Charles Morin at August 9, 2005 12:32 PM

Numbers, please! Do you have an estimate on what percentage of blogs are spam/fake blogs?

Posted by: Gabe at August 9, 2005 1:07 PM

Great installment, enlivened by the links to outrageous sample spam and fakes. But heads up: a visit to one of them [buyaccutane] filled my screen with pop-ups and offers to update my Windows registry. Non-buyers beware!

Posted by: GBenett at August 9, 2005 1:09 PM

Dave: can I get Microsoft and MSN invited to the Spam Squishing Summit?

Great post!

Posted by: Robert Scoble at August 9, 2005 3:14 PM

Gabe: Matt says that he blocks "over 80% of all incoming pings to Ping-O-Matic as obvious spam".

http://photomatt.net/2005/08/09/spam-blogs/

Posted by: Jon Åslund at August 9, 2005 3:37 PM

Blogspot is popular because it's free and easy to set up, but I've seen plenty of blogs hosted on their own servers, and I even ran across what appeared to be a free blog service that hosted nothing but about a thousand spam blogs.

For example, most of the blogs at this search seem to be spam blogs:
http://www.technorati.com/search/mountain%20bike%20resources

A pattern I've noticed is the blog's name will be something like "amazing [keyword] {resource|info|guide}." Some of them even have blogrolls!

RFM

Posted by: Fritz at August 9, 2005 4:05 PM

Is Technorati going to implement a mechanism for people to report spam/fake blogs, or was the idea discarded as too open to abuse? I've found such blogs in my search results and looked for how to report them, but unless I've overlooked something there's no indication on the site.

Posted by: KCinDC at August 9, 2005 8:51 PM

We're in for the Spam Squashing Summit as well.
ScottR

Posted by: Scott Rafer at August 9, 2005 10:26 PM

Great post.

I have been wondering what to do about the spam blogs out there that are basically stealing my content. At first I thought it was just a couple but so far I have counted around a dozen!

Posted by: Alistair at August 10, 2005 4:08 AM

Excellent post on a growing concern. A lot of these "spam blogs" are being used to generate AdSense revenue (and stealing content to do it) and to boost SEO efforts. It'll be interesting to see what the big search engines do to quash this little trend. Glad to see that some of them are working with Technorati (as you've implied) on this.

Regarding fake blogs...there is some value to some of the affiliate blogs out there. In fact, some of them are quite good. (On the other hand, some are God-awful, useless and probably completely ineffective.)

There are a lot of marketers out there looking for ways to "monetize" blogs, and this appears to be the most promising. It will be interesting to see how receptive the blogosphere, search engines and web marketing world at large are to this idea.

Posted by: Aimee Kessler Evans at August 10, 2005 10:17 AM

Great, topical post. This is a big concern - especially for those of us in SEO and those of us with proprietary content that's being lifted for AdSense revenue.

I do wonder about the future of affiliate blogs, which you file under "fake" blogs. Granted, most of these are useless, keyword-stuffed, and probably completely ineffective. But there are some really good ones out there, and this seems like the most likely hope for the many marketers out there looking to "monetize" blogs.

It'll be interesting to see how the blogosphere, major online retailers and the web marketing community as a whole eventually warm up to -- or fully reject -- this practice.

Posted by: Aimee Kessler Evans at August 10, 2005 10:31 AM

You mention trackback and referral spam, but you leave out folksonomic tag spam.

Another I'd be interested to hear more about, but don't know what to call it: I had to stop using TopicExchange RSS on my website because while they were quick to delete spam trackbacks post-hoc, their RSS had already been published, blasting those links across the subscribing aggregators and newsreaders -- I don't know if Technorati or del.icio.us tag pages offer RSS feeds, but I imagine the same may be true for these services as well.

What sticks in my mind, though, and what I hope gets addressed at the next Spam Stomp, are the words of the spammer interviewed by The Register who, when asked if he felt any compromise of ethics over his actions said, "No, they invite me to post, so I post."

I still find it incredible that virtually every blog engine will implement various spam filters but only for some of the public-provided content. Some filter posts but not the title or author name/link, some filter comments but not trackbacks, some filter posts but not comments.

Some filter trackbacks and comments but leave themselves vulnerable to referrer-log spam (which is also quite considerable even if it is invisible).

Much as it's hip and cool to blame the spammers, what that guy said was true: Just as we invite virus hacks and crackers by running an O/S with gaping security holes, we also invite our own content exploitation running public websites with gaping publishing holes, and fixing it ad-hoc reactively and piece-meal is not the way to do it any more than securing Windows with Update Packs ;)

Don't flame me. You know it's true. Running a public website that permits handbills to be pasted over it is no different than putting up a public wall that you damn well know your friends' band have pasted handbills all over. Same diff. The space was there for free, they needed it, so they used it.

Before we go damning the spammers, we need to notice that this game takes two to play. This is not an issue of who's holier than who, it is not an issue of how we would want the world to be if only Walt Disney were God. This is simply the way the world is: People exploit what they can for their own aims and needs, and people pulling in $40K/month often exploit more cleverly than the part-time volunteer.

We need a realization of the responsibility of the tools in this blog-spam business. Much as I know they all loathe a rewrite looming in to expose what they'd overlooked in version one, we need fundamental changes, not finger pointing and heads in the sand. There has to be a core policy in all these blog tools that says nothing, not a single byte, gets into the permanent store except by one robust and reliable gateway, and then structure the rest of the blog services around that -- let's hope getting all these bright people together in one spot prompts at least one of them towards a realistic blog-security plan.

Posted by: mrG at August 10, 2005 10:04 PM

Glad to see you are using rel="nofollow" so they don't get any pagerank from this post.

Hopefully the Spam Squishing Summit will achieve something that will make fake blogs pointless

Posted by: Ben S at August 11, 2005 2:46 AM

Excellent topic - but we need some "real-world" solutions.

In my industry, blogging about home business conducted on the internet, I am being smothered by "home business" spam/fake blogs.

It's just not fair that I'm trying to go the serious blogging route (doing everything right - there's no keyword stuffing or overuse of affiliate links at my blog etc) and so many others utilize dodgy SEO tactics and get ahead of me. And it's not just at Technorati. I can't get anywhere at Google because of such fake sites.

Reporting fake/spam blogs is a good idea.

I'm eagerly awaiting the outcome of Web 2.0 Spam Squashing Summit next month.

Posted by: Martin at August 11, 2005 8:24 AM

I have a blog in which I put lots of interesting original content. But hardly anybody reads it. What kind of blog is that?

(A cr@p one? Oh.)

GM

Posted by: Gary Monro at August 19, 2005 8:18 PM

I believe "spam that pollutes the blogosphere" should be referred to as "SMOG"

Jim Parham (Swing Trader/Creative Thinker)
Yuba City, CA
stockmaverick@gmail.com

Posted by: Jim Parham at August 20, 2005 3:09 PM

I believe "spam that pollutes the blogosphere" should be referred to as "SMOG".......

Jim Parham ~ Yuba City, CA
stockmaverick@gmail.com

Posted by: Jim Parham at August 20, 2005 4:59 PM

You have a very great blog site. Is it easy to create a blog site???

Posted by: joel carter at August 21, 2005 11:27 PM

I hope everyone recognizes that "blogs" don't have to be created with blogging software or be posted on blogging sites....

C

Posted by: Counsel at August 25, 2005 11:23 AM

Good info on Dog Training
and http://www.dogtrainingsecrets.org

Good info on guitar lesson
and http://www.secretguitarlessons.com

Good info on keyword research
and http://www.keywordspytool.com


Posted by: Dog Training at August 25, 2005 1:35 PM

Great Info on Anti-Aging Products check it out at www.antiagingsupersecrets.com
Great info on Trade Show Secrets check it out at www.tradeshowsecretsrevealed.com
Great info on White Teeth Products check it out at www.whiterteethsource.com
Great info on Patent Information check it out at www.everythingpatents.com

Posted by: Anti Aging at August 25, 2005 9:26 PM
