State of the Blogosphere August 2005 Part 4: Spam and Fake Blogs

Today I will write about some of the darker sides of the blogosphere, including the rise of spam blogs, fake blogs, and comment and trackback spam. Along with the growth in the blogosphere (as reported in parts 1, 2 and 3 last week), Technorati has also been tracking an increase in the number of people who are trying to manipulate the blogosphere. First off, some definitions:

Spam Blogs

Spam blogs are blogs that are created in order to influence results on a search engine by filling the results with spam or fake postings. Sometimes it is done to influence PageRank-type algorithms, which count the number of pages (in this case blog postings) that link to a page or a site. In the more general web sense, these are called “link farms”. Sometimes it is to push higher rankings of those posts and blogs for certain keywords, also known as “keyword stuffing”. Quite a bit has already been written about link farms and keyword stuffing; both are well-known techniques used by some people to influence search ranking. They are also pretty easy to catch, and most search engines actively penalize or exclude these sites from their index. Here are some example spam blogs.
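
To make "pretty easy to catch" a little more concrete, here is a minimal sketch of one naive way to spot keyword stuffing: measure what share of a post's words is taken up by its single most frequent word. This is only an illustration, not a description of what Technorati or any search engine actually runs. The function names, the 20% threshold, and the 20-word minimum are all assumptions chosen for the example.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    total = len(words)
    counts = Counter(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text, threshold=0.2, min_words=20):
    """Hypothetical heuristic: flag text when one word exceeds `threshold`
    of all words. Very short texts are never flagged."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < min_words:
        return False
    top_word, share = keyword_density(text, top_n=1)[0]
    return share > threshold


stuffed = "cheap mountain bike " * 10
normal = ("Today I will write about some of the darker sides of the "
          "blogosphere including the increase in spam and fake blogs")

print(looks_stuffed(stuffed))  # True: each of the three words is a third of the text
print(looks_stuffed(normal))   # False: the most common word is well under 20%
```

A real system would need far more than this (stop-word handling, link analysis, posting patterns), but the sketch shows why crude stuffing stands out statistically.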

Fake Blogs

Fake blogs are blogs that appear “blog-like” on the surface: they have numerous posts, usually around a particular area or subject, and at first glance look as if they were created by a person. However, these blogs are actually automated creatures created by programs, usually in order to get highly targeted AdSense advertising, or in some cases to become a portal for affiliate systems like the Amazon Associates program. They are created in order to perpetrate click fraud, or sometimes as part of a “make money fast” scam on the internet, again by taking advantage of traffic brought to them by search engines and web rings. Here are some example fake blogs.

I should note that some fake blogs may very well contain interesting and relevant content, which opens a debate about how useful or valuable they are. This is why I don’t include fake blogs with spam blogs (as defined above): it is arguable that these systems are actually providing readers some value.

Comment and Trackback Spam

Modern blogging systems allow for comments and trackbacks as ways of allowing readers or other bloggers to easily add their thoughts and comments to a post. Unfortunately, some spammers have been abusing these systems as well. Many hosting providers and tool makers have incorporated authentication mechanisms and captchas to make it more difficult to automate the tasks. They have also added moderation capabilities, and many vendors have turned these moderation systems on by default on new blogs. Early this year, a number of search engines including Technorati adopted the rel="nofollow" microformat. This latest set of salvos has worked quite well in many cases, but there are thunderclouds on the horizon, as research into defeating captcha systems has been effective, and my expectation is that this will continue to be an ongoing battleground in the future.
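
As an illustration of how a blogging tool might apply rel="nofollow" to reader-submitted comments, here is a minimal sketch using only Python's standard library. It is not taken from any particular blogging system. The class and function names are hypothetical, and it makes two simplifying assumptions: any existing rel value on a link is replaced outright, and HTML comments in the input are dropped.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Rebuilds an HTML fragment, forcing rel="nofollow" on every <a> tag."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, closing=">"):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # bare attribute with no value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closing

    def _with_nofollow(self, tag, attrs):
        if tag != "a":
            return attrs
        # Simplification: discard any existing rel value rather than merging.
        kept = [(name, value) for name, value in attrs if name != "rel"]
        return kept + [("rel", "nofollow")]

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, self._with_nofollow(tag, attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, self._with_nofollow(tag, attrs), " />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(html_fragment):
    """Return html_fragment with rel="nofollow" set on every link."""
    rewriter = NofollowRewriter()
    rewriter.feed(html_fragment)
    rewriter.close()
    return "".join(rewriter.out)


comment = 'Nice post! <a href="http://example.com/?rel=1" rel="friend">cheap pills</a>'
print(add_nofollow(comment))
# Nice post! <a href="http://example.com/?rel=1" rel="nofollow">cheap pills</a>
```

Parsing the markup rather than pattern-matching on the raw string is a deliberate choice here: a URL that happens to contain "rel=" should not fool the rewriter into thinking the attribute is already set.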

So what’s being done about it?

The people who build spam and fake blogs think that they can get some kind of advantage by building these systems – usually additional search engine rankings or affiliate income. In essence, they believe that there is an economic incentive that spurs them on – and at Technorati, we’ve been working together with leading players to eliminate that incentive. We’re working with the folks who run web advertising systems and major affiliate programs to alert them to spammers as quickly as possible. We’ve been building real-time systems to identify spammers and fake blogs, and sharing that information with other web search engines so that link farms and keyword stuffers see no increases in search rankings.

Now, that doesn’t mean that some of these blogs won’t slip through – it requires a lot of algorithms, deep thinking, and human intervention to build and monitor systems that deal with these problems. It is also an ongoing issue that needs time, care and attention as spammers come up with new and innovative ways to game search engines and affiliate networks. It would be disingenuous of me to proclaim that the folks at Technorati have got it all solved. We don’t. But we’ve been putting a lot of time and effort into building those systems, and we’re going to continue to innovate as well.

Technorati doesn’t index comments or trackback content or links, and we also support the nofollow tag (you’ll note I used it when linking to the example spam and fake blogs above) to give greater control to bloggers who want to point to spam or fake blogs without implicitly endorsing the site.

We’ve also been working on a number of social methods to help filter through the blogosphere, so that bloggers and readers can help to separate the wheat from the chaff. Expect to see more from us on this in the coming months.

Web 2.0 Spam Squashing Summit

In February 2005, the first Web 2.0 Spam Squashing Summit was held in Silicon Valley. Key industry players such as AOL, Google, MSN, Six Apart and Yahoo were all in attendance at the standing room-only event, and it engendered a lot of industry cooperation and communication.

Working together with the same group of folks, the second Web Spam Squashing Summit will again be held in Silicon Valley, in the second half of September. Final details are still being arranged, but representatives from Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Tucows, and WordPress have all confirmed their plans to attend the event.

More to come, including an open invitation to others in the industry, in the next few weeks. Watch this space.

Summary:

  • Along with the explosive growth in the blogosphere, there has also been a growth in spam blogs and fake blogs
  • These blogs are almost always created by automated programs, not by people
  • They are usually created with an economic incentive – to get better search engine rankings, or to create affiliate or advertising revenue
  • Technorati has been working closely with major toolmakers, search engines, and hosting providers to quickly identify and stamp out spam and fake blogs
  • The key to reducing blog spam is to eliminate economic incentives, and we are working with major advertising and affiliate programs to create roadblocks for spammers and creators of fake blogs
  • Industry players including Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Technorati, Tucows, WordPress and others are getting together in the second half of September for the second Web 2.0 Spam Squashing Summit.

Coming next: Blogs and the Mainstream Media.

23 comments to State of the Blogosphere August 2005 Part 4: Spam and Fake Blogs

  • David Sifry & Technorati on Spam Blogs and Fake Blogs

    In the latest of his ‘State of the Blogosphere’ posts, David Sifry of Technorati addresses Spam Blogs and Fake Blogs. David talks about why he differentiates spam blogs from fake blogs: I should note that some fake blogs may very…

  • Excellent post. Can I guess at the hidden meaning of almost all the SPAM blog links going to blogspot?

  • Numbers, please! Do you have an estimate on what percentage of blogs are spam/fake blogs?

  • GBenett

    Great installment, enlivened by the links to outrageous sample spam and fakes. But heads up: a visit to one of them [buyaccutane] filled my screen with pop-ups and offers to update my Windows registry. Non-buyers beware!

  • Dave: can I get Microsoft and MSN invited to the Spam Squishing Summit?
    Great post!

  • Spam Blogs

    You should read spam and fake blogs, another problem I’ve been seeing a lot lately is entire blogs being scraped and their content being re-published with ads on it. Structured formats like RSS make this easier than before. The dark side to the …

  • Gabe: Matt says that he blocks “over 80% of all incoming pings to Ping-O-Matic as obvious spam”.
    http://photomatt.net/2005/08/09/spam-blogs/

  • Blogspot is popular because it’s free and easy to set up, but I’ve seen plenty of blogs hosted on their own servers, and I even ran across what appeared to be a free blog service that hosted nothing but about a thousand spam blogs.
    For examples, most of the blogs at this search seem to be spam blogs:
    http://www.technorati.com/search/mountain%20bike%20resources
    A pattern I’ve noticed is the blog’s name will be something like “amazing [keyword] {resource|info|guide}.” Some of them even have blogrolls!
    RFM

  • Is Technorati going to implement a mechanism for people to report spam/fake blogs, or was the idea discarded as too open to abuse? I’ve found such blogs in my search results and looked for how to report them, but unless I’ve overlooked something there’s no indication on the site.

  • Sifry Part 4 – Spam and Fake Blogs

    Today’s post for Sifry, State of the Blogosphere August 2005 Part 4: Spam and Fake Blogs, has lots of detail so I recommend you read it on your own if any part of the following summary catches your eye. Summary: * Along with the explosive growth in the …

  • We’re in for the Spam Squashing Summit as well.
    ScottR

  • Great post.
    I have been wondering what to do about the spam blogs out there that are basically stealing my content. At first I thought it was just a couple but so far I have counted around a dozen!

  • Excellent post on a growing concern. A lot of these “spam blogs” are being used to generate AdSense revenue (and stealing content to do it) and to boost SEO efforts. It’ll be interesting to see what the big search engines do to quash this little trend. Glad to see that some of them are working with Technorati (as you’ve implied) on this.
    Regarding fake blogs…there is some value to some of the affiliate blogs out there. In fact, some of them are quite good. (On the other hand, some are God-awful, useless and probably completely ineffective.)
    There are a lot of marketers out there looking for ways to “monetize” blogs, and this appears to be the most promising. It will be interesting to see how receptive the blogosphere, search engines and web marketing world at large are to this idea.

  • Great, topical post. This is a big concern – especially for those of us in SEO and those of us with proprietary content that’s being lifted for AdSense revenue.
    I do wonder about the future of affiliate blogs, which you file under “fake” blogs. Granted, most of these are useless, keyword-stuffed, and probably completely ineffective. But there are some really good ones out there, and this seems like the most likely hope for the many marketers out there looking to “monetize” blogs.
    It’ll be interesting to see how the blogosphere, major online retailers and the web marketing community as a whole eventually warm up to — or fully reject — this practice.

  • mrG

    You mention trackback and referral spam, but you leave out folksonomic tag spam.
    Another I’d be interested to hear more about, but don’t know what to call it: I had to stop using TopicExchange RSS on my website because while they were quick to delete spam trackbacks post-hoc, their RSS had already been published, blasting those links across the subscribing aggregators and newsreaders — I don’t know if Technorati or del.icio.us tag pages offer RSS feeds, but I imagine the same may be true for these services as well.
    What sticks in my mind, though, and what I hope gets addressed at the next Spam Stomp, are the words of the spammer interviewed by The Register who, when asked if he felt any compromise of ethics over his actions said, “No, they invite me to post, so I post.”
    I still find it incredible that virtually every blog engine will implement various spam filters but only for some of the public-provided content. Some filter posts but not the title or author name/link, some filter comments but not trackbacks, some filter posts but not comments.
    Some filter trackbacks and comments but leave themselves vulnerable to referrer-log spam (which is also quite considerable even if it is invisible).
    Much as it’s hip and cool to blame the spammers, what that guy said was true: Just as we invite virus hacks and crackers by running an O/S with gaping security holes, we also invite our own content exploitation running public websites with gaping publishing holes, and fixing it ad-hoc reactively and piece-meal is not the way to do it any more than securing Windows with Update Packs ;)
    Don’t flame me. You know it’s true. Running a public website that permits handbills to be pasted over it is no different than putting up a public wall that you damn well know your friends’ band have pasted handbills all over. Same diff. The space was there for free, they needed it, so they used it.
    Before we go damning the spammers, we need to notice that this game takes two to play. This is not an issue of who’s holier than who, it is not an issue of how we would want the world to be if only Walt Disney were God. This is simply the way the world is: People exploit what they can for their own aims and needs, and people pulling in $40K/month often exploit more cleverly than the part-time volunteer.
    We need a realization of the responsibility of the tools in this blog-spam business. Much as I know they all loathe a rewrite looming in to expose what they’d overlooked in version one, we need fundamental changes, not finger pointing and heads in the sand. There has to be a core policy in all these blog tools that says nothing, not a single byte, gets into the permanent store except by one robust and reliable gateway, and then structure the rest of the blog services around that. Let’s hope getting all these bright people together in one spot prompts at least one of them towards a realistic blog-security plan.

  • Glad to see you are using rel="nofollow" so they don’t get any pagerank from this post.
    Hopefully the Spam Squishing Summit will achieve something that will make fake blogs pointless

  • Excellent topic – but we need some “real-world” solutions.
    In my industry, blogging about home business conducted on the internet, I am being smothered by “home business” spam/fake blogs.
    It’s just not fair that I’m trying to go the serious blogging route (doing everything right – there’s no keyword stuffing or overuse of affiliate links at my blog etc) and so many others utilize dodgy seo tactics and get ahead of me. And it’s not just at Technorati. I can’t get anywhere at Google because of such fake sites.
    Reporting fake/spam blogs is a good idea.
    I’m eagerly awaiting the outcome of Web 2.0 Spam Squashing Summit next month.

  • Web 2.0 This Week (August 7-13)

    Web 2.0 This Week
    August 6 – 13
    This has been a week of challenges and successes. Challenging because I am on a driving trip from Anacortes, WA to Los Angeles, and although you can now get internet access while flying, I have no tools for getting acc…

  • Web 2.0 Weekly Wrap-up, 8-14 August 2005

    This week: RSS branding, More Web 2.0 definitions, Spam and fake blogs, MBAs learn about Web 2.0, Techie post of the week.

  • I have a blog in which I put lots of interesting original content. But hardly anybody reads it. What kind of blog is that?
    (A cr@p one? Oh.)
    GM

  • Jim Parham

    I believe “spam that pollutes the blogosphere” should be referred to as “SMOG”
    Jim Parham (Swing Trader/Creative Thinker)
    Yuba City, CA
    stockmaverick@gmail.com

  • casino reviwes

    Thank you for the time it takes to find and publish this information!Keep on good work!