Today I will write about some of the darker sides of the blogosphere, including the increase in spam blogs, fake blogs, and comment and trackback spam. Along with the growth in the blogosphere (as reported in parts 1, 2 and 3 last week), Technorati has also been tracking an increase in the number of people who are trying to manipulate the blogosphere. First off, some definitions:
Spam blogs are blogs that are created in order to influence results on a search engine by filling the results with spam or fake postings. Sometimes it is done to influence page rank-type algorithms, which monitor the number of pages (in this case blog postings) that link to a page or a site. In the more general web sense, these are called "Link Farms". Sometimes it is done to push higher rankings of those posts and blogs for certain keywords, also known as "keyword stuffing". Quite a bit has already been written about link farms and keyword stuffing; both are pretty well-known techniques used by some people to influence search ranking. They are also pretty easy to catch, and most search engines actively penalize or exclude these sites from their index. Here are some example spam blogs.
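To make the keyword stuffing idea above concrete, here is a minimal sketch of a keyword-density check. It is only an illustration, not how Technorati or any search engine actually detects spam; the function names and the threshold and minimum-length values are assumptions chosen for the example.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text, threshold=0.2, min_words=20):
    """Flag text where a single word makes up more than `threshold` of all words.

    `threshold` and `min_words` are arbitrary illustrative values (assumptions),
    not figures taken from any real search engine.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        # Too short to judge; very short posts would trigger false positives.
        return False
    top_word, share = keyword_density(text, top_n=1)[0]
    return share > threshold
```

A real system would combine many more signals (link patterns, posting rate, duplicated content); a single density threshold like this is easy to evade and is shown only to illustrate the concept.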
Fake Blogs are blogs that appear "blog-like" on the surface: They have numerous posts, usually around a particular area or subject, and at first glance look as if they were created by a person. However, these blogs are actually automated creatures created by programs, usually in order to get highly targeted AdSense advertising, or in some cases built to become a portal for affiliate systems like the Amazon Associates program. They are created in order to perpetrate click fraud or sometimes as part of a "make money fast" scam on the internet, again by taking advantage of traffic brought to them by search engines and web rings. Here are some example fake blogs.
I should note that some fake blogs may very well contain interesting and relevant content, which opens a debate about how useful or valuable they are. This is why I don't include fake blogs in with Spam blogs (as defined above): it is arguable that these systems are actually providing readers some value.
Comment and Trackback Spam
Modern blogging systems allow for comments and trackbacks as ways of allowing readers or other bloggers to easily add their thoughts and comments to a post. Unfortunately, some spammers have been abusing these systems as well. Many hosting providers and tool makers have incorporated authentication mechanisms and captchas to make it more difficult to automate the tasks. They have also added moderation capabilities, and many vendors have turned these moderation systems on by default on new blogs. Early this year, a number of search engines including Technorati adopted the rel="nofollow" microformat, which marks a link as one that should not be counted as an endorsement of the linked page. This latest set of salvos has worked quite well in many cases, but there are thunderclouds on the horizon, as research into defeating captcha systems has been effective, and my expectation is that this will continue to be an ongoing battleground in the future.
So what's being done about it?
The people who build spam and fake blogs think that they can get some kind of advantage by building these systems - usually additional search engine rankings or affiliate income. In essence, they believe that there is an economic incentive that spurs them on - and at Technorati, we've been working together with leading players to eliminate that economic incentive. We're working with the folks who run web advertising systems and major affiliate programs to alert them of spammers as quickly as possible. We've been building real-time systems to identify spammers and fake blogs and sharing that information with other web search engines so that link farms and keyword stuffers see no increases in search rankings.
Now, that doesn't mean that some of these blogs won't slip through - it requires a lot of algorithms, deep thinking, and human intervention to build and monitor systems that deal with these problems. It is also an ongoing issue that needs time, care and attention as spammers come up with new and innovative ways to game search engines and affiliate networks. It would be disingenuous of me to proclaim that the folks at Technorati have got it all solved. We don't. But we've been putting a lot of time and effort into building those systems, and we're going to continue to innovate as well.
Technorati doesn't index comments or trackback content or links, and we also support the nofollow tag (you'll note I used it when linking to the example spam and fake blogs above) to give greater control to bloggers who want to point to spam or fake blogs without implicitly endorsing the site.
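As a rough illustration of what supporting nofollow means for a link-counting system, the sketch below separates links marked rel="nofollow" from ordinary links in a piece of HTML, using only Python's standard html.parser module. This is an assumption-laden example and not Technorati's implementation; the class and function names (LinkCollector, split_links) are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, separating nofollow links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []   # links that would count toward the target
        self.nofollow = []   # links the author chose not to endorse

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel can hold several space-separated values, e.g. rel="external nofollow"
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) lists of hrefs found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow
```

Under this sketch, an indexer would count only the first list when tallying inbound links, so a blogger can point at a spam blog with rel="nofollow" without boosting it.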
We've also been working on a number of social methods for filtering the blogosphere, so that bloggers and readers can help separate the wheat from the chaff. Expect to see more from us on this in the coming months.
Web 2.0 Spam Squashing Summit
In February 2005, the first Web 2.0 Spam Squashing Summit was held in Silicon Valley. Key industry players such as AOL, Google, MSN, Six Apart and Yahoo were all in attendance at the standing-room-only event, and it engendered a lot of industry cooperation and communication.
Working together with the same group of folks, the second Web Spam Squashing Summit will be held in the second half of September, again in Silicon Valley. Final details are still being arranged, but representatives from Amazon, AOL, Ask Jeeves, Drupal, Google, MSN, Six Apart, Tucows, and WordPress have all confirmed their plans to attend the event.
More to come, including an open invitation to others in the industry, in the next few weeks. Watch this space.
Coming next: Blogs and the Mainstream Media.
Posted by dsifry at August 9, 2005 09:54 AM