Building with Blogs
Name a topic with a community of interest
around it. Now go to Google and look it up. There’s a good chance one or more of the top results will include somebody’s weblog
(aka blog). Let’s take three examples:
- 802.11b: the top listing out of 687,000 results is WiFi news, a weblog by a pair of journalists, Glenn Fleishman and Adam Engst. The IEEE working group’s site is in the #2 position. The WiFi Alliance’s portal is #3. (update: The WiFi Alliance has dropped to #5)
- Segway: the top listing out of 81,500 results is the Segway company’s site, followed by the Berkeley Segway Portal and then followed by Segway News–a weblog by Paul Nakada.
- Weblog: out of 2,620,000 results, the top
listing is Aaron Swartz’s Google Weblog, followed by the Guardian Limited’s weblog. Following them is the blog of a certain Linux
Journal editor who also happens to be writing what you’re reading right now. (update: Doc’s blog has actually moved up in the rankings!)
Blogs succeed largely because they are extremely native to the Web as Tim Berners-Lee conceived it in the first place. Here’s how weblog software pioneer Dave Winer explains it:
The first weblog was the first web site, info.cern.ch,
the site built by Tim Berners-Lee at CERN. From this page TBL pointed
to all the new sites as they came on-line. Luckily, the content of this
site has been archived at the World Wide Web Consortium. (Thanks to Karl
Dubost for the link.)
Linking to sources and crediting them (as Dave does in that last line) has always been a native ethical and journalistic practice on the Web. While big-time broadcasters, publishers and VC-funded wannabes continue to see the Net as nothing more than a plumbing system for
distributing “content” to “consumers”, blogging software developers have quietly added enormous value to the linking and crediting functionality of the Web. Dave and other independent developers
have created standards like XML-RPC, SOAP and RSS, plus open APIs that together turn the Web into a writing and publishing medium like nothing we’ve ever seen before.
Blogging is not about “architecting”, “building”, “designing” or “authoring” anything, because
blogs aren’t “sites” in the usual sense. Blogs are journals. With blogs you write directly on the Web. Most posts are short, though they don’t have to be. All are topical and current, or they
disappear from the aggregation sites and services and eventually from the “blogrolls” of listed favorite links on other blogs.
Each blog is like a fireplace, and each post is like a log heaved on top to keep the fire burning. Every post has its own “permalink”, so others can point directly to it. As long as a blog puts out heat and
light, others who care about the author’s subject are drawn to it. So are Google and other search engines, which sift constantly through the ashes.
At their best, blogs are link magnets as well as sources of links,
which is why Google likes them so much. Google equates inbound links
with authority and ranks the results accordingly. More links from more
highly linked-to pages result in higher page ranks. That’s why
so many blogs rise to the top of so many subject searches. The whole
system–which includes blogs, aggregators, web services and Google
itself–feeds, builds and grows on itself. It also attracts and feeds
on the RSS streams offered up by discussion and news sites ranging
from Slashdot and Linux Journal to the New York Times. RSS is
one among a growing number of free and open technologies created and/or
improved by weblog developers.
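To make the ranking idea above concrete, here is a toy calculation. It is a minimal sketch of link-based ranking by repeated score passing, in the spirit of the publicly described PageRank idea. It is not Google's actual implementation, and the blog names, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Toy link-based ranking by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to roughly 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: three small blogs all point at "hub-blog".
example_links = {
    "blog-a": ["hub-blog"],
    "blog-b": ["hub-blog", "blog-a"],
    "blog-c": ["hub-blog"],
    "hub-blog": ["blog-a"],
}
scores = rank_pages(example_links)
top = max(scores, key=scores.get)
```

In this made-up graph the blog with the most inbound links ends up with the highest score, and the blog it links to comes second, which is the "links from highly linked-to pages count for more" effect in miniature.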
As weblogs account for more and more of the traffic in knowledge about a
given subject, they become powerful instruments for hacking common wisdom.
In many categories, they are moving ahead of mainstream journals and
portals and building useful community services where over-funded dot-com
efforts failed spectacularly. One example from that last category is John
Hiler’s Cityblogs, which appeared in December 2002. Hiler explains:
Local sites are caught between a rock and a hard place: either
they hire expensive full-time writers to create content they can’t
afford or they fire their writers and turn to automated content:
weather reports, local news and movie listings.
It’s a Gordian knot–hire expensive writers or you have no
content–that blogs are uniquely positioned to cut in two. Now
that I’ve set up the site, I can cover these three categories in
an hour or two a day.
As Glenn “Instapundit” Reynolds put it at a recent conference on weblogs, “blogging is cheap”.
So blogging is becoming an option in all kinds of places, which
means there’s a good chance that you, as a Linux Journal
reader, fall into one or both of two groups:
- Users who want to set up a blog and start writing on the Web.
- System administrators and others with scripting and programming
skills, who are either looking to set up a blogging system or to manage
or change a system that’s already in place.
To get a handle on both, let’s go back to Google.
For better or worse, Google is a commercial company whose
services have become de facto web infrastructure. This is especially
true for blogs, which make liberal use not only of Google’s search
engine but also of its APIs, which allow automated queries from programs.
Google’s APIs are part of a growing raft of sites and services,
mostly hacked together by enterprising independent developers.
Technorati (see Sidebar), for example, pays attention to fresh links between
blogs (leapfrogging referrer logs) and organizes the information
into “watchlists” and other useful listings. If you want
Technorati to tell you who’s currently linking to your blog, you
can ask for this information on the site or pay $5 per year for a watchlist
sent out each day by e-mail.
Technorati is the creation of this article’s coauthor, David
Sifry, who is a cofounder of Linuxcare and Sputnik. It’s a good
example of a LAMP program–one based on the de facto platform of
Linux, Apache, MySQL and Python, PHP or Perl. Another LAMP creation
is Phillip Pearson’s Blogging Ecosystem, which keeps two Top 300 lists: one for the most-linked-to blogs and one for the blogs that do
the most linking.
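The two kinds of list described above come down to counting links in each direction. The sketch below shows one way to do that counting. It is not the code behind Technorati or the Blogging Ecosystem, and the function name, the blog names and the crawl data are hypothetical.

```python
from collections import Counter


def top_lists(links, size=300):
    """Return (most_linked_to, most_linking) as lists of (blog, count).

    links is an iterable of (source_blog, target_blog) pairs.
    Duplicate pairs are counted once, and self-links are ignored.
    """
    inbound = Counter()
    outbound = Counter()
    for source, target in set(links):
        if source == target:
            continue
        inbound[target] += 1
        outbound[source] += 1
    return inbound.most_common(size), outbound.most_common(size)


# Hypothetical crawl results: (blog that links, blog linked to).
crawl = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("bob", "alice"),
    ("bob", "dave"),
    ("carol", "carol"),   # self-link, ignored
    ("alice", "carol"),   # duplicate, counted once
]
most_linked_to, most_linking = top_lists(crawl, size=2)
```

Here "carol" leads the most-linked-to list with two inbound links, and "bob" leads the most-linking list with three outbound links.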
Both Technorati and the Blogging Ecosystem are made possible in large part
by RSS, the XML dialect whose name stands for Really Simple Syndication.
Thanks to RSS, every story you read on the Linux Journal web site
is syndicated automatically to anyone who wants
to read and point to it or aggregate it with other sources.
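For a sense of what an aggregator does with a feed, here is a minimal sketch that pulls story titles and links out of an RSS document using Python's standard library. The sample feed is hand-written in RSS 2.0 style, and its titles and URLs are made up; a real feed would be fetched over HTTP and may carry more elements than this.

```python
import xml.etree.ElementTree as ET

# A hand-written sample feed; the titles and URLs are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <link>http://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/stories/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/stories/2</link>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


stories = read_items(SAMPLE_FEED)
```

Each pair in `stories` gives an aggregator what it needs to list a story and point back to the original.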
The Blogger API is another enabling infrastructure hack. It’s
used not only by Blogger (the most popular weblog system, in terms of sign-ups) but by Radio Userland and Movable Type, the other
two leading weblog systems. This API is what made it possible for one
of your authors to hack methods for posting to various breeds of blogs
through e-mail and Jabber. There are many other hacks just as there are
many other blogging systems. SourceForge alone lists dozens of weblog systems in various states of completion. In the LAMP vein, Geeklog and b2/CafeLog are PHP-based and use MySQL, as does Drupal. In fact, PHP-Nuke,
PostNuke, Drupal and Slashcode all are flexible enough to serve as weblog
systems. So is Rusty Foster’s Scoop, which is written in Perl,
as is Movable Type. Roller is written in Java for J2EE environments;
in fact, there is a whole community of Java bloggers. The list goes on,
and it’s a long one. So let’s break the list down a bit into four family trees.
Packages designed from the ground up specifically for
blogging include Radio Userland, Blogger, Movable Type, Greymatter,
Roller and b2/CafeLog. Most include advanced features like the Blogger
API, MetaWeblog API, RSS feeds and subscriptions and XML-RPC pings to
aggregator sites such as www.weblogs.com. Without RSS feeds and XML-RPC pings, blogs don’t get included in blog-based web services
like DayPop, Blogdex, and Technorati. You hear a lot about the theory
of web services, but in the blog world they’re easy to put
Also falling into this category is LiveJournal, an open-source project
with a large following that puts a high emphasis on community ties
and participation. It’s designed as a centralized system open
to many clients on different platforms, all of which are also open source. While
LiveJournal does RSS feeds, it doesn’t make XML-RPC pings to
aggregator sites. It also lacks some of the formatting characteristics
one associates with blogs, such as blogrolls in the margins, all of
which makes it less blog-like than the others in this family.
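The XML-RPC ping mentioned in this section is a small remote procedure call telling an aggregator that a blog has changed. The sketch below builds a `weblogUpdates.ping` request, the method weblogs.com accepts for this purpose, with Python's standard `xmlrpc.client` module. The blog name and URL are hypothetical, and the endpoint passed to `send_ping` is whatever address the aggregator publishes, which is not assumed here.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to an aggregator; needs network access.

    The endpoint is whatever URL the aggregator publishes for pings.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Hypothetical blog name and URL, used only to show the payload.
payload = build_ping("Example Blog", "http://example.com/blog/")
method_args, method_name = xmlrpc.client.loads(payload)
```

Building the payload separately from sending it makes it easy to inspect what goes over the wire before pointing the script at a live service.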
After the wild success of discussion sites like Slashdot and
Kuro5hin, a number of software packages emerged with Slashdot-like functionality. Slash is Slashdot’s own code base, Scoop is
Kuro5hin’s, and mod_virgule is the trust system on which Advogato is based. These first-generation packages allow you to set up story posting, site membership, comment moderation, topics or categories (with icons),
polls, post archives and so on. These cover the main requirements of
blogging tools, although many lack more-advanced functions, such as RSS
feeds and XML-RPC pings to aggregators. Given the essentially personal
nature of blogging, even on blogs with more than one author, features like karma and moderation are
The growing popularity of PHP invited the second generation of
discussion-oriented sites. PHP-Nuke launched the generation; but when
it went unmaintained, a number of programmers forked the code base and
created PostNuke, Geeklog and PostTEP, among others. This generation
also allows easy theming and better plugin management, so you can easily
write code that might go, say, in a sidebox on the site to perform a
specific task. PHP-Nuke is maintained again, by the way. This family also includes hybrids like PHP-Slash, which is Jay
Bloodworth’s port of Slash code from Perl and mod_perl to PHP.
Content Management Systems
Some blogging tools come out of the traditional web site content
management space, including support for multiple authors and permissions
and work-flow enforcement. This is the Vignette StoryServer tradition,
which includes Zope (which is more than just a CMS),
Nucleus and Drupal. Content management
sites provide for a more formalized model-view-controller approach, with
clear separations between content, markup and work flow. These tools
often are ideal for creating dynamic web sites with a lot of authors and
editors, but they tend to lack some of the advanced features found
in blogging tools, such as RSS feeds and XML-RPC pings. They also lack
the Blogger and MetaWeblog APIs for posting and editing content from
applications other than a web browser.
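To show what posting from outside a browser looks like, here is a minimal sketch of a Blogger API request. The API's `blogger.newPost` method takes an application key, a blog ID, a user name, a password, the post text and a publish flag, in that order. Every value below is a placeholder, and the server endpoint is left out because it differs from one weblog system to the next.

```python
import xmlrpc.client


def build_new_post(appkey, blog_id, username, password, text, publish=True):
    """Return the XML-RPC request body for a blogger.newPost call."""
    params = (appkey, blog_id, username, password, text, publish)
    return xmlrpc.client.dumps(params, methodname="blogger.newPost")


# Every value below is a placeholder, not a real credential or blog.
request_body = build_new_post(
    appkey="APPKEY-PLACEHOLDER",
    blog_id="1",
    username="editor",
    password="secret",
    text="Posted from a script instead of a browser.",
    publish=True,
)
decoded_params, decoded_method = xmlrpc.client.loads(request_body)
```

A script like this is the starting point for the kind of e-mail or Jabber gateway described earlier: something receives a message, wraps its text in a `blogger.newPost` call and sends it to the blog's XML-RPC endpoint.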
Wikis are another way to create dynamic web sites. They allow
anyone to edit and mark up a page easily, while still maintaining
version control. Some blogging tools are based on Wiki code or on Wiki
ideas. These include SnipSnap, TWiki, Tiki, WikiWiki, MoinMoin and ZWiki. They offer some content management but focus mostly
on ease of editing and posting and speed of updates, neglecting such
blogging functions as dated entries and RSS feeds. Other families
worth noting are the Java-based blogging tools, such as Roller and
WebForum; and Python- and Ruby-based tools, such as Pyblosxom and tDiary, which is hot in Japan.
We list all these families because your particular needs may not
be restricted to blogging. But if blogs are what you want to do, or what you
want to support, you need to pay close attention to what works best for
Simply put, your blogging system doesn’t qualify for the label if
it can’t answer yes to the following questions:
- Can a user dynamically post to a site?
- Are posts easy to create, review and edit again after posting?
- Can an administrator limit who posts to the front page?
- Can a user edit in a browser (at a minimum) or another tool of
his or her choice?
- Does its page format allow blogrolls and other sections outside
the daily posting area?
- Does it produce RSS feeds?
- Does every post have a permanent URL (or permalink)?
- Do current posts have unique URLs?
- Can search engines crawl the archives?
- Are the archives stable and safe from rot?
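The permalink question in the list above is easy to picture in code. The following is a minimal sketch of one way to derive a permanent URL from a post's date and title. The date-plus-slug layout and both function names are assumptions for illustration; the systems discussed here each use their own URL schemes.

```python
import re
from datetime import date


def slugify(title):
    """Lowercase the title and replace runs of other characters with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def permalink(base_url, posted_on, title):
    """Build a permanent URL from the posting date and title.

    The URL depends only on data fixed at posting time, so it stays the
    same when the post scrolls off the front page into the archives.
    """
    return "{}/{:04d}/{:02d}/{:02d}/{}.html".format(
        base_url.rstrip("/"),
        posted_on.year, posted_on.month, posted_on.day,
        slugify(title),
    )


url = permalink("http://example.com/blog", date(2003, 2, 1), "Building with Blogs!")
```

Because nothing in the URL depends on the post's position on the front page, links from other blogs and from search engines keep working as newer posts push it into the archives.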
The big three–Blogger, Movable Type and Radio
Userland–qualify on all those grounds, because they were built
from the ground up as pure blogging systems. This is also why most of
the blogs listed in the Blogging Ecosystem’s Top 300 lists are
produced by the big three tools.
We’ve been experimenting at Linux Journal with
various weblog systems, hosted on a server kindly provided by Penguin
Computing. Most of our efforts have been focused on Movable Type,
which is the only one of the big three that runs on a Linux server and the
one that appears to have the most momentum in the Linux development
community. The source code is available to end users but not under
an open-source license.
Movable Type offers two licenses of its own: a free (as in beer) one
for noncommercial use and a $150 US one for commercial use. While this
disqualifies Movable Type as an option for writers and publishers,
including Linux Journal, who prefer to use free software, many
blogging organizations that support the Open Source and Free Software
movements, such as the Electronic Frontier Foundation and Creative
Commons, are using Movable Type.
So what are your choices here? If you want to put up a bare-bones blog and don’t mind if it
doesn’t say yes to all the questions listed above, LiveJournal is
a handy and popular choice. You’ll be left outside the Blogging
Ecosystem (as roughly defined by Phil Pearson’s aggregation site
by that name), but you’ll be using mostly GPLed open-source
software and be involved with a lively community.
If you want your blog to thrive in the Ecosystem and don’t care
too much about what’s happening on the back end (just as you
might not care what’s behind a Hotmail-type web e-mail system),
you might consider Blogger or Radio Userland. Both allow you to blog
from any server to which you can FTP data and serve HTTP. For example,
to set up a Blogger blog, go to www.blogger.com, create an account
and point it at your FTP server by supplying your user name, password and web
server HTML directory. In Debian the default is /var/www. On Red Hat,
it’s /home/httpd/html. This is a simple, easy-to-set-up blog system
and a popular choice–even for a hacker who doesn’t want to
be bothered setting up everything from scratch.
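The FTP step is the whole trick behind this arrangement: the blogging service renders a page and copies it into your web server's HTML directory. The sketch below shows that kind of upload with Python's standard `ftplib`. It is not Blogger's own code, and the function names and arguments are hypothetical.

```python
import io
from ftplib import FTP


def publish_page(ftp, remote_dir, filename, html):
    """Upload one rendered page over an already-connected FTP session."""
    ftp.cwd(remote_dir)
    ftp.storbinary("STOR " + filename, io.BytesIO(html.encode("utf-8")))


def publish_to_server(host, user, password, remote_dir, filename, html):
    """Connect, log in and upload; needs a reachable FTP server."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        publish_page(ftp, remote_dir, filename, html)
```

On a Debian box, for example, `remote_dir` would be /var/www and `filename` might be index.html. Keeping the upload separate from the connection makes it possible to try the upload logic against a stand-in object before using a real server.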
If you want a full-featured Linux-based blogging system for yourself
or your organization, and you don’t have a problem with its licensing
scheme, Movable Type is your best choice. Installing it is almost easy
enough for novices. It’s also extremely flexible, capable and easy
If platform and licensing issues keep you away from the big three, you
need to look among the discussion site and content management systems.
If you’re already using a PHP-based system such as PHP-Nuke, you
might consider adapting your current system or going to another one in
the same general family, such as Geeklog or Drupal. None are as easy to
install as Movable Type, but none are hard to maintain once you master
Command-line tools for blog editing and posting exist on Linux as
well. One example is Phillip Pearson’s Bzero. Phillip also is the
author of the Python Community Server and phpStorageSystem, both of which
are clones of the Userland Radio Community Server. Running one of these
applications on your web server allows you to host Radio-based blogs on
your Linux box.
If none of the alternatives suits your fancy, you might consider creating
or improving an open-source built-for-blogging system. Greymatter, which
is written in Perl and GPLed, had some good momentum going until 2001,
when Noah Grey decided he had better things to do. You could pick up
where he left off, or you could build a new blog system from scratch.
Countless options are available. URLDIR can generate a variety of feature
comparison tables that cover all the systems listed here and then
some. If you’re at the tire-kicking stage, it’s a good place to start.
But don’t make a decision before looking at the blogs produced by
these different systems. Follow David Ogilvy’s classic advice to
companies looking for an advertising agency: “Look for work you
envy, and find out who does it.”