March 31, 2003
Heading to VON
Tomorrow (Tuesday April 1), I'll be speaking at Kevin Werbach's mini-Supernova going on at the Spring VON conference down at the San Jose Convention Center. I'll be on a panel from 3:30 - 4:45PM with Duncan Davidson, Doc Searls, and Kevin Werbach, and the topic is Decentralized Communications. Hey, just come and see "pretty boy" Davidson (Duncan, where did you get that picture?) and the slim and svelte Searls, two of the best looking technology luminaries at the conference. Seriously, it should be a great time as we discuss weblogging, wifi, and web services. The folks at Pulver have also set up a blog and a trackback page, so if you're at the show or just blogging about it, you can shoot over a trackback ping to get listed.
Looking for a few good web designers (3/31/03)
Here's a request for all you web designers out there looking for some extra paying work: My brother, Micah L. Sifry, is looking for an experienced web designer who can help him set up a weblog for his upcoming book, The Iraq War Reader. The blog will be Movable Type-based, so you've got to know MT's template and plugin system like nobody's business, and you've got to have an excellent design sense to create an effective MT site. He already has a bunch of graphics from the publisher, so this shouldn't require a tremendous amount of design work, just someone to fit everything together and make it all seamless.
This is going to be a great, timely, balanced blog that will gain the designer a lot of recognition. He does have a small budget to get the work done, but this isn't a corporate gravy train. Having said that, a good designer who can put together the basic design and translate it into the correct MT templates and style sheets can get some quick cash and some visible credit on the site.
He doesn't need to purchase web hosting, a pre-existing MT setup or any other technical infrastructure; he's already got that handled. He needs to find someone who is:
- Available (he/she's ready to start immediately)
- Experienced (list a portfolio, please)
- Capable of working within a fixed budget
- Dedicated to excellent work
- Ideally, physically located in NYC, but remote work is OK.
Drop me a line or leave feedback on this blog item and I'll pass the info on to him so he can directly respond to you.
March 21, 2003
Technorati's Current Events
Technorati's got a new feature called Current Events that I just whipped up. It is a list of the top links to "professional" news sites by bloggers in the last two hours, along with comments and analysis. I created it because, like most people, I've been following the progress of the war, watching and reading the mass media, and I wanted to know what people out there were saying about the news. What are the most important stories? What is real, and what is propaganda? What is not being reported, or is being underreported? These were the questions on my mind when I created Technorati's Current Events. Ever since the Google purchase of Blogger, the thing that struck me as the most compelling potential new feature was the combination of Google News with Blogger users' commentary. Perhaps they'll still do it, but I think I just beat them to it.
I'm constantly amazed by the collective wisdom of a huge number of individuals, each publishing their thoughts, and voting their attention by linking to things. I wanted to tap into this collective brainpower, organize it, and present it back to us all.
Here's how it works: Since Technorati is already keeping track of 150,000 blogs every hour (wow, we hit 150k today!), I tuned the engine to spot trends in recent events by only looking at blog posts in the previous two hours. This helps to increase churn on the page, as only articles and links that are immediately relevant will stay on top of the Current Events page. By the way, I'm not sure that two hours is the best balance of immediacy versus trivia, so I expect that I'll play around with it a bit as I have time, perhaps over the weekend, to tweak the settings to get things just right. The good news is that as more people take up blogging, the results should get better and better even as they get fresher and fresher. The page data is refreshed every 15 minutes, so one eighth of the links are always new, and one eighth are removed. The number in parentheses next to each result is the number of new links to that article in the previous two hours. Clicking on the (Cosmos) link shows you all of the bloggers who have linked to that article since it was published. And underneath each article is a set of short descriptions or context, written by bloggers in the past two hours.
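One way to picture the two-hour window is the small Python sketch below. It is only an illustration under stated assumptions, not Technorati's actual engine: it assumes link observations arrive as hypothetical (timestamp, blog, url) tuples, and it ranks each URL by the number of distinct blogs that linked to it inside the window.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def top_links(observations, now, window=timedelta(hours=2), limit=10):
    """Rank URLs by how many distinct blogs linked to them inside the window."""
    cutoff = now - window
    blogs_per_url = defaultdict(set)
    for seen_at, blog, url in observations:
        # Ignore anything older than the window (or stamped in the future).
        if cutoff <= seen_at <= now:
            blogs_per_url[url].add(blog)
    # Most-linked first; ties are broken alphabetically so output is stable.
    ranked = sorted(blogs_per_url.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(url, len(blogs)) for url, blogs in ranked[:limit]]


# Made-up sample data; the blog names and URLs are hypothetical.
now = datetime(2003, 3, 21, 12, 0)
observations = [
    (datetime(2003, 3, 21, 11, 30), "blog-a", "http://news.example/story-1"),
    (datetime(2003, 3, 21, 11, 45), "blog-b", "http://news.example/story-1"),
    (datetime(2003, 3, 21, 10, 30), "blog-c", "http://news.example/story-2"),
    (datetime(2003, 3, 21, 9, 0), "blog-d", "http://news.example/story-2"),  # outside the window
]
ranking = top_links(observations, now)
```

Shrinking or widening the `window` argument is the knob that trades immediacy against churn.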
Would you kind readers be interested in seeing different views into the current events page? I could create one that allowed links over the last 12 hours, or the last 24 hours - but too much more history and the page will start to look the same as Blogdex.
Or would you be interested in following other kinds of news? I've been thinking of implementing a categorization system, so people interested in sports can see results filtered towards those results, for example. Also, I've been thinking about the non-English-speaking bloggers out there, seen most often in the Interesting Newcomers list. Would you be interested in seeing a set of language-specific Technorati lists?
Let me know your feedback. I don't think that I'll have the time to implement anything soon, as I have a bunch of other very very interesting projects that are taking up the large majority of my time, and, as work projects, frankly demand a higher priority than Technorati and blogging. I'll still get in a few late night and weekend hacks on Technorati, but don't be surprised if you don't hear from me very much for the next month or so...
March 17, 2003
Rasmus Lerdorf at BALUG (San Francisco) Tuesday March 18, 7PM
Rasmus Lerdorf, creator of PHP, will be giving a talk at the Bay Area Linux Users Group at the Four Seas Restaurant in San Francisco's Chinatown. Rasmus is a great speaker, and he always puts on an interesting talk. I haven't heard him talk in about a year, which means that he's sure to blow my mind with all of the cool new things you can do with PHP. Last year it was dynamically-generated flash animations, I wonder what it'll be this time? Rasmus is sure to have lots to talk about as well, as he's been busy helping Yahoo convert all of its web services to use PHP. Talk about real-world scalability issues! I can't wait.
You don't need to be a member of BALUG to come to the event, but it would be polite if you RSVP'd so that we had a decent count of who was coming. BALUG is one of the oldest Linux Users groups in the world - it has been going continuously since 1994, and there's always a fantastic Chinese dinner served before the speaker gets going. It costs $10 for the dinner, or you can eat elsewhere and just come for the talk. There will also be some good door prizes including books and Linux CDs, and the networking is always fun as well. And of course, dear reader, I will be there as the master of ceremonies. :-)
March 16, 2003
Zen and the Art of Bugfixing
I fixed a big Technorati bug today. As the database has grown (it is now tracking over 135,000 blogs every hour) I've grown concerned about its performance. Sometimes queries would come back very quickly, and sometimes the site seemed incredibly bogged down - almost useless, even for simple queries. The worst part of it was that these performance slowdowns happened at infrequent intervals, and they seemed to be getting worse as the database grew. I fretted that I was facing a worst-case scenario - that I simply needed more RAM or disk spindles to increase the speed of the site. Even with the optimizations I did a few weeks ago, performance had slowed again. Even worse, a number of daily housekeeping chores were taking longer and longer to complete, which meant that (a) the system load was higher than it needed to be because of these extra tasks running in the background, and (b) I was in danger of allowing the data to get out-of-date, and one of the things I like the best about Technorati is the freshness of its data feeds.
This got me thinking about a great book called Zen and the Art of Motorcycle Maintenance by Robert Pirsig. This is one of my all-time favorite books. Like Thoreau, it is one of those books that I keep taking down every few years and rereading. In it, Pirsig talks about the relationship of Science, Art, Engineering, and Zen, and the enormous rift in our culture created when so many people rely on technology but so few understand it. He tells this story with analogies to motorcycle maintenance in particular, all the while discussing the scientific method - observation, hypothesis, experimentation, and analysis.
Sometimes we observe a problem and immediately jump to a conclusion - oh, that's happening because I just disconnected the battery, no wonder the lights don't work. Sometimes, we need to compare the problem with our mental map of how things should work, and then use logical deduction to figure out what is going wrong - think of Click and Clack's uncanny diagnostic powers on Car Talk; they can often figure out what is wrong with someone's car just by knowing the make and model (mental map) and then asking some probing questions (does it squeal when you're in neutral?), all within a sound-bite interval and with witty comments along the way.
But sometimes you're stuck. You can't figure out why something is going wrong. Your usual swami-like powers are just not clicking when it comes to this problem. So you get frustrated. This is a very dangerous time - Pirsig calls this a gumption trap opportunity. Because you're stuck, you search for answers, and are willing to take blind chances in order to fix the problem expeditiously. That's usually right about when all hell breaks loose, and you really start to fuck things up. So Pirsig recommends breaking out the big artillery, the gumption trap killer, the big monster: The scientific method. In Pirsig's words:
When I think of formal scientific method an image sometimes comes to mind of an enormous juggernaut, a huge bulldozer...slow, tedious lumbering, laborious, but invincible. It takes twice as long, five times as long, maybe a dozen times as long as informal mechanic's techniques, but you know in the end you're going to get it. There's no fault isolation problem in motorcycle maintenance that can stand up to it. When you've hit a really tough one, tried everything, racked your brain and nothing works, and you know that this time Nature has really decided to be difficult, you say, ``Okay, Nature, that's the end of the nice guy,'' and you crank up the formal scientific method.
For this you keep a lab notebook. Everything gets written down, formally, so that you know at all times where you are, where you've been, where you're going and where you want to get. In scientific work and electronics technology this is necessary because otherwise the problems get so complex you get lost in them and confused and forget what you know and what you don't know and have to give up. In cycle maintenance things are not that involved, but when confusion starts it's a good idea to hold it down by making everything formal and exact. Sometimes just the act of writing down the problems straightens out your head as to what they really are.
The logical statements entered into the notebook are broken down into six categories: (1) statement of the problem, (2) hypotheses as to the cause of the problem, (3) experiments designed to test each hypothesis, (4) predicted results of the experiments, (5) observed results of the experiments and (6) conclusions from the results of the experiments. This is not different from the formal arrangement of many college and high-school lab notebooks but the purpose here is no longer just busywork. The purpose now is precise guidance of thoughts that will fail if they are not accurate.
The real purpose of scientific method is to make sure Nature hasn't misled you into thinking you know something you don't actually know. There's not a mechanic or scientist or technician alive who hasn't suffered from that one so much that he's not instinctively on guard. That's the main reason why so much scientific and mechanical information sounds so dull and so cautious. If you get careless or go romanticizing scientific information, giving it a flourish here and there, Nature will soon make a complete fool out of you. It does it often enough anyway even when you don't give it opportunities. One must be extremely careful and rigidly logical when dealing with Nature: one logical slip and an entire scientific edifice comes tumbling down. One false deduction about the machine and you can get hung up indefinitely.
That's where I was at with regard to Technorati's performance. Things were too unpredictable, and I couldn't figure out why there were problems - only that a problem did indeed exist. So, I broke out the scientific method. The first thing I did was to state the problem and to start observing.
MySQL is the database I've been using to backend the Technorati link data. It has a great reputation - robust, fast, and it has nearly all of the features you'd expect in a SQL database. Lots of people use it, and its code is open source, which means that its bugs are few and far-between. I also know it pretty well, so I was unafraid to make it Technorati's backbone. I dug into the MySQL manuals, and found an interesting log file configuration parameter - the "log-slow-queries" configuration option. By turning this on, I started to collect a log of all of the queries that took a long time to process - observations for my scientific method log book. I also delved deep into MySQL's analysis tool, called the "EXPLAIN" command. Using it, I could find out why a certain query was taking a long time; was it hogging the CPU? Was it chewing through disk accesses? Was it not using a database index? This was my experimental playground. Given enough observations (slow queries), I could run them through my test scenarios (individual explanations) and see what happened as I performed experiments on the database.
The first thing that I found out is that MySQL locks the entire database table when it does an INSERT or an UPDATE on a table. What that means is that all queries against that table are locked out while the Technorati spider is adding newly refreshed blogs into the database. I found that by batching INSERTs and UPDATEs and by using MySQL's LOW_PRIORITY flag, I could significantly reduce the latency of database queries - which meant that interactive performance of the site rose. Good news!
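As a rough illustration of the batching idea, here is a Python sketch that folds many rows into a single multi-row INSERT LOW_PRIORITY statement. It is a sketch under assumptions rather than the code Technorati runs: the table and column names are hypothetical, the function only builds the statement text and parameter list, and it uses the %s placeholder style accepted by common Python MySQL drivers.

```python
def build_batched_insert(table, columns, rows, low_priority=True):
    """Build one multi-row INSERT statement plus its flat parameter list.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Row values stay as
    parameters for the database driver to quote.
    """
    if not rows:
        raise ValueError("need at least one row to insert")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    priority = "LOW_PRIORITY " if low_priority else ""
    sql = "INSERT {}INTO {} ({}) VALUES {}".format(
        priority,
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical table and data, for illustration only.
sql, params = build_batched_insert(
    "links",
    ["source_url", "target_url"],
    [
        ("http://blog-a.example/", "http://news.example/story-1"),
        ("http://blog-b.example/", "http://news.example/story-1"),
    ],
)
```

One statement means the table lock is taken once for the whole batch instead of once per row, and LOW_PRIORITY asks MySQL to let waiting readers go first on table-locking storage engines.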
Unfortunately, that didn't entirely solve the problem. I kept seeing some really slow database calls show up in the slow-queries log, often taking anywhere from 60-180 seconds to complete. That's unacceptable: most people will just click reload on their browser, which sends off ANOTHER query, loading down the database even further. Other people will just get frustrated with it and will go elsewhere. Not good.
After I had a week's worth of slow query data, I sat down with it and looked for patterns. Something niggled at my brain. I looked more closely. Then I had it. Almost all of the slow queries came from people requesting information on sites that had an underscore in the domain name or in the URL. In other words, people were looking for the link cosmos for sites like "http://p_o_l_e_c_a_t.blogspot.com", and the queries were taking forever to execute. What I remembered is that the underscore has a special meaning in MySQL pattern matching (LIKE) - it is a wildcard character, which means it can stand for any single character. So when doing a search on the URL above, instead of doing one quick lookup, MySQL was actually scanning a huge number of rows, testing each one against the wildcards. All I needed to do was to tell MySQL not to treat the underscore as a special character anymore, and it just might solve the performance problem.
Lo and behold, it did. Previously, a search on "http://p_o_l_e_c_a_t.blogspot.com" took 82 seconds and searched through 180,785 rows of the database. Now, it takes less than a hundredth of a second and searches through 31 rows. All of a sudden, Technorati started firing on all cylinders again. Since a small number of queries were no longer hogging the database, all of the remaining queries got more of a chance to run, and executed even more quickly. Response time has returned to acceptable levels. I am a happy man, and even though I had to pull out the enormous bulldozer of the scientific method, wait a week for some decent observations, and spend time pulling my hair out trying to figure it out on my own, my faith is unshaken and I stand victorious.
March 11, 2003
Micah Alpern's BlogSearch
Micah Alpern has an interesting idea - sometimes you want to search the backup brains of people you respect. Next best thing? Search their RSS feeds.
I like it.
Big Cometa, Intel WiFi announcements
Yahoo News is reporting on upcoming Cometa announcements: they are reporting that Cometa closed the contract to set up Wi-Fi access for McDonalds and Borders. The McDonalds deal is small in initial rollout - only 10 locations in NYC to start, then 300 in 3 cities - NYC, Chicago, and an unnamed California town (could it be Cometa's hometown of San Francisco? Here's hoping!) but has tremendous potential, given the number of McDonalds across the US and around the world. The Borders deal is a larger initial rollout, with 400 bookstores getting WiFi access.
The pricing is interesting too: The McDonalds service will be free for an hour if an extra value meal is purchased, then $3 per hour.
Expect more to come as Intel readies its PR mothership over the new Centrino release tomorrow. Hilton, Marriott, Sheraton, Westin and W hotels will tout wireless access points in hundreds of hotels in the United States, Canada, the United Kingdom and Germany, and SFO has already announced its new WiFi rollout.
Can anyone say runaway train? What a great day, because this is the tip of the WiFi iceberg - and all sorts of problems will crop up in these deployments, large and small - security, manageability, maintenance, and operational costs needed to keep all of those network connections and wireless connections up and running. Hint, hint, hint. Don't forget the single largest cost in doing these kinds of widespread deployments - it is the cost of putting someone in a truck to install/maintain/repair a system in the field. At a minimum of $500 per truck roll, whoever can minimize engineer field time will win this race. Stay tuned.
March 9, 2003
Building with Blogs
Below is the text of the Linux Journal cover story that Doc Searls and I wrote for the February issue. Click on the MORE link to read the entire article.
Name a topic with a community of interest around it. Now go to Google and look it up. There's a good chance one or more of the top results will include somebody's weblog (aka blog). Let's take three examples:
Blogs succeed largely because they are extremely native to the Web as Tim Berners-Lee conceived it in the first place. Here's how weblog software pioneer Dave Winer explains it:
The first weblog was the first web site, info.cern.ch,
the site built by Tim Berners-Lee at CERN. From this page TBL pointed
to all the new sites as they came on-line. Luckily, the content of this
site has been archived at the World Wide Web Consortium. (Thanks to Karl
Dubost for the link.)
Linking to sources and crediting them (as Dave does in that last line) has always been a native ethical and journalistic practice on the Web. While big-time broadcasters, publishers and VC-funded wannabes continue to see the Net as nothing more than a plumbing system for distributing ``content'' to ``consumers'', blogging software developers have quietly added enormous value to the linking and crediting functionality of the Web. Dave and other independent developers have created standards like XML-RPC, plus open APIs that together turn the Web into a writing and publishing medium like nothing we've ever seen before.
Blogging is not about ``architecting'', ``building'', ``designing'' or ``authoring'' anything, because
blogs aren't ``sites'' in the usual sense. Blogs are journals. With blogs you write directly on the Web. Most posts are short, though they don't have to be. All are topical and current, or they
disappear from the aggregation sites and services and eventually from the ``blogrolls'' of listed favorite links on other blogs.
Each blog is like a fireplace, and each post is like a log heaved on top to keep the fire burning. Every post has its own ``permalink'', so others can point directly to it. As long as a blog puts out heat and
light, others who care about the author's subject are drawn to it. So are Google and other search engines, which sift constantly through the ashes.
At their best, blogs are link magnets as well as sources of links,
which is why Google likes them so much. Google equates inbound links
with authority and ranks the results accordingly. More links from more
highly linked-to pages result in higher page ranks. That's why
so many blogs rise to the top of so many subject searches. The whole
system--which includes blogs, aggregators, web services and Google
itself--feeds, builds and grows on itself. It also attracts and feeds
on the RSS streams offered up by discussion and news sites ranging
from Slashdot and Linux Journal to the New York Times. RSS is
one among a growing number of free and open technologies created and/or
improved by weblog developers.
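The idea that links from highly linked-to pages count for more can be sketched with a small power-iteration loop in Python. This is a simplified version of the published PageRank idea, not Google's actual ranking code; the page names, the damping value of 0.85 and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A made-up four-page web: three blogs and one heavily linked "hub" blog.
example_links = {
    "blog-a": ["hub"],
    "blog-b": ["hub"],
    "blog-c": ["hub", "blog-a"],
    "hub": ["blog-a"],
}
ranks = pagerank(example_links)
```

In this toy graph the hub ends up with the highest score, and blog-a outranks blog-b and blog-c because its inbound link comes from the hub.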
As weblogs account for more and more of the traffic in knowledge about a
given subject, they become powerful instruments for hacking common wisdom.
In many categories, they are moving ahead of mainstream journals and
portals and building useful community services where over-funded dot-com
efforts failed spectacularly. One example from that last category is John
Hiler's Cityblogs, which appeared in December 2002. Hiler explains:
Local sites are caught between a rock and a hard place: either
they hire expensive full-time writers to create content they can't
afford or they fire their writers and turn to automated content:
weather reports, local news and movie listings.
It's a Gordian knot--hire expensive writers or you have no
content--that blogs are uniquely positioned to cut in two. Now
that I've set up the site, I can cover these three categories in
an hour or two a day. As Glenn ``Instapundit'' Reynolds put it at a recent conference on weblogs, ``blogging is cheap''.
So blogging is becoming an option in all kinds of places, which
means there's a good chance that you, as a Linux Journal
reader, fall into one or both of two groups:
- Users who want to set up a blog and start writing on the Web.
- System administrators and others with scripting and programming
skills, who are either looking to set up a blogging system or to manage
or change a system that's already in place.
To get a handle on both, let's go back to Google.
For better or worse, Google is a commercial company whose
services have become de facto web infrastructure. This is especially
true for blogs, which make liberal use not only of Google's search
engine but also of its APIs, which allow automated queries from programs.
Google's APIs are part of a growing raft of sites and services,
mostly hacked together by enterprising independent developers.
Technorati (see Sidebar), for example, pays attention to fresh links between
blogs (leapfrogging referrer logs) and organizes the information
into ``watchlists'' and other useful listings. If you want
Technorati to tell you who's currently linking to your blog, you
can ask for this information on the site or pay $5 per year for a watchlist
sent out each day by e-mail.
The Technorati Story: How a New Web-Services
Product Review Grew Out of a Research Assignment
Technorati is the creation of this article's coauthor, David
Sifry, who is a cofounder of Linuxcare and Sputnik. It's a good
example of a LAMP program--one based on the de facto platform of
Linux, Apache, MySQL and Python, PHP or Perl. Another LAMP creation
is Phillip Pearson's Blogging Ecosystem, which keeps two Top 300 lists: one for the most-linked-to blogs and one for the blogs that do
the most linking.
Both Technorati and the Blogging Ecosystem are made possible in large part
by RSS, the XML dialect whose acronym means really simple syndication.
Thanks to RSS, every story you read on the Linux Journal web site
is syndicated automatically to anyone who wants
to read and point to it or aggregate it with other sources.
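To make the syndication idea concrete, here is a minimal Python sketch that pulls story titles and links out of an RSS document using only the standard library. The feed below is a made-up RSS 2.0 sample, not a real Linux Journal feed, and `read_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <link>http://journal.example/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://journal.example/story-1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://journal.example/story-2</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS document."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


stories = read_items(SAMPLE_FEED)
```

An aggregator does little more than run a loop like this over many feeds and merge the results.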
The Blogger API is another enabling infrastructure hack. It's
used not only by Blogger (the most popular weblog system, in terms of sign-ups) but by Radio Userland and Movable Type, the other
two leading weblog systems. This API is what made it possible for one
of your authors to hack methods for posting to various breeds of blogs
through e-mail and Jabber. There are many other hacks just as there are
many other blogging systems. SourceForge alone lists dozens of weblog systems in various states of completion. In the LAMP vein, Geeklog and b2/CafeLog are PHP-based and use MySQL, as does Drupal. In fact, PHP-Nuke,
PostNuke, Drupal and Slashcode all are flexible enough to serve as weblog
systems. So is Rusty Foster's Scoop, which is written in Perl,
as is Movable Type. Roller is written in Java for J2EE environments;
in fact, there is a whole community of Java bloggers. The list goes on,
and it's a long one. So let's break the list down a bit into four family trees.
Packages designed from the ground up specifically for
blogging include Radio Userland, Blogger, Movable Type, Greymatter,
Roller and b2/CafeLog. Most include advanced features like the Blogger
API, MetaWeblog API, RSS feeds and subscriptions and XML-RPC pings to
aggregator sites such as www.weblogs.com. Without RSS feeds and XML-RPC pings, blogs don't get included in blog-based web services
like DayPop, Blogdex, and Technorati. You hear a lot about the theory
of web services, but in the blog world they're easy to put
Also falling into this category is LiveJournal, an open-source project
with a large following that puts a high emphasis on community ties
and participation. It's designed as a centralized system open
to many clients on different platforms, all of which are also open source. While
LiveJournal does RSS feeds, it doesn't make XML-RPC pings to
aggregator sites. It also lacks some of the formatting characteristics
one associates with blogs, such as blogrolls in the margins, all of
which makes it less blog-like than the others in this family.
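For readers curious what an XML-RPC ping looks like, the Python sketch below builds the request body for a `weblogUpdates.ping` call, the method weblogs.com uses for update notifications. It is a sketch under assumptions: the blog name and URL are made up, `build_ping` is a hypothetical helper, and nothing is sent over the network.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body announcing that a blog has updated."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


# Hypothetical blog details, for illustration only.
payload = build_ping("Example Blog", "http://blog.example/")

# Decode the payload again to show what the aggregator would receive.
params, method = xmlrpc.client.loads(payload)

# To actually send a ping, a ServerProxy pointed at the aggregator's
# XML-RPC endpoint (an address you would need to look up) would be used:
#   xmlrpc.client.ServerProxy(endpoint_url).weblogUpdates.ping(name, url)
```

The whole exchange is a short XML document posted over HTTP, which is a large part of why so many blogging tools support it.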
After the wild success of discussion sites like Slashdot and
Kuro5hin, a number of software packages emerged with Slashdot-like functionality. Slash is Slashdot's own code base, Scoop is
Kuro5hin's, and mod_virgule is the trust system on which Advogato is based. These first-generation packages allow you to set up story posting, site membership, comment moderation, topics or categories (with icons),
polls, post archives and so on. These cover the main requirements of
blogging tools, although many lack more-advanced functions, such as RSS
feeds and XML-RPC pings to aggregators. Given the essentially personal
nature of blogging, even on blogs with more than one author, features like karma and moderation are
The growing popularity of PHP invited the second generation of
discussion-oriented sites. PHP-Nuke launched the generation; but when
it went unmaintained, a number of programmers forked the code base and
created PostNuke, Geeklog and PostTEP, among others. This generation
also allows easy theming and better plugin management, so you can easily
write code that might go, say, in a sidebox on the site to perform a
specific task. PHP-Nuke is maintained again, by the way. This family also includes hybrids like PHP-Slash, which is Jay
Bloodworth's port of Slash code from Perl and mod_perl to PHP.
Content Management Systems
Some blogging tools come out of the traditional web site content
management space, including support for multiple authors and permissions
and work-flow enforcement. This is the Vignette StoryServer tradition,
which includes Zope (which is more than just a CMS),
Nucleus and Drupal. Content management
sites provide for a more formalized model-view-controller approach, with
clear separations between content, markup and work flow. These tools
often are ideal for creating dynamic web sites with a lot of authors and
editors, but they tend to lack some of the advanced features found
in blogging tools, such as RSS feeds and XML-RPC pings. They also lack
Blogger and MetaWeblog APIs for posting and editing content from other
non-web browser-based applications.
Wikis are another way to create dynamic web sites. They allow
anyone to edit and mark up a page easily, while still maintaining
version control. Some blogging tools are based on Wiki code or on Wiki
ideas. These include SnipSnap, TWiki, Tiki, WikiWiki, MoinMoin and ZWiki. They offer some content management but focus mostly
on ease of editing and posting and speed of updates, neglecting such
blogging functions as dated entries and RSS feeds. Other families
worth noting are the Java-based blogging tools, such as Roller and
WebForum; and Python- and Ruby-based tools, such as Pyblosxom and tDiary, which is hot in Japan.
We list all these families because your particular needs may not
be restricted to blogging. But if blogs are what you want to do, or what you
want to support, you need to pay close attention to what works best for
Simply put, your blogging system doesn't qualify for the label if
it can't answer yes to the following questions:
- Can a user dynamically post to a site?
- Are posts easy to create, review and edit again after
- Can an administrator limit who posts to the front page?
- Can a user edit in a browser (at a minimum) or another tool of
his or her choice?
- Does its page format allow blogrolls and other sections outside
the daily posting area?
- Does it produce RSS feeds?
- Does every post have a permanent URL (or permalink)?
- Do current posts have unique URLs?
- Can search engines crawl the archives?
- Are the archives stable and safe from rot?
The big three--Blogger, Movable Type and Radio
Userland--qualify on all those grounds, because they were built
from the ground up as pure blogging systems. This is also why most of
the blogs listed in the Blogging Ecosystem's Top 300 lists are
produced by the big three tools.
We've been experimenting at Linux Journal with
various weblog systems, hosted on a server kindly provided by Penguin
Computing. Most of our efforts have been focused on Movable Type,
which is the only one of the big three that hosts on Linux and the
one that appears to have the most momentum in the Linux development
community. The source code is available to end users but not under
an open-source license.
Movable Type offers two licenses of its own: a free (as in beer) one
for noncommercial use and a $150 US one for commercial use. While this
disqualifies Movable Type as an option for writers and publishers,
including Linux Journal, who prefer to use free software, many
blogging organizations that support the Open Source and Free Software
movements, such as the Electronic Frontier Foundation and Creative
Commons, are using Movable Type.
So what are your choices here? If you want to put up a bare-bones blog and don't mind if it
doesn't say yes to all the questions listed above, LiveJournal is
a handy and popular choice. You'll be left outside the Blogging
Ecosystem (as roughly defined by Phil Pearson's aggregation site
by that name), but you'll be using mostly GPLed open-source
software and be involved with a lively community.
If you want your blog to thrive in the Ecosystem and don't care
too much about what's happening on the back end (just as you
might not care what's behind a Hotmail-type web e-mail system),
you might consider Blogger or Radio Userland. Both allow you to blog
from any server to which you can FTP data and serve HTTP. For example,
to set up a Blogger blog, go to www.blogger.com, create an account and
point it at your FTP server by supplying your user name, password and web
server HTML directory. In Debian the default is /var/www. On Red Hat,
it's /home/httpd/html. This is a simple, easy-to-set-up blog system
and a popular choice--even for a hacker who doesn't want to
be bothered setting up everything from scratch.
If you want a full-featured Linux-based blogging system for yourself
or your organization, and you don't have a problem with its licensing
scheme, Movable Type is your best choice. Installing it is almost easy
enough for novices. It's also extremely flexible, capable and easy
If platform and licensing issues keep you away from the big three, you
need to look among the discussion site and content management systems.
If you're already using a PHP-based system such as PHP-Nuke, you
might consider adapting your current system or going to another one in
the same general family, such as Geeklog or Drupal. None are as easy to
install as Movable Type, but none are hard to maintain once you master
Command-line tools for blog editing and posting exist on Linux as
well. One example is Philip Myelin's Bzero. Philip also is the
author of the Python Community Server and phpStorageSystem, both of which
are clones of the Userland Radio Community Server. Running one of these
applications on your web server allows you to host Radio-based blogs on
your Linux box.
If none of the alternatives suits your fancy, you might consider creating
or improving an open-source built-for-blogging system. Greymatter, which
is written in Perl and GPLed, had some good momentum going until 2001,
when Noah Grey decided he had better things to do. You could pick up
where he left off, or you could build a new blog system from scratch.
Countless options are available. URLDIR can generate a variety of feature
comparison tables that cover all the systems listed here and then
some. If you're at the tire-kicking stage, it's a good place
But don't make a decision before looking at the blogs produced by
these different systems. Follow David Ogilvy's classic advice to
companies looking for an advertising agency: ``Look for work you
envy, and find out who does it.''
The Technorati Story: How a New Web-Services Product Grew out of a Research Assignment
Below is the text of the Linux Journal sidebar on Technorati that I wrote for the February issue. Click on the MORE link to read the entire article.
When Doc and I started doing research for this feature, I was
still something of a blogging neophyte. While I was experienced with
all the components of the LAMP platform (Linux, Apache, MySQL, PHP,
Perl and Python), had an account with Advogato and had even set up my own blog using Movable Type, I was mostly a blog software consumer. This project gave me a chance to hack.
After we set up our blog software server (kindly loaned to us by Penguin Computing), I installed Movable Type and immediately looked for ways to improve it. First I wrote an e-mail-to-blog and a Jabber-to-blog set of interfaces, so I could post using my favorite e-mail and IM clients. (I've always believed the web browser is a lousy web-services client--something we all intuitively know whenever filling out a long form on the web.) Then I began to think about the kinds of web services I'd like to see as a regular blogger.
The result, after three weekends of hacking, is Technorati.com, a new site that provides four services:
- Link Cosmos shows you what blogs are linking to your blog, or any other blog or any arbitrary URL. Every time another blog
saves a post, it sends out an RSS notification that Technorati receives,
puts in a database and lists on request in your current cosmos. This is
a major advance over referrer logs, which only show followed links.
- Google Rank shows you the top 100 sites on Google for a given search term. Technorati rechecks Google's rankings daily,
so you can see how the rankings change over time.
- Google Juice tells you where your blog ranks in the top 1,000 for any search term.
- Watchlists keep and track historical information about a given site, allowing you to see new links to your site quickly as well
as historical ranking information not shown in standard views.
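The Link Cosmos service described above can be pictured as an inverted index from link targets to the blogs that cite them. What follows is only a minimal sketch of that idea, not Technorati's actual code: the `LinkCosmos` and `LinkExtractor` names, the in-memory storage and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a post body."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class LinkCosmos:
    """Hypothetical in-memory index: target URL -> set of blogs linking to it."""

    def __init__(self):
        self._inbound = defaultdict(set)

    def record_post(self, blog_url, post_html):
        """Index every outbound link found in one saved post."""
        parser = LinkExtractor()
        parser.feed(post_html)
        for target in parser.links:
            self._inbound[target].add(blog_url)

    def cosmos(self, target_url):
        """Return the blogs that link to target_url, sorted for stable output."""
        return sorted(self._inbound.get(target_url, set()))


index = LinkCosmos()
index.record_post("http://blog-a.example/",
                  '<p>See <a href="http://target.example/">this</a>.</p>')
index.record_post("http://blog-b.example/",
                  '<a href="http://target.example/">me too</a>')
print(index.cosmos("http://target.example/"))
```

Because the index is built from the posts themselves, a link shows up even if no reader ever clicks it, which is the difference from a referrer log.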
Watchlists are how Technorati answers the question ``How can
you make money with web services?'' For $5 a year you get a daily
e-mail with your latest cosmos listings. For $10 a year you get instant
access to live watchlist information through an RSS feed.
As I write this, Technorati is only a few days old, and it's
already been to the top of both Blogdex and DayPop. It's also made about $250. (update: that number is up to about $2,000 now)
Technorati is brand new, a work-in-progress and invited into the world
by a plethora of open-source tools and open protocols, many of which are
products of the blogging development community. I used a LAMP combination
with Linux, Apache, MySQL and PHP for the live scripting and Perl for
the back-end web robot and other back-end tasks. XML-RPC pings power the
activation of the web spider, so Technorati is always full of up-to-date
information. It exports RSS feeds for people who want to view output in
a more structured format, such as an RSS browser. I set up payment and
billing through PayPal's open APIs. These use HTTP POST methods
that easily perform credit-card processing and billing.
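To make the ping-driven spider concrete, here is a rough Python sketch of what an XML-RPC ping looks like on the wire and how a receiver might turn it into work for a crawler. It assumes the weblogs.com-style `weblogUpdates.ping(name, url)` call; whether Technorati accepts exactly that method is an assumption, and `build_ping`, `parse_ping` and `spider_queue` are names invented for this example (the site itself uses PHP and Perl, not Python).

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def parse_ping(request_body):
    """Decode a ping and return (blog_name, blog_url)."""
    params, method = xmlrpc.client.loads(request_body)
    if method != "weblogUpdates.ping" or len(params) != 2:
        raise ValueError("not a weblogUpdates.ping call")
    return params[0], params[1]


# Hypothetical receiver: each valid ping schedules one blog for crawling.
spider_queue = []
body = build_ping("Example Blog", "http://blog.example/")
name, url = parse_ping(body)
spider_queue.append(url)
print(spider_queue)
```

The point of the design is that the spider only visits blogs that have just announced a change, instead of polling every blog on a timer.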
In programming we stand on the shoulders of giants. So I'd like to
run a bunch of people in Technorati's credits: Dave Winer, who wrote
XML-RPC and was a driving force behind SOAP and RSS; Ben and Mena Trott,
who wrote Movable Type; Rasmus Lerdorf and many others who developed PHP; Larry Wall and many others who developed Perl; the entire Apache team; Monty Widenius, David Axmark and the MySQL development team; Evan Williams and the folks behind the Blogger API; the Google team behind the Google API; and, of course, Linus Torvalds and the Linux development community.
March 6, 2003
Ammunition for your cluestick
are at it again. The dynamic duo, half of the Cluetrain Manifesto authorship, have put up a new site called World of Ends, and it is a must-read for anyone who wants a clue about what the Internet is all about. In much the same way as they espoused in Cluetrain that markets are conversations, here the fundamental premise is that the Internet isn't a thing, nor is it really a place, but it is an agreement - an agreement to pass information between networks that cooperate. And from that flow the big 3 properties of the net, abbreviated as NEA: Nobody owns it, Everybody can use it, Anyone can improve it.
March 5, 2003
Over 100,000 Blogs served
Technorati has broken the 100,000 blog barrier. We are now actively tracking over 100,000 blogs in near real time, meaning that new blog entries show up in Technorati within an hour of being posted.
The Interesting Recent Blogs and Interesting Newcomers lists have proven fruitful as well - I found a few interesting new blogs today - one of them is a news service offered by the Mainichi Shimbun called "Wai Wai", which is a selection of wacky news from the Japanese press, translated by the Mainichi. I always loved the Japanese weeklies: they add a whole new meaning to the term "tabloid", and they're always full of interesting opinion on the underside of Japanese culture. Some interesting recent links: "Moral decline sees increase in child abuse, mothers sleeping with sons", "Masturbation diddling with Married Life", and the ever-popular "Dominatrix whips up donations for refugees". Great stuff.
March 3, 2003
RSS 2.0 for Popular Technorati Feeds
I had a few minutes tonight to put up the auto-generated RSS feeds for three of Technorati
- Technorati Top 100: The blogging A-List, based on a linear ranking of the number of incoming links from other blogs. I recently removed a bunch of sites that really aren't weblogs themselves, like CNN, Google, and blogging software sites like Blogger, Movable Type. Of course you can still see the Link Cosmoses for these sites, but I felt that they weren't really blogs or they didn't provide significant interesting information to bloggers. I decided to leave in the blogging news sites, like Daypop, Popdex, Blogdex, and Technorati. OK, so call me biased, but I felt that those sites, even though they are largely automated, provide a significant source of content to bloggers by aggregating blogs. Hey, if you don't like it, leave a comment and an argument for why some site should (or shouldn't) be in the Top 100.
- Interesting Recent Blogs: A list of blogs that people have been talking about for the last 48 hours. The Blogging A-List is eligible to be on this list, but the list is based on the ratio of new links to existing links, so you've got to do something to significantly increase your flow to get on it.
- Interesting Newcomers: A list that is similar to the Interesting Recent Blogs list above, but does not include the A-List bloggers, and has a lower threshold so that bloggers who haven't been around for as long or who are "up-and-comers" can get more easily noticed.
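For the curious, the three lists above boil down to two simple calculations, roughly sketched here in Python. This is an illustration under stated assumptions rather than the code that generates the feeds: the function names, the sample link counts and both threshold values are made up, and "existing" is taken to mean links older than the 48-hour window.

```python
def top_list(existing, new, n=100):
    """Rank blogs by total inbound links from other blogs, highest first."""
    blogs = set(existing) | set(new)
    totals = {b: existing.get(b, 0) + new.get(b, 0) for b in blogs}
    return sorted(totals, key=lambda b: (-totals[b], b))[:n]


def interesting(existing, new, threshold, exclude=frozenset()):
    """Rank blogs by the ratio of new links to existing links."""
    scored = []
    for blog, fresh in new.items():
        if blog in exclude:
            continue
        # max(..., 1) avoids dividing by zero for a blog with no older links
        ratio = fresh / max(existing.get(blog, 0), 1)
        if ratio >= threshold:
            scored.append((-ratio, blog))
    return [blog for _, blog in sorted(scored)]


# Made-up counts: links older than 48 hours, and links from the last 48 hours.
existing = {"alist.example": 1000, "steady.example": 50, "newcomer.example": 2}
new = {"alist.example": 20, "steady.example": 2, "newcomer.example": 6}

a_list = set(top_list(existing, new, n=1))
recent = interesting(existing, new, threshold=0.5)
newcomers = interesting(existing, new, threshold=0.03, exclude=a_list)
print(a_list, recent, newcomers)
```

With these numbers the A-List blog has by far the most links but the smallest ratio, which is why a heavily linked site has to do something unusual to show up as "interesting".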
March 1, 2003
At the Stanford Spectrum Policy Conference
I'm here at the Spectrum Policy Conference, and the discussions are very interesting. Unfortunately, for the first 90 minutes, I couldn't really listen to the speakers (thank goodness they're archiving the feeds) because I was setting up an emergency Sputnik Network using the beta Sputnik-powered Access Point I brought. I admit, I sort of figured that there would be some problems with Stanford's Cisco-based WiFi network, since it relies on MAC address filtering. Basically, to get a MAC address-based network to work well, you've got to have everyone at the conference pre-register their MAC address with the IT department. Of course, this doesn't scale. Some people will miss the preregistration email, some won't bring the WiFi card they registered, some will make errors in their reporting, perhaps even the IT staff will finger-slip a few of the addresses on the list (after all, it is a gobbledygook of 17 pseudorandom characters), and some people will register at the door and thus won't be able to get access to the network.
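To see how brittle a MAC allowlist is, here is a small Python sketch of the check an access point effectively performs. It is a hypothetical illustration, not Cisco's or Stanford's implementation: `normalize_mac`, `is_allowed` and the sample addresses are invented for the example.

```python
import re

_HEX12 = re.compile(r"^[0-9a-f]{12}$")


def normalize_mac(raw):
    """Return a MAC as lowercase colon-separated hex, or None if malformed."""
    digits = re.sub(r"[:\-.\s]", "", raw).lower()
    if not _HEX12.match(digits):
        return None
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def is_allowed(raw_mac, allowlist):
    """True only if the address is well formed and was pre-registered."""
    mac = normalize_mac(raw_mac)
    return mac is not None and mac in allowlist


# One attendee pre-registered this (made-up) address.
registered = {normalize_mac("00-0C-41-AB-CD-EF")}

print(is_allowed("00:0c:41:ab:cd:ef", registered))  # True: same card, different notation
print(is_allowed("00:0c:41:ab:cd:fe", registered))  # False: last two digits transposed
```

Normalizing the notation helps with formatting differences, but nothing can rescue a transposed digit or a card that was never registered, which is the scaling problem in a nutshell.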
Anyway, because of this, I pulled out my handy-dandy Sputnik-powered AP, grabbed a hub from one of the organizers, and within minutes had a custom captive portal set up for the conference. A bunch of people are using the network or have written about it: Dan Gillmor, Cory Doctorow, David Isenberg, Dave Winer, and others.
Now that the network is working and handling the load, I can sit back and grok the conference. More to come.