Posted by David Sifry on March 31, 2003 at 11:03 pm
Tomorrow (Tuesday April 1), I’ll be speaking at Kevin Werbach’s mini-Supernova going on at the Spring VON conference down at the San Jose Convention Center. I’ll be on a panel from 3:30 – 4:45PM with Duncan Davidson, Doc Searls, and Kevin Werbach, and the topic is Decentralized Communications. Hey, just come and see "pretty boy" Davidson (Duncan, where did you get that picture?) and the slim and svelte Searls, two of the best looking technology luminaries at the conference. Seriously, it should be a great time as we discuss weblogging, wifi, and web services. The folks at Pulver have also set up a blog and a trackback page, so if you’re at the show or just blogging about it, you can shoot over a trackback ping to get listed.
Posted by David Sifry on March 31, 2003 at 1:34 pm
Here’s a request for all you web designers out there looking for some extra paying work: My brother, Micah L. Sifry, is looking for an experienced web designer who can help him set up a weblog for his upcoming book, The Iraq War Reader. The blog will be Movable Type-based, so you’ve got to know MT’s template and plugin system like nobody’s business, and you’ve got to have an excellent design sense to create an effective MT site. He already has a bunch of graphics from the publisher, so this shouldn’t require a tremendous amount of design work, just someone to fit everything together and make it all seamless.
This is going to be a great, timely, balanced blog that will gain the designer a lot of recognition. He does have a small budget to get the work done, but this isn’t a corporate gravy train. Having said that, a good designer who can put together the basic design and translate it into the correct MT templates and style sheets can get some quick cash and some visible credit on the site.
He doesn’t need to purchase web hosting, a pre-existing MT setup or any other technical infrastructure; he’s already got that handled. He needs to find someone who is:
- Reliable
- Available (he/she’s ready to start immediately)
- Experienced (list a portfolio, please)
- Capable of working within a fixed budget
- Dedicated to excellent work
- Ideally, physically located in NYC, but remote work is OK.
Drop me a line or leave feedback on this blog item and I’ll pass the info on to him so he can directly respond to you.
Posted by David Sifry on March 21, 2003 at 12:43 am
Technorati’s got a new feature called Current Events that I just whipped up. It is a list of the top links to "professional" news sites by bloggers in the last two hours, along with comments and analysis. I created it because, like most people, I’ve been following the progress of the war, watching and reading the mass media, and I wanted to know what people out there were saying about the news. What are the most important stories? What is real, and what is propaganda? What is not being reported, or is being underreported? These were the questions on my mind when I created Technorati’s Current Events. Ever since the Google purchase of Blogger, the thing that struck me as the most compelling potential new feature was the combination of Google News with Blogger users’ commentary. Perhaps they’ll still do it, but I think I just beat them to it.
I’m constantly amazed by the collective wisdom of a huge number of individuals, each publishing their thoughts, and voting their attention by linking to things. I wanted to tap into this collective brainpower, organize it, and present it back to us all.
Here’s how it works: Since Technorati is already keeping track of 150,000 blogs every hour (wow, we hit 150k today!), I tuned the engine to spot trends in recent events by only looking at blog posts in the previous two hours. This helps to increase churn on the page, as only articles and links that are immediately relevant will stay on top of the Current Events page. By the way, I’m not sure that two hours is the best balance of immediacy versus trivia, so I expect that I’ll play around with it a bit as I have time, perhaps over the weekend, to tweak the settings to get things just right. The good news is that as more people take up blogging, the results should get better and better even as they get fresher and fresher. The page data is refreshed every 15 minutes, so one eighth of the links are always new, and one eighth are removed. The number in parentheses next to each result is the number of new links to that article in the previous two hours. Clicking on the (Cosmos) link shows you all of the bloggers who have linked to that article since it was published. And underneath each article is a set of short descriptions or context, written by bloggers in the past two hours.
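For the curious, the two-hour window can be sketched in a few lines of Python. This is only an illustration of the idea, not Technorati's actual code: the `top_links` function, the `(timestamp, url)` event shape, and the example.com URLs are all assumptions made up for the sketch.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)       # look-back window described in the post
REFRESH = timedelta(minutes=15)   # page refresh interval described in the post


def top_links(link_events, now, window=WINDOW, limit=10):
    """Rank URLs by how many blog posts linked to them inside the window.

    link_events is an iterable of (timestamp, url) pairs, one per link found
    in a blog post. This shape is an assumption made for illustration.
    """
    cutoff = now - window
    counts = Counter(url for seen_at, url in link_events if cutoff < seen_at <= now)
    return counts.most_common(limit)


# Each refresh slides the window forward by 15 minutes out of 120: one eighth.
turnover = REFRESH / WINDOW

now = datetime(2003, 3, 21, 0, 45)
events = [
    (now - timedelta(minutes=5), "http://news.example.com/a"),
    (now - timedelta(minutes=50), "http://news.example.com/a"),
    (now - timedelta(minutes=90), "http://news.example.com/b"),
    (now - timedelta(hours=3), "http://news.example.com/b"),  # too old, ignored
]
ranking = top_links(events, now)
```

Widening the view to 12 or 24 hours would just mean passing a different `window`, which is also why a long window would start to look like an all-day popularity list.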
Would you kind readers be interested in seeing different views into the current events page? I could create one that allowed links over the last 12 hours, or the last 24 hours – but too much more history and the page will start to look the same as Blogdex or Daypop. Or would you be interested in following other kinds of news? I’ve been thinking of implementing a categorization system, so people interested in sports can see results filtered towards those results, for example. Also, I’ve been thinking about the non-English-speaking bloggers out there, seen most often in the Interesting Newcomers list. Would you be interested in seeing a set of language-specific Technorati lists?
Let me know your feedback. I don’t think that I’ll have the time to implement anything soon, as I have a bunch of other very very interesting projects that are taking up the large majority of my time, and, as work projects, frankly demand a higher priority than Technorati and blogging. I’ll still get in a few late night and weekend hacks on Technorati, but don’t be surprised if you don’t hear from me very much for the next month or so…
Posted by David Sifry on March 17, 2003 at 4:53 pm
Rasmus Lerdorf, creator of PHP, will be giving a talk at the Bay Area Linux Users Group at the Four Seas Restaurant in San Francisco’s Chinatown. Rasmus is a great speaker, and he always puts on an interesting talk. I haven’t heard him talk in about a year, which means that he’s sure to blow my mind with all of the cool new things you can do with PHP. Last year it was dynamically-generated Flash animations; I wonder what it’ll be this time? Rasmus is sure to have lots to talk about as well, as he’s been busy helping Yahoo convert all of its web services to use PHP. Talk about real-world scalability issues! I can’t wait.
You don’t need to be a member of BALUG to come to the event, but it would be polite if you RSVP’d so that we had a decent count of who was coming. BALUG is one of the oldest Linux Users groups in the world – it has been going continuously since 1994, and there’s always a fantastic Chinese dinner served before the speaker gets going. It costs $10 for the dinner, or you can eat elsewhere and just come for the talk. There will also be some good door prizes including books and Linux CDs, and the networking is always fun as well. And of course, dear reader, I will be there as the master of ceremonies.
Posted by David Sifry on March 16, 2003 at 1:57 am
I fixed a big Technorati bug today. As the database has grown (it is now tracking over 135,000 blogs every hour) I’ve grown concerned about its performance. Sometimes queries would come back very quickly, and sometimes the site seemed incredibly bogged down – almost useless, even for simple queries. The worst part of it was that these performance slowdowns happened at infrequent intervals, and they seemed to be getting worse as the database grew. I fretted that I was facing a worst-case scenario – that I simply needed more RAM or disk spindles to increase the speed of the site. Even with the optimizations I did a few weeks ago, performance had slowed again. Even worse, a number of daily housekeeping chores were taking longer and longer to complete, which meant that (a) the system load was higher than it needed to be because of these extra tasks running in the background, and (b) I was in danger of allowing the data to get out-of-date, and one of the things I like the best about Technorati is the freshness of its data feeds.
This got me thinking about a great book called Zen and the Art of Motorcycle Maintenance by Robert Pirsig. This is one of my all-time favorite books. Like Thoreau’s Walden, it is one of those books that I keep taking down every few years and rereading. In it, Pirsig talks about the relationship of Science, Art, Engineering, and Zen, and the enormous rift in our culture created when so many people rely on technology but so few understand it. He tells this story with analogies to motorcycle maintenance in particular, all the while discussing the scientific method – observation, hypothesis, experimentation, and analysis.
Sometimes we observe a problem and immediately jump to a conclusion – oh, that’s happening because I just disconnected the battery, no wonder the lights don’t work. Sometimes, we need to compare the problem with our mental map of how things should work, and then use logical deduction to figure out what is going wrong – think of Click and Clack’s uncanny diagnostic powers on Car Talk. Just by knowing the make and model of car (mental map) and asking some probing questions (does it squeal when you’re in neutral?), they can often figure out what is wrong with a car within a sound-bite interval, along with witty comments.
But sometimes you’re stuck. You can’t figure out why something is going wrong. Your usual swami-like powers are just not clicking when it comes to this problem. So you get frustrated. This is a very dangerous time – Pirsig calls this a gumption trap opportunity. Because you’re stuck, you search for answers, and are willing to take blind chances in order to fix the problem expeditiously. That’s usually right about when all hell breaks loose, and you really start to fuck things up. So Pirsig recommends breaking out the big artillery, the gumption trap killer, the big monster: The scientific method. In Pirsig’s words:
When I think of formal scientific method an image sometimes comes to mind of an enormous juggernaut, a huge bulldozer…slow, tedious lumbering, laborious, but invincible. It takes twice as long, five times as long, maybe a dozen times as long as informal mechanic’s techniques, but you know in the end you’re going to get it. There’s no fault isolation problem in motorcycle maintenance that can stand up to it. When you’ve hit a really tough one, tried everything, racked your brain and nothing works, and you know that this time Nature has really decided to be difficult, you say, “Okay, Nature, that’s the end of the nice guy,” and you crank up the formal scientific method.
For this you keep a lab notebook. Everything gets written down, formally, so that you know at all times where you are, where you’ve been, where you’re going and where you want to get. In scientific work and electronics technology this is necessary because otherwise the problems get so complex you get lost in them and confused and forget what you know and what you don’t know and have to give up. In cycle maintenance things are not that involved, but when confusion starts it’s a good idea to hold it down by making everything formal and exact. Sometimes just the act of writing down the problems straightens out your head as to what they really are.
The logical statements entered into the notebook are broken down into six categories: (1) statement of the problem, (2) hypotheses as to the cause of the problem, (3) experiments designed to test each hypothesis, (4) predicted results of the experiments, (5) observed results of the experiments and (6) conclusions from the results of the experiments. This is not different from the formal arrangement of many college and high-school lab notebooks but the purpose here is no longer just busywork. The purpose now is precise guidance of thoughts that will fail if they are not accurate.
The real purpose of scientific method is to make sure Nature hasn’t misled you into thinking you know something you don’t actually know. There’s not a mechanic or scientist or technician alive who hasn’t suffered from that one so much that he’s not instinctively on guard. That’s the main reason why so much scientific and mechanical information sounds so dull and so cautious. If you get careless or go romanticizing scientific information, giving it a flourish here and there, Nature will soon make a complete fool out of you. It does it often enough anyway even when you don’t give it opportunities. One must be extremely careful and rigidly logical when dealing with Nature: one logical slip and an entire scientific edifice comes tumbling down. One false deduction about the machine and you can get hung up indefinitely.
That’s where I was at with regard to Technorati’s performance. Things were too unpredictable, and I couldn’t figure out why there were problems – only that a problem did indeed exist. So, I broke out the scientific method. The first thing I did was to state the problem and to start observing.
MySQL is the database I’ve been using to backend the Technorati link data. It has a great reputation – robust, fast, and it has nearly all of the features you’d expect in a SQL database. Lots of people use it, and its code is open source, which means that its bugs are few and far between. I also know it pretty well, so I was unafraid to make it Technorati’s backbone. I dug into the MySQL manuals, and found an interesting logging configuration parameter – the "log-slow-queries" option. By turning this on, I started to collect a log of all of the queries that took a long time to process – observations for my scientific method log book. I also delved deep into MySQL’s analysis tool, called the "EXPLAIN" command. Using it, I could find out why a certain query was taking a long time: Was it hogging the CPU? Was it chewing through disk accesses? Was it not using a database index? This was my experimental playground. Given enough observations (slow queries), I could run them through my test scenarios (individual explanations) and see what happened as I performed experiments on the database.
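To give a flavor of the "is it using an index?" experiment, here is a small sketch in Python. Note the swap: it uses SQLite's `EXPLAIN QUERY PLAN` from the standard library as a stand-in for MySQL's EXPLAIN, since the idea is the same even though the output format differs. The `links` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT, title TEXT)")   # hypothetical schema
conn.execute("CREATE INDEX idx_links_url ON links (url)")   # only url is indexed


def plan(sql, params=()):
    """Return the human-readable steps of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]   # the fourth column holds the description


# A lookup on the indexed column should report a search using the index.
indexed = plan("SELECT title FROM links WHERE url = ?", ("http://example.com/",))

# A lookup on the unindexed column has to scan the whole table.
unindexed = plan("SELECT url FROM links WHERE title = ?", ("Example",))
```

Printing `indexed` and `unindexed` side by side is the kind of comparison that tells you whether a slow query is slow because it walks every row.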
The first thing that I found out is that MySQL locks the entire database table when it does an INSERT or an UPDATE on a table. What that means is that all queries against that table are locked out while the Technorati spider is adding newly refreshed blogs into the database. I found that by batching INSERTs and UPDATEs and by using MySQL’s LOW_PRIORITY flag, I could significantly reduce the latency of database queries – which meant that interactive performance of the site rose. Good news!
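Here is roughly what batching looks like, sketched in Python. It only builds the SQL text and parameter lists, so it runs without a database; the `links` table, its columns, and the `batched_inserts` helper are hypothetical, and the `%s` placeholders assume a MySQL driver that uses that parameter style. `INSERT LOW_PRIORITY` and multi-row `VALUES` lists are standard MySQL syntax.

```python
def batched_inserts(rows, batch_size=500):
    """Yield (sql, params) pairs, one multi-row INSERT per batch of rows.

    rows is a sequence of (source_url, target_url) tuples. The table and
    column names are hypothetical, not Technorati's actual schema.
    """
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One "(%s, %s)" group per row, so a single statement carries the batch.
        placeholders = ", ".join(["(%s, %s)"] * len(batch))
        sql = ("INSERT LOW_PRIORITY INTO links (source_url, target_url) VALUES "
               + placeholders)
        # Flatten the row tuples into one parameter list for the driver.
        params = [value for row in batch for value in row]
        yield sql, params


rows = [("http://blog%d.example.com/" % i, "http://news.example.com/")
        for i in range(5)]
statements = list(batched_inserts(rows, batch_size=2))
```

Each pair could then be handed to a driver's `cursor.execute(sql, params)`. The point of the design is that the table lock is taken once per batch instead of once per row, and LOW_PRIORITY lets waiting readers go first.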
Unfortunately, that didn’t entirely solve the problem. I kept seeing some really slow database calls show up in the slow-queries log, often taking anywhere from 60-180 seconds to complete. That’s unacceptable: most people will just click reload on their browser, which sends off ANOTHER query, loading down the database even further. Other people will just get frustrated with it and will go elsewhere. Not good.
After I had a week’s worth of slow query data, I sat down with it and looked for patterns. Something niggled at my brain. I looked more closely. Then I had it. Almost all of the slow queries came from people requesting information on sites that had an underscore in the domain name or in the URL. In other words, people were looking for the link cosmos for sites like "http://p_o_l_e_c_a_t.blogspot.com", and the queries were taking forever to execute. What I remembered is that the underscore is a character that has a special meaning in MySQL pattern matching – it is a wildcard character, which means it can stand for any single character. So when doing a search on the URL above, instead of looking up one exact string, MySQL was actually checking a huge swath of rows against the pattern, since each underscore could match anything. All I needed to do was to tell MySQL not to treat the underscore as a special character anymore, and it just might solve the performance problem.
Lo and behold, it did. Previously, a search on "http://p_o_l_e_c_a_t.blogspot.com" took 82 seconds and searched through 180,785 rows of the database. Now, it takes less than a hundredth of a second and searches through 31 rows. All of a sudden, Technorati started firing on all cylinders again. Since a small number of queries were no longer hogging the database, all of the remaining queries got more of a chance to run, and executed even more quickly. Response time has returned to acceptable levels. I am a happy man, and even though I had to pull out the enormous bulldozer of the scientific method, wait a week for some decent observations, and spend time pulling my hair out trying to figure it out on my own, my faith is unshaken and I stand victorious.
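For anyone who wants to see the underscore effect for themselves, here is a small sketch. It assumes the lookup was a LIKE match, and it uses SQLite from Python's standard library as a stand-in, because SQLite's LIKE treats `_` as a single-character wildcard just as MySQL's does. The `links` table and the `escape_like` helper are hypothetical. In MySQL the backslash is the default LIKE escape character; SQLite needs the explicit `ESCAPE` clause shown here.

```python
import sqlite3


def escape_like(text, escape="\\"):
    """Escape LIKE wildcards so that % and _ match only themselves."""
    # The escape character itself goes first so it is not double-escaped later.
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT)")   # hypothetical schema
conn.executemany("INSERT INTO links VALUES (?)", [
    ("http://p_o_l_e_c_a_t.blogspot.com",),
    ("http://pXoXlXeXcXaXt.blogspot.com",),     # matches only if _ is a wildcard
])

target = "http://p_o_l_e_c_a_t.blogspot.com"

# Unescaped: every underscore matches any single character.
loose = conn.execute(
    "SELECT COUNT(*) FROM links WHERE url LIKE ?", (target,)).fetchone()[0]

# Escaped: underscores are literal, so only the exact URL matches.
strict = conn.execute(
    "SELECT COUNT(*) FROM links WHERE url LIKE ? ESCAPE '\\'",
    (escape_like(target),)).fetchone()[0]
```

With two rows the difference is just 2 versus 1 match, but on a table with hundreds of thousands of URLs the unescaped pattern forces the database to test far more candidates than an exact lookup would.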
Posted by David Sifry on March 11, 2003 at 8:50 am
Yahoo News is reporting on upcoming Cometa and Intel announcements: they are reporting that Cometa closed the contract to set up Wi-Fi access for McDonald’s and Borders. The McDonald’s deal is small in initial rollout – only 10 locations in NYC to start, then 300 in 3 cities – NYC, Chicago, and an unnamed California town (could it be Cometa’s hometown of San Francisco? Here’s hoping!) but has tremendous potential, given the number of McDonald’s restaurants across the US and around the world. The Borders deal is a larger initial rollout, with 400 bookstores getting WiFi access.
The pricing is interesting too: The McDonald’s service will be free for an hour if an extra value meal is purchased, then $3 per hour.
Expect more to come as Intel readies its PR mothership over the new Centrino release tomorrow. Hilton, Marriott, Sheraton, Westin and W hotels will tout wireless access points in hundreds of hotels in the United States, Canada, the United Kingdom and Germany, and SFO has already announced its new WiFi rollout.
Can anyone say runaway train? What a great day, because this is the tip of the WiFi iceberg – and all sorts of problems will crop up in these deployments, large and small – security, manageability, maintenance, and operational costs needed to keep all of those network connections and wireless connections up and running. Hint, hint, hint. Don’t forget the single largest cost in doing these kinds of widespread deployments – it is the cost of putting someone in a truck to install/maintain/repair a system in the field. At a minimum of $500 per truck roll, whoever can minimize engineer field time will win this race. Stay tuned.