David Sifry

Heading to Paris for Les Blogs

The SFO International Cafe

I'm sitting here at SFO getting ready for the long flight to Amsterdam, heading to Les Blogs. I got invited to the first one, organized by the remarkable Loic Lemeur, but I was unable to attend last year. After hearing about it, I resolved to make sure I didn't miss it this year. I'm also testing out a new photo workflow on my Mac - I've finally moved completely away from iPhoto, which, while fine for small collections, is just atrocious and buggy for avid photographers like myself. I recently plunked down the cash for iView Media Pro 3, and so far it has been meeting my expectations. It certainly has a much fuller set of features, more natural integration of RAW images into the workflow, and much better natural metadata support. The only thing that bugs me now is the loss of simple imports of photos from my memory cards into the program, and I'm also torn between the "one large catalog to hold them all" and the "many small catalogs all combined into sets" philosophies.

The Plane to Amsterdam, through a windowshade

In short, I'm just surprised I even have time to think about this stuff right now. Life and work have been so incredibly busy that I haven't had the few moments or the tranquility of mind to sit down and just write. I hope that this short trip to Paris (my first!) will give me some time to step out of the trenches a bit and think about some more big-picture stuff. Perhaps I'll even blog more!

If you're going to be out in Paris at the show and you want to get together, drop me a line. And I'm always looking for interesting things to do and photograph during the off hours...


Happy 3 year Birthday, Technorati!

Wow, it has been 3 years since the public release of Technorati, which I described in this blog post on November 27, 2002. I'm frankly surprised to be doing what I'm doing with such a great team. Niall has put together an unofficial timeline of Technorati's life.

From a cursory read of what the original Technorati site did, you can see that our track record is about 50/50: of the 4 features that I thought would be most interesting to people, two of them (Link Cosmos and Watchlists) made it and are an integral part of our services today, and two (Google Rank and Google Juice) got dropped. One of the interesting things I've found during the three years since is that the success rate has stayed about the same: about 50% of what we do (like tags, Blog Finder, Keyword Search, etc.) has turned out successfully, while a whole bunch of other things we did were dead ends, missed opportunities, and outright failures.

And I gotta say, I'm awfully grateful to have had such a high success ratio. And I'm incredibly grateful to have the best users, customers, and partners in the world. Many thanks to our advisors and many friends of Technorati. You folks are great. Thanks for sticking by us during our tough times as well as during our successes.

We're redoubling our commitment to our corporate mantra: Be Of Service. That's what drives me and the whole team to focus on being of service to you: our users, members, customers, partners, and each other. The whole team and I look forward to your feedback.

Now, back to work for me!


Technorati Performance Improvement Update

I'm incredibly proud of the Technorati Engineering and Ops teams. They have been working day and night to deal with all of the growth that Technorati has been going through. Back in July, we made it our highest priority to improve site performance, and our team members and I have kept you updated in some earlier posts.

Here are some facts:

  • Since beginning our infrastructure improvements, Technorati's uptime has improved significantly - in fact, we had 100% uptime for searches last month!
  • According to GrabPerf, even while our overall traffic has increased, our response times have consistently decreased: We are currently averaging subsecond response times for all core products: Search (766 ms), single-word search (651 ms), and Tags (836 ms).
  • A quick comparison of the search speed of various blog engines is below, and you can see the continuous improvements that our engineers have been making over the past weeks and months. Technorati is the light blue line that is sandwiched right in there with Google Blog Search (731 ms) and Yahoo Blog Search (835 ms).

[Chart: blog search response times, November 21, 2005]

  • Here's another insight: open the following three links in separate tabs in your browser, and click back and forth between them to get a quick view of the frequency distribution of search results: Technorati, Google, and Yahoo. The further to the left the results sit on the graph (consistently fast results), the better.
  • Technorati's index is the most comprehensive, and has the fastest updates. The index is over 3 years old, currently tracks over 21.5 million blogs, and indexes over 280 million unique posts and over 1.7 billion links. We are tracking over 60,000 new blogs each day, and over 700,000 new posts per day. Our median time to index is now under 3 minutes from the moment a blog post is created.

Notes: All engines noted refer to the companies' blog search services. All of those graphs are updated dynamically, so don't be surprised if the numbers look a little different when you click on them.

All this, and we rolled out a number of new features and improvements to current services as well. I'm just very, very proud of our team, and I'm really grateful to be able to work with them each day. And we're constantly striving to do better. Please send us feedback and let us know how we can make things even better for you - what are we missing? How can we be more of service to you?

Update: Many thanks to Randy Charles Morin for pointing out a typo in the stats on the number of blogs indexed. I have updated the post with the correct number of blogs and posts, and I also added the number of new blogs and posts we're tracking every day. Thanks, Randy!


State of the Blogosphere, October 2005 Part 1: On Blogosphere Growth

It is that time of the year again, and I've got some new information on the continued growth of the blogosphere. I made this presentation as part of my 10-minute talk at Web 2.0 on October 6, 2005. You can download the entire presentation, complete with underlying data, for research use or to make it part of other presentations. All I ask is that you keep attribution and the Technorati logo in a prominent place wherever the data is used.

Earlier State of the Blogosphere reports are available as well: from July 2005, from March 2005, and from October 2004.

So, What's New?

Well, first, the basics. The chart below shows the continued growth of the blogosphere. Technorati is now tracking 19.6 Million weblogs, and the total number of weblogs tracked continues to double about every 5 months. This trend has been consistent for at least the last 36 months. In other words, the blogosphere has doubled at least 5 times in the last 3 years. Another way of looking at it is that the blogosphere is now over 30 times as big as it was 3 years ago:

[Chart: total weblogs tracked by Technorati over time]
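For anyone who wants to check the doubling arithmetic, here's a small Python sketch. It assumes a perfectly constant doubling period, which real growth never quite follows, and the function names are just illustrative.

```python
def growth_factor(doublings: float) -> float:
    """Size multiple reached after the given number of doublings."""
    return 2.0 ** doublings


def doublings_in(months: float, doubling_period_months: float = 5.0) -> float:
    """Doublings that fit in a time span, assuming a constant doubling period."""
    return months / doubling_period_months


if __name__ == "__main__":
    # Five doublings give a 32x multiple, i.e. "over 30 times as big".
    print(growth_factor(5))  # 32.0
    # A steady 5-month doubling period over 36 months allows about 7.2 doublings.
    print(round(doublings_in(36), 1))  # 7.2
```

Five doublings give 32x, which is where "over 30 times" comes from; at a steady 5-month doubling period, 36 months would allow about 7.2 doublings, so "at least 5 times" is the conservative reading.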

The next chart shows the number of new blogs tracked each day by Technorati. About 70,000 new weblogs are tracked every day, which works out to about one new weblog created each second, somewhere in the world. It also appears that blogging is taking off around the world, and not just in English. Some of the significant increases we've seen over the past 3 months have been due to a proliferation of Chinese-speaking weblogs, both on MSN Spaces and on Chinese sites like blogcn.com.

Since we've been tracking spam and fake blogs, we've included the daily tracking statistics for them starting from June 1, 2005. We currently find that about 2% - 8% of new weblogs are fake or spam weblogs. They are represented as the red spikes that are over and above the legitimate (human-created and updated) blogs shown in blue below.

[Chart: new weblogs tracked per day, with spam and fake blogs shown in red]
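The "about a new weblog each second" figure is just a rate conversion. Here's a quick Python sketch, assuming new blogs are spread evenly over the day (they aren't, really); the helper names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Average per-second rate for a daily count, assuming an even spread."""
    return per_day / SECONDS_PER_DAY


def seconds_between(per_day: float) -> float:
    """Average gap in seconds between consecutive events."""
    return SECONDS_PER_DAY / per_day


if __name__ == "__main__":
    print(round(per_second(70_000), 2))       # 0.81
    print(round(seconds_between(70_000), 2))  # 1.23
```

So 70,000 a day is about 0.81 new weblogs per second, or one every 1.23 seconds on average, which rounds to "about one a second".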

Spam Attacks

In the last couple of days, there's been a lot of talk about a set of spam blogs that have been set up to do keyword stuffing using a lot of popular phrases, including many popular bloggers' names. Lots of people have discussed this, including Tim Bray, Dave Winer, Ed Cone, Robert Scoble, Chris Pirillo, Jeff Jarvis, and others. In order to adequately analyze this, I updated the chart to include the blog data we've been tracking all the way up to yesterday, October 16, 2005.

In the past 2 weeks, 805,000 new weblogs were created. Technorati also tracked an additional 39,000 new fake and spam weblogs, which means that about 4.6% of the total weblogs tracked were fake or spam.
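Here's how the 4.6% figure falls out, as a small Python sketch (the function name is illustrative). The key choice is the denominator: spam is measured against all weblogs tracked, legitimate plus spam.

```python
def spam_share(legit: int, spam: int) -> float:
    """Fraction of all tracked weblogs that are spam or fake."""
    total = legit + spam
    if total == 0:
        raise ValueError("no weblogs tracked")
    return spam / total


if __name__ == "__main__":
    # Two weeks ending October 16, 2005: 805,000 new weblogs plus 39,000 spam/fake.
    print(f"{spam_share(805_000, 39_000):.1%}")  # 4.6%
```

Dividing by the 805,000 legitimate weblogs alone would instead give about 4.8%.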

One of the remarkable things that comes out of looking at the data is that while spam and fake blogs are a problem, they are not an overwhelming problem; in fact, we've experienced much worse spam attacks in the past. The key difference in the spam attack over the weekend is that the attackers' posts included many popular search terms, including popular bloggers' names, which are a common ego search on engines like Technorati. This made this particular attack much more visible to a number of high-profile bloggers than attacks in the past.

A look at the posting volume over the last year is illustrative as well, and is included in the chart below:

[Chart: posts per day over the last year, including spam and fake posts]

You can see from the post statistics that there are, on average, between 700,000 and 1.3 Million posts made each day. That's about 33,000 posts per hour. Spam and fake posts are reported here as well, and on average an additional 5.8% of posts (or about 50,000 posts/day) seen each day are spam or fake. This number changes on a daily basis as we track spam attacks, and it has reached as high as an additional 18% over the regular daily volume.
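A quick Python sketch of the rate arithmetic in this paragraph. The helper names are made up for illustration, and it assumes posting is spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_HOUR = 60 * 60
HOURS_PER_DAY = 24


def hourly_to_per_second(per_hour: float) -> float:
    """Average per-second rate for a given hourly rate."""
    return per_hour / SECONDS_PER_HOUR


def hourly_to_daily(per_hour: float) -> float:
    """Daily volume implied by a constant hourly rate."""
    return per_hour * HOURS_PER_DAY


def implied_baseline(extra_per_day: float, extra_ratio: float) -> float:
    """Regular daily volume implied by an 'additional X%' spam figure."""
    return extra_per_day / extra_ratio


if __name__ == "__main__":
    print(round(hourly_to_per_second(33_000), 1))           # 9.2
    print(hourly_to_daily(33_000))                          # 792000
    print(int(round(implied_baseline(50_000, 0.058), -3)))  # 862000
```

Both the roughly 792,000 posts/day implied by 33,000 posts per hour and the roughly 862,000 regular posts/day implied by "an additional 5.8%, or about 50,000 posts/day" fall inside the 700,000 to 1.3 Million range.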

One may argue that the numbers I'm reporting are way too low, or that Technorati isn't finding all of the spam weblogs out there. That's a legitimate argument, and by no means am I asserting that Technorati is capturing all of the spam and fake weblogs in existence. We know we're not getting them all, and every day we're working on improving our algorithms and data quality. However, our hard work means that we can still provide you with comprehensive, timely results without having to do anything drastic, like removing a major hosting provider with millions of legitimate blogs from our indexes.

We're also working closely with the other players in the industry in order to close the gaps. In October, we helped organize the second Web 2.0 Spam summit, and representatives from Google, Yahoo, Microsoft, AOL, Six Apart, Tucows, WordPress, Feedster, and many more companies and organizations participated. The summit was quite successful, and I expect that there will be many more to come.

To summarize:

  • As of October 2005, Technorati is now tracking 19.6 Million weblogs
  • The total number of weblogs tracked continues to double about every 5 months
  • The blogosphere is now over 30 times as big as it was 3 years ago, with no signs of letup in growth
  • About 70,000 new weblogs are created every day
  • About one new weblog is created each second
  • 2% - 8% of new weblogs per day are fake or spam weblogs
  • Between 700,000 and 1.3 Million posts are made each day
  • About 33,000 posts are created per hour, or 9.2 posts per second
  • An additional 5.8% of posts (or about 50,000 posts/day) seen each day are from spam or fake blogs, on average

What's Next?

Of course, one important question rears its head: how do you make sense of this monstrous onrush of conversation and just get what you want - the best information from the most authoritative or influential people, in the most timely manner?

More on that in my next two posts, covering the growth of tags and of context in search and discovery.


Technorati Site Improvement Update #3

Our illustrious VP of Engineering, Adam Hertz, has just posted a progress report on the service improvements, scalability increases, and performance gains that have been going on under the covers over the last couple of months, extending the work we began back in July and reported on at the beginning of September.

More to come. I'm looking forward to seeing folks at the Web 2.0 conference next week. Technorati, along with del.icio.us, Flickr, Flock, Odeo, wink, and WordPress, is hosting a party on Thursday, October 6, from 9 PM at Swig, 561 Geary Street, in San Francisco. Please stop by if you're in the city!


Technorati/Edelman Blogger PR Survey

Technorati and Edelman are partnering in an attempt to better understand how blogging and traditional PR intersect, and what bloggers think about communication from mainstream companies. Edelman is a global public relations firm representing brands such as Xbox, Nissan, and Dove. Working together, we created an 18-question survey to better understand the blogging community and your preferred methods of hearing from companies.

Is receiving a press release from a PR agency just more spam? What about product discounts or free goods? Are there better ways for traditional marketers and bloggers to interact? What is the implicit contract created when marketers and bloggers communicate? What are the ethical questions? What are companies not listening to that they should be listening to?

Please take a few minutes to answer the survey. This survey is intended as a starting point for discussion, not a comprehensive, be-all and end-all survey. Personally identifiable information is not tracked, but you may explicitly give us permission upon submission to send you the final results of the survey. We will review the aggregated results and include the findings in a public white paper next month to help inform bloggers, companies, and public relations firms. All survey respondents have the option to receive this white paper via e-mail as soon as it is available.


Welcome to the Blogosphere, Google!

The blogosphere is abuzz with Google's launch of its Blog Search. So far things look pretty interesting, and having a big traditional search player like Google working on blog search is a validation moment for the entire blogosphere.

This will mark a major milestone for the World Live Web. At Technorati, we have a tremendous amount of respect for the Google team and for everything they've done in the world of search. I'm sure that they'll continue to improve over the coming months, perhaps including tags, recent images and links, zeitgeists, blogger tools, and other types of semistructured data. I'm sure that they'll also start indexing the full-text of blog posts, not just the partial text found in most blog feeds.

I welcome the competition. We've got some tricks up our sleeves too - and there's no doubt that in the end, the competition will end up producing more innovation and better services for bloggers and readers.

Welcome to the party, Google!


Technorati Blog Finder Beta Launches

So, while all of our infrastructure work progresses and our backend search and infrastructure engineers are busy upgrading and scaling the service, Technorati's front-end designers and developers cooked up a fun new feature: Technorati Blog Finder.

Blog Finder helps answer the question, "How can you find authoritative blogs on a subject?"

Derek Powazek posts about Blog Finder on the Technorati Weblog. My favorite feature is that it allows you, the blogger, to tag your blog, and add yourself to the categories that you want to be listed under. We've made some educated guesses based on how people tagged their posts over the last 6 months - and over 2 million blogs are already included in the index, but now you have the power to add yourself to the directory in a few simple steps.
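We haven't spelled out exactly how those educated guesses are made. Purely as an illustration, here's one simple approach sketched in Python: take the tags a blog used on its posts over roughly the last 6 months and suggest the most frequent ones as categories. The function, its parameters, and the 182-day window are assumptions for the example, not Blog Finder's actual logic.

```python
from collections import Counter
from datetime import datetime, timedelta


def guess_blog_categories(posts, now, top_n=3, window_days=182):
    """Hypothetical: suggest directory categories for one blog from recent post tags.

    `posts` is an iterable of (published_datetime, list_of_tags) pairs.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for published, tags in posts:
        if published >= cutoff:
            # Normalize case so "Photography" and "photography" count together.
            counts.update(tag.lower() for tag in tags)
    # most_common breaks ties by first appearance, so results are deterministic.
    return [tag for tag, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    posts = [
        (datetime(2005, 8, 20), ["Photography", "Mac"]),
        (datetime(2005, 8, 25), ["photography", "travel"]),
        (datetime(2004, 12, 1), ["politics"]),  # older than the window, ignored
    ]
    print(guess_blog_categories(posts, now=datetime(2005, 9, 1), top_n=2))
    # ['photography', 'mac']
```

Whatever the real heuristic is, the point of letting bloggers tag their own blogs is that they can correct a guess like this directly.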

Performance and Scalability Improvement Progress Report #2

It's been a long and busy month, and I wanted to give y'all an update on the infrastructure, performance and scalability progress over at Technorati. There's been a lot going on, as I described earlier in the year, and we've made some progress, but there are important things that are still broken and are being fixed this month.

The situation as of a couple of months ago

The blogosphere has been growing at an explosive rate: Technorati is now indexing over 16 million blogs, with about 100,000 new blogs created every day. There are over 1.4 Million new posts every day, and about 22% of those posts are from spam or fake blogs, which means that even after we pull the spam and fake blogs out of the indexes, we are dealing with about 1.2 Million posts each day.

We just weren't expecting that kind of sudden growth, on either the posting side or the search side, and frankly we didn't plan well enough to handle the load. We've been adding new machines to our datacenter (over 400 now, with more coming each week), and we've been fixing bugs and making performance enhancements on the web site as well.

We also made some pretty significant performance improvements to keyword search, with most searches now returning in 1-2 seconds; you can see some details on those statistics and also a month view.

However, Cosmos search (or URL search) is still being worked on, and it often times out under the increased load. Unfortunately, this is also one of the searches that bloggers find most compelling, as it helps all of you know who is linking to your blog, and it was the very first type of search that Technorati made available, so it is near and dear to our hearts. Everyone here also uses it every day, so it really sucks when it isn't working right.
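At its core, a URL search answers "which blogs link here?", which is essentially a reverse-link index. The toy in-memory Python sketch below only illustrates that idea under simple assumptions (exact URL matching after light normalization); it is not how Technorati's Cosmos search is built, and the class and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Hypothetical normalization: lowercase host, drop trailing slash.

    Scheme, query string, and fragment are ignored.
    """
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


class LinkIndex:
    """Toy reverse-link index: target URL -> set of linking blog URLs."""

    def __init__(self) -> None:
        self._inbound: defaultdict[str, set[str]] = defaultdict(set)

    def add_post(self, source_blog: str, outbound_links: list[str]) -> None:
        # Record the linking blog under every URL the post points at.
        for target in outbound_links:
            self._inbound[normalize(target)].add(source_blog)

    def who_links_to(self, target: str) -> set[str]:
        # dict.get avoids creating empty entries for URLs nobody links to.
        return set(self._inbound.get(normalize(target), ()))


if __name__ == "__main__":
    idx = LinkIndex()
    idx.add_post("http://blog-a.example/", ["http://myblog.example.com/"])
    idx.add_post("http://blog-b.example/", ["http://MyBlog.example.com"])
    print(sorted(idx.who_links_to("http://myblog.example.com")))
    # ['http://blog-a.example/', 'http://blog-b.example/']
```

Keying links by their target makes each lookup a single dictionary access; the hard part at the scale described above is keeping such an index fresh and spread across many machines, which a single in-memory dictionary doesn't address.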

As search traffic has grown, we've also seen an increase in support and feedback requests. My goal is to make sure that we respond to every support request within 24 hours of receiving it. Right now, we're not meeting that goal, and some people haven't had a human response for over a week after sending in their request.

What we're doing

Once we got our keyword search infrastructure back on track, our infrastructure team started working 100% on fixing Cosmos search. Our current plan is to have Cosmos search back up and running by the end of September, with incremental improvement throughout the month. I'll keep you informed on the progress of this critical project; as it moves forward, you should see fewer and fewer error messages when you do a URL search.

We're busy expanding our support capabilities and putting together tools that make it easier for users to answer their own questions before a Technorati support staffer has to get involved. We've already made a bunch of fixes and feature enhancements to address the most common support requests, like fixes in our blog claiming code.

What about new stuff?

While we work on these core infrastructure issues, we're not resting on our laurels; we remain dedicated to providing great tools and services for bloggers and for people who want to keep track of what's happening on the web right now. There'll be more to announce in the coming days and weeks, so stay tuned...

Thanks for your support

I am consistently humbled and amazed at how great our users are. You guys have stood by us as the service has grown and has gone through growing pains. We take this trust very seriously, and are working very very hard to live up to your expectations. Thanks.

Technorati and Newsweek

I'm proud to announce that Technorati and Newsweek are working together, including a deep integration of posts and links from bloggers (here's an example) into Newsweek's site. This includes the Newsweek Blog Roundup and a summary widget on every Newsweek page (shown here on the right). This works just like a "most viewed articles" or "most emailed articles" widget, except that the rankings are determined by the number of bloggers linking to Newsweek articles. It shows the top 10 Newsweek stories generating the most discussion on weblogs within the past 7 days. You can see it on the Newsweek homepage and on each of the article pages; simply scroll down a bit and look on the right-hand side.
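To make the "determined by the number of bloggers linking" idea concrete, here's a hypothetical Python sketch of that kind of ranking: count distinct blogs linking to each article over the last 7 days and keep the top 10. The input format and function name are made up for illustration; this isn't the actual code behind the Newsweek widget.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def blog_roundup(links, now, window_days=7, top_n=10):
    """Hypothetical: rank articles by how many distinct blogs linked to them recently.

    `links` is an iterable of (blog_url, article_url, linked_at) tuples.
    Returns (article_url, blogger_count) pairs, most-discussed first.
    """
    cutoff = now - timedelta(days=window_days)
    bloggers = defaultdict(set)
    for blog_url, article_url, linked_at in links:
        if cutoff <= linked_at <= now:
            bloggers[article_url].add(blog_url)  # a set counts each blog once
    # Sort by blogger count (descending), then URL for a stable tie-break.
    ranked = sorted(bloggers.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(article, len(blogs)) for article, blogs in ranked[:top_n]]


if __name__ == "__main__":
    now = datetime(2005, 8, 1)  # made-up date for the example
    links = [
        ("blog-a", "story-1", datetime(2005, 7, 30)),
        ("blog-a", "story-1", datetime(2005, 7, 31)),  # same blog again
        ("blog-b", "story-1", datetime(2005, 7, 29)),
        ("blog-c", "story-2", datetime(2005, 7, 31)),
        ("blog-d", "story-3", datetime(2005, 7, 1)),   # outside the 7-day window
    ]
    print(blog_roundup(links, now))  # [('story-1', 2), ('story-2', 1)]
```

Counting distinct blogs rather than raw links means one enthusiastic blogger linking to the same story five times still counts once, which matches ranking by the number of bloggers.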

In addition, Newsweek has launched a section covering the conversations in the blogosphere about Newsweek's columnists as well. For example, here's the data on Steven Levy, Anna Quindlen, Michael Isikoff and Mark Hosenball. You can also subscribe to each search via an RSS feed using a Technorati Watchlist (available at the top of each Blog Talk), and you can dive as deep as you like by getting all of the posts as well.

My kudos to the folks at Newsweek for their forward thinking in recognizing bloggers and including them inside their tent. This is just the beginning of the many ways that mainstream media and bloggers can work together to provide a more complete picture of a story - facts, opinions and feedback all shown in one place, making for a better reader experience.

One more thing - if you think that this is a cool feature, and you want to see Newsweek and other media companies roll out systems like this, leave a comment, or send a trackback, or better yet, go to the Blog Roundup and rate the article at the bottom of the page. That'll send Newsweek a message that this is something you want to see more of.
