May 12, 2003

Technorati API 0.9

I'm proud to announce the first public release of the Technorati API, the application programming interface to Technorati's weblog index and search engine.

Over the past few months, I've gotten a lot of requests from people who wanted to be able to use the Technorati database for a variety of purposes - everything from social network research to mini-applications that would send them a page or an IM whenever someone posted a link to their website.  I created the Technorati API in order to help foster these creative ideas and developers. I've also got a whole bunch of other work going on, so I didn't want to become a bottleneck between you and the data.

One thing to note: This is beta-level software right now.  Nothing is set in stone, but the code is working, and a number of people have already given some great feedback that has led to changes in the API that make it significantly more portable and forward-compatible than I originally designed it.  I'm planning on adding additional attributes and functions to the API.  Things may change again if someone finds an important structural issue or bug, so you may want to wait a bit (until the 1.0 version comes out) to build big enterprise applications with it. :-)

First off, regarding licensing:  I'm making API calls free for personal, noncommercial use, and you'd be capped at a certain number of requests per day, currently 500 queries a day. The draft terms of service are up on the website.  This is pretty much what some other folks are doing with their APIs.  I would also like to get some attribution added that points a hyperlink (and maybe a small picture) back to Technorati for people who use and republish the information, sort of like a Creative Commons non-commercial and attribution license.  Of course people who pay money will get to use the data without these restrictions.  If you're interested in licensing Technorati data for commercial applications, or if you want more than 500 queries per day, please send an email to api-sales@technorati.com.

Skip the next 3 indented paragraphs if you're not a super-geek, and read on, starting at "So here's the initial API calls that I've built"...

        The API calls would be built (at first) with a REST-ful interface - a standard web-browser-like program (curl, wget, lynx, etc.) would pull down the info, something like:
       
http://api.technorati.com/PROCEDURENAME?key=KEY&val1=VAL1&val2=VAL2 and so forth.
       
        If enough people clamor for it, I can do a SOAP or XML-RPC interface, but REST-ful is easier for me to do, and I don't think people need the multiple-way communications channel that something like XML-RPC or SOAP gives them in addition to the REST interface.  In other words, each API call will be stateless, AFAICT. Besides, I have never written a WSDL file :-)
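As a rough illustration of what one of these stateless calls looks like from a client, here is a minimal Python sketch that only builds the request URL and does not contact the server. The helper name `build_call_url` is hypothetical, and the procedure name and parameters are just the placeholders from the template above.

```python
from urllib.parse import urlencode

API_BASE = "http://api.technorati.com"  # host taken from the URL template above


def build_call_url(procedure, key, **params):
    """Return the GET URL for one stateless API call (hypothetical helper)."""
    query = {"key": key}
    query.update(params)
    # urlencode percent-escapes values, so a URL can be passed as a parameter
    return f"{API_BASE}/{procedure}?{urlencode(query)}"


print(build_call_url("PROCEDURENAME", "KEY", val1="VAL1", val2="VAL2"))
```

Because every call carries its own key and parameters, nothing needs to be remembered between requests, which is what makes the interface stateless.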
       
So here's the initial API calls that I've built:


API call #1: Cosmos

GET format:
http://api.technorati.com/cosmos?VAR=VALUE&VAR=VALUE ...

POST format:
http://api.technorati.com/cosmos
with the variables and values sent in an HTTP POST call.

This returns the Link Cosmos for the URL you specify.
       
Set the following variables to get the results you want:
       
url: Set this to the target URL you are searching for.
       
type: Set this to 'link' and you'll get the freshest 20 links to your target URL.  Set it to 'weblog' and you'll get a reverse blogroll - the last 20 blogs who linked to the target URL.
       
start: Set this to a number > 0 and you'll get the 20 freshest items (links or blogs) beginning at that position, e.g. set it to 21, and you'll get the second page of results - items 21-40.

format: This allows you to request an output format. I anticipate we'll have multiple output formats, like XHTML, RSS (different flavors), and whatever else comes along.  For now, the only valid format accepted is "xml", which adheres to this DTD.  The format variable is optional - if you don't specify it, it will default to "xml".

version: This allows you to specify a particular Technorati API version response.  Right now, there is only one Technorati API, called version 0.9, but I anticipate newer versions, some of which may not be backwards compatible with earlier versions.  By implementing this variable across all Technorati API calls, application developers can feel assured that they will always get reliable results if they specify a particular API version for requests and results.  The version variable is optional - if you don't specify it, it will default to the most recent version of the API, which at this time is "0.9".

key: Put your Technorati API key in here.  You can get your API key by going to: http://www.technorati.com/members/apikey.html
       
If you're already a Technorati member, just log in and your key information (along with some accounting) is listed for you.  If you don't currently have a Technorati account, setting one up is easy and free, just follow the instructions on the signup page.
       
For example: This call (copy the link location to see the syntax) will get you the Link Cosmos for the www.sifry.com website (including my blog, Sifry's Alerts).  That API key is my key, so please don't abuse it, get your own!
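To make the variables above concrete, here is a small Python sketch that assembles a cosmos GET URL, including the arithmetic for paging through results 20 at a time. The function `cosmos_url` and its argument names are hypothetical, "YOUR_KEY" is a placeholder, and using start=1 for the first page is an assumption based on the "second page starts at 21" example.

```python
from urllib.parse import urlencode

COSMOS_URL = "http://api.technorati.com/cosmos"
PAGE_SIZE = 20  # the post says each call returns 20 items


def cosmos_url(key, target, kind="link", page=1, fmt="xml", version="0.9"):
    """Build a cosmos GET URL (hypothetical helper, does not call the server)."""
    if kind not in ("link", "weblog"):
        raise ValueError("type must be 'link' or 'weblog'")
    if page < 1:
        raise ValueError("page must be >= 1")
    params = {
        "key": key,
        "url": target,
        "type": kind,
        # page 1 -> start=1 (assumed), page 2 -> start=21, page 3 -> start=41
        "start": (page - 1) * PAGE_SIZE + 1,
        "format": fmt,        # optional; "xml" is the only format for now
        "version": version,   # optional; pinning it guards against later changes
    }
    return f"{COSMOS_URL}?{urlencode(params)}"


print(cosmos_url("YOUR_KEY", "http://www.sifry.com/", page=2))
```

Passing format and version explicitly is not required, but it keeps an application's requests unchanged if the defaults move in a later release.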

API call #2: Bloginfo

GET format:
http://api.technorati.com/bloginfo?url=URL&key=APIKEY

POST format (see the cosmos call above for explanation):
http://api.technorati.com/bloginfo

Give it any URL and it'll tell you what blog, if any, that URL came from, and all the info it has on that blog, like cosmos stats and RSS feed, etc.
       
For example: This call (copy the link location to see the syntax) will tell you information about the weblog that created the following permalink: http://www.sifry.com/alerts/archives/000286.html (BTW, the answer is Sifry's Alerts)
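For the POST form of the call, a client could prepare the request roughly as in the Python sketch below. It only constructs the request object and never sends it. The helper `bloginfo_post` is hypothetical, "YOUR_KEY" is a placeholder, and sending the variables as a form-encoded body is an assumption, since the post does not say how the POST variables should be encoded.

```python
from urllib.parse import urlencode
from urllib.request import Request

BLOGINFO_URL = "http://api.technorati.com/bloginfo"


def bloginfo_post(key, url):
    """Prepare (but do not send) a bloginfo POST request (hypothetical helper)."""
    # Assumption: variables travel as a form-encoded request body
    body = urlencode({"url": url, "key": key}).encode("ascii")
    return Request(BLOGINFO_URL, data=body, method="POST")


req = bloginfo_post("YOUR_KEY", "http://www.sifry.com/alerts/archives/000286.html")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

To actually issue the call, the prepared request would be passed to `urllib.request.urlopen`.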


API call #3: Outbound Blogs

GET format:
http://api.technorati.com/outbound?url=BLOGURL&key=APIKEY

POST format:
http://api.technorati.com/outbound

This call will list for you all of the blogs that are pointed to by a blogger (well, most of them anyway).  Simply give the URL of the weblog and it will return for you a list of weblogs that are linked to by that blog. (The database is currently a bit old on this one; I'll move it into production status over the next few weeks.)
       
For example, it is a relatively simple call to iterate and get a 2 degrees of separation function from this, which I'm working on... ;-)
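One way the iteration could look, as a Python sketch rather than the actual function being worked on: the caller supplies an `outbound` callable that returns the blogs a given blog links to (in practice it would wrap the outbound call and parse the response, which is not shown here). The function name `two_degrees` and the tiny in-memory link table are made up for illustration.

```python
def two_degrees(start_url, outbound):
    """Return (first, second): blogs one and two hops away from start_url.

    outbound is a caller-supplied function mapping a blog URL to the
    blog URLs it links to (a stand-in for the outbound API call).
    """
    first = set(outbound(start_url)) - {start_url}
    second = set()
    for blog in first:
        second.update(outbound(blog))
    # Keep only blogs that are new at the second hop
    second -= first
    second.discard(start_url)
    return first, second


# Made-up link data standing in for real API responses
FAKE_LINKS = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a", "e"]}

first, second = two_degrees("a", lambda url: FAKE_LINKS.get(url, []))
print(sorted(first), sorted(second))  # ['b', 'c'] ['d', 'e']
```

Note that this costs one call for the starting blog plus one for every first-degree blog, all of which count against the 500 queries a day cap.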

Finding out more

The place to go to find out more information on the Technorati API is the api-discuss mailing list.  You can subscribe by sending a message to api-discuss-subscribe@technorati.com or by going to the mailing list page above.  There is a web site that I created just for developers, called developers.technorati.com but right now it is pretty sparse. I'm looking for volunteers who would like to keep that site updated.  Right now I just don't have time.

If you've gotten this far and you're still interested, come join us!  Subscribe to the api-discuss list.  Get an API key, and start playing around.  Leave a comment and let us all know how you're using the API.  Have fun!

UPDATE: The API hasn't been out for a full day yet, and already there exist plugins and drivers for Radio Userland (Dave Winer), Python (Mark Pilgrim's pyTechnorati, plus two different technorati.py's - Phil Pearson's and Aaron Swartz'), and at least three Movable Type plugins - Ben Hammersley's unreleased plugin, Joi Ito's SSI-based plugin, and Adam Kalsey's publicly released plugin!

Amazing. So here's my next thought: Dave Winer and others brought up the idea of Link Cosmos as trackback - and here's the simple equation that popped into my head:

Technorati Link Cosmos on each of your blog permalinks = instant trackback... Of course, you'd have to pull the information several times a day rather than being event based, but it wouldn't require any agreement between the linker and the linkee, and no messy XML-RPC calls involved... I'd love to see that in an MT plugin, I'd put it up on my blog!

Posted by dsifry at May 12, 2003 2:28 AM

Comments

Glue for the Technorati API for Frontier and Radio is available here.

http://blogs.law.harvard.edu/crimson1/technoratiApi

Feedback for Big Dave is here.

http://blogs.law.harvard.edu/crimson1/technoratiApi#feedback

Posted by: Dave Winer at May 12, 2003 4:56 AM

Any idea why my blog is showing up (multiple times) as a related story? The Technorati API looks really cool, but I have not mentioned it on my blog yet.

Posted by: Dave Johnson at May 12, 2003 9:11 AM

Following up on Dave's linked feedback: why use a key? Why not have the server count by IP address? Because then a software vendor could create and distribute an end-user application that embedded their key, and probably never need to pay license fees, since each end-user (IP address) would likely never hit the per-IP limit. Integrating it into an application like you have (by shipping without an embedded key, and stating in the documentation that the end user is responsible for key management) limits its scope to enthusiasts, which is fine. Any wide-scale deployment for non-enthusiasts would require purchasing a key.

Also, as a point of history, while Google was the first in this space to set per-key limits, keys have been part of APIs like this since the original Blogger API. But wrapper libraries (like my PyBlogger) simply embedded the key (since there was no per-key limit, and no legal reason not to), which defeated Evan's purpose for requiring them.

Posted by: Mark at May 12, 2003 10:14 AM

Python wrapper is up: http://diveintomark.org/projects/pytechnorati/

Posted by: Mark at May 12, 2003 12:58 PM

Any reason why the api.technorati.com server is sending out the results with text/html http headers instead of as xml? I'm having trouble getting coldfusion to parse the file, and I'm not sure that's the problem, but it seemed odd.

Posted by: Matt Haughey at May 12, 2003 1:30 PM

Very nice. I've implemented a Movable Type plugin that uses the API: http://kalsey.com/2003/05/technorati_plugin/

Posted by: Adam Kalsey at May 12, 2003 3:50 PM

Extremely cool. Amazing - in less than a day, the API now has bindings for Python (more than one!), Radio, XML-RPC, and now Movable Type. Now that's the web in action.

I was thinking - doing a technorati cosmos on each of your permalinks = instant trackback... Of course, it would require polling rather than being event based, but it wouldn't require any agreement between the linker and the linkee...

I'd love to see that in an MT plugin...

Dave

P.S. join api-discuss! send an email to api-discuss-subscribe@technorati.com.

Posted by: David Sifry at May 12, 2003 4:26 PM

Tim Bray gives an excellent reason not to send any data out as text/* in his article on the iTunes format:

http://tbray.org/ongoing/When/200x/2003/04/30/AppleWA

nice work!

-steve

Posted by: steve jenson at May 12, 2003 5:48 PM

If you use the Technorati MT plugin in your individual entry archive template, you pretty much have the Cosmos trackback. The plugin will automatically see that it's being called from within an entry and grab the Cosmos for that entry. All you need to do is set the lastn to a high enough value and rebuild your individual archives periodically.

Posted by: Adam Kalsey at May 12, 2003 8:03 PM

How come all these feeds are encoded for Latin-1 rather than Unicode? They won't pass Unicode characters to a utf-8-encoded page, am I right? What if I want to link to 第三只眼看电信 and have 第三只眼看电信 see that I've linked to him? The same goes for changes.xml at Weblogs.com. Language metadata would also be really handy.

Posted by: blogal villager at May 12, 2003 9:20 PM

My feed was crapping out the CFMX parser due to ampersands in the excerpt field, any chance those could be escaped?

Posted by: Matt Haughey at May 12, 2003 11:03 PM

Dave: Matt is correct, Technorati is in some cases serving up invalid XML--try

http://api.technorati.com/cosmos?key=KEY&type=link&format=xml&url=http%3A%2F%2Fwww.aldaily.com%2F

Posted by: Michael S. at May 13, 2003 12:09 AM

Kick Ass.

I hacked up a new module for myself: http://jeremy.zawodny.com/blog/archives/000730.html

Keep the good stuff coming. :-)

Posted by: Jeremy Zawodny at May 13, 2003 12:11 AM

This is damn suave, Sifry.

Small problem: PHP's xml parser is choking on excerpts in which html entities are chopped, e.g. <excerpt>she said this isn&#82</excerpt>

Maybe a regex that uses a lookahead to make sure the closing semicolon is in place, otherwise trim off the entity?

Posted by: Dean Allen at May 13, 2003 12:55 AM

Er, I mean, what Matt said.

Posted by: Dean Allen at May 13, 2003 12:56 AM

No-one seemed to be doing anything for .Net, and since I'm still trying to get up to speed on C#, I slapped something together: http://www.thequietone.org/archives/000035.html. Still needs work, but it's usable.

Posted by: Jonathan at May 13, 2003 2:45 AM

Okay, the comment link stripper/constructor doesn't play nice with periods.
That URL should be http://www.thequietone.org/archives/000035.html

Posted by: Jonathan at May 13, 2003 2:50 AM

Jonathan beat me to it :)
But here's another TAPI implementation written in C#.

http://kilic.net/code/csTAPI.cs.txt

Posted by: Serdar Kilic at May 13, 2003 4:43 AM

God damn you all move fast!

I realise that this is all so yesterday, but I knocked up a java version:

http://xurble.users.btopenworld.com/code/jTechnorati.html

Posted by: Gareth Simpson at May 14, 2003 8:18 AM

I have an idea for a killer app for the Technorati API (okay, maybe "killer app" is exaggerating a little), but you'd need to add calls for "interestingblogs" and "interestingnewcomers" for me to fully realize it....

Posted by: Dougal Campbell at May 16, 2003 11:02 AM

1) When type="weblog", the API now always returns an empty <url></url> element. Yesterday, it was returning the URL of the blog main page, as it should.

2) When type="link", almost all items have a blank
<nearestpermalink></nearestpermalink> element. A few do have a correct permalink URL there, so this function is not completely broken. But the permalink-finding algorithm seems to need some improvement.

How can we, as weblog authors, help it along?

Posted by: Jacques Distler at May 16, 2003 11:06 AM

I coded up a Java wrapper too, but mine uses JDOM/XPath for parsing and is available under Apache license.

http://www.rollerweblogger.org/page/roller/20030517#technoratj

Posted by: Dave Johnson at May 17, 2003 7:33 AM

Here's a wrapper in Perl:

TechnoratiAPI.pm
http://erikbenson.com/index.cgi?node=Technorati%20API

Posted by: Erik Benson at May 17, 2003 6:02 PM

A lazy web request. Anybody thinking of writing a PHP wrapper? I'd like one for SARS Watch Org. Thanks,
Tim

Posted by: Tim at May 22, 2003 10:45 AM

I'm thinking of whipping up a PHP API wrapper -- shouldn't take too long. If anyone else out there has started one please let me know!

Posted by: Jason Anderson at July 7, 2003 11:10 AM

I've just uploaded an ASP wrapper which includes search as well:
http://www.murraywoodman.com/code/technorati4asp/

Posted by: Murray Woodman at July 8, 2003 3:03 AM

Does anyone know a good site about the API basics?

Posted by: reinhard at October 1, 2003 6:07 AM