Breaking the (power) law

Clay Shirky has gotten a lot of people talking about power laws and how they relate to the blogosphere.  Dave Winer and others disagree, and there's a bunch of other interesting conversations going on as well.  What's interesting is that both Clay and Dave are right, depending on how you look at things.  Dave sees the world from the microeconomic point of view: it is really easy to create new communities with blogs, and there is no scarcity of links.  Clay looks at things from the macroeconomic view, seeing overall patterns of thought leaders evolving as people look for editors and subject matter experts to help guide them through all the links.  Clay is also right in his observation that blog linking will tend to follow a power law; that is, a small proportion of bloggers will get a huge number of incoming links, as the Technorati Top 100 attests.  However, what Clay doesn't emphasize is that blogging communities, even though they have some lightposts, tend to form into small open communities.  That's why my blogroll looks very different from Glenn Reynolds', or Doc Searls', or Joel Spolsky's.  Even though we might all have a few bloggers in common, most of the links are different.  In other words, the blogging space has a high degree of dimensionality.

I thought about the problem that this presented to a traditional link engine.  When you rank bloggers simply by the number of people who link to them, you get a very static list of "a-list" bloggers, as shown by the Technorati Top 100.  What I wanted to do was to break that power law, and give more exposure to the lesser known, but still interesting bloggers, especially on days when they stand out and do something interesting.

I think I've found a way to do that, and it all boils down to the fact that Clay described a power law.

The Technorati Top 100 ranks blogs by a linear function of the number of incoming links to each blog.  A linear equation looks like this:

y = ax + b

As we all know, that leads to a boring top 100 page.
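To make the contrast concrete, here is a minimal Python sketch of ranking purely by total inbound links, the linear approach described above. The function name, blog names, and counts are hypothetical placeholders, not real Technorati data or code.

```python
# Hypothetical inbound-link counts; not real Technorati data.
inbound = {
    "small-blog": 45,
    "big-a-list-blog": 5000,
    "mid-size-blog": 400,
}


def rank_by_total_links(counts):
    """Rank blogs by raw inbound-link count, highest first (a linear score)."""
    return sorted(counts, key=counts.get, reverse=True)


print(rank_by_total_links(inbound))
```

Because the score is just the running total, a blog that already has a big lead keeps it no matter what anyone posted today.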

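So, I started playing around with the ranking algorithm.  Now, a power law has the form y = ax^k.  With k = 2 it grows like a quadratic, which looks something like this:

y = ax^2 + bx + c

Or, with k = 3, you get a more skewed graph that grows like a cubic:

y = ax^3 + bx^2 + cx + d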

Remember high school algebra?  I'm sorry to make your brain hurt.  The key point to remember is that the higher the power, the faster these equations get really big as you increase x.  The quadratic case gives you a graph shaped like a parabola.
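A quick way to see why higher powers blow up: for y = ax^k, doubling x multiplies y by 2^k, whatever the value of a. This small Python sketch checks that for k = 1, 2 and 3; the function name `power` and the coefficient a = 3 are arbitrary choices for illustration.

```python
def power(x, a=3.0, k=2):
    """Evaluate y = a * x**k."""
    return a * x ** k


# Doubling x (10 -> 20) multiplies y by 2**k.
for k in (1, 2, 3):
    growth = power(20.0, k=k) / power(10.0, k=k)
    print(f"k={k}: doubling x multiplies y by {growth:g}")
```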

What I wanted to do was to give some of the lesser-known bloggers some visibility.  The way I did this was to invert the power law when I did my rankings.  I looked at two variables: the number of new inbound links to a blog, and the current total number of blogs already linking to it.

In order to reverse the power law, I used the following as the ranking algorithm for the Interesting Newcomers page:

n = Number of new inbound links
c = Current number of inbound blogs (as of the day before)

n^3 / (c + n)^2, where c > 30
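Here is a minimal Python sketch of the Interesting Newcomers score as written above, n^3 / (c + n)^2 with the c > 30 cutoff. The function names, input format, and sample blogs are hypothetical; it illustrates the formula only and is not Technorati's actual code. The post also says bloggers who already have an audience are cut from this list but gives no threshold, so that filter is left out here.

```python
def newcomer_score(n, c, min_current=30):
    """Return n**3 / (c + n)**2, or None unless c > min_current.

    n: new inbound links; c: inbound blogs as of the day before.
    """
    if c <= min_current:
        return None
    return n ** 3 / (c + n) ** 2


def rank_newcomers(blogs):
    """blogs maps name -> (n, c); return eligible names, highest score first."""
    scores = {}
    for name, (n, c) in blogs.items():
        score = newcomer_score(n, c)
        if score is not None:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data: name -> (new links today, inbound blogs yesterday).
sample = {
    "small-blog": (5, 40),
    "mid-size-blog": (10, 300),
    "tiny-blog": (2, 1),  # excluded by the c > 30 cutoff
}
print(rank_newcomers(sample))
```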

And I'm using a quadratic equation for the "Interesting Recent Blogs" page:

n^2 / (c + n)^2, where c > 40
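One way to read the quadratic score: n^2 / (c + n)^2 is the same as (n / (c + n))^2, the squared share of a blog's inbound links that are new today. A minimal Python sketch, with hypothetical function names, that applies the c > 40 cutoff and checks the identity on a few made-up inputs:

```python
def recent_score(n, c, min_current=40):
    """Return n**2 / (c + n)**2, or None unless c > min_current."""
    if c <= min_current:
        return None
    return n ** 2 / (c + n) ** 2


def new_link_share(n, c):
    """Fraction of all inbound links (old plus new) that arrived today."""
    return n / (c + n)


# The quadratic score equals the squared share of new links.
for n, c in [(5, 45), (50, 450), (30, 1000)]:
    assert abs(recent_score(n, c) - new_link_share(n, c) ** 2) < 1e-12
print(recent_score(5, 45), recent_score(50, 450))
```

Since squaring preserves order for non-negative numbers, this ranks blogs the same way n / (c + n) would; the square only changes how spread out the scores are.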

The results are very interesting.

What the ranking algorithms described above do is make it progressively harder to move up in the rankings as the number of current inbound blogs increases.  This effectively counteracts the power law that Clay describes, and gives us a way of comparing apples to apples.
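To see "progressively harder" in numbers, this Python sketch counts how many new links a blog needs to reach a fixed score as c grows, for both formulas. The target scores (0.01 for the quadratic, 1.0 for the cubic) and the helper names are arbitrary choices for illustration, and the cutoffs on c are left out.

```python
def cubic_score(n, c):
    return n ** 3 / (c + n) ** 2


def quadratic_score(n, c):
    return n ** 2 / (c + n) ** 2


def links_needed(score_fn, c, target):
    """Smallest whole number of new links n with score_fn(n, c) >= target."""
    n = 0
    while score_fn(n, c) < target:
        n += 1
    return n


for c in (50, 500, 5000):
    print(c, links_needed(quadratic_score, c, 0.01), links_needed(cubic_score, c, 1.0))
```

With the quadratic, the required number of new links grows roughly in proportion to c (about c/9 for this target); with the cubic it grows more slowly than c, so larger blogs have a somewhat easier time climbing under the cubic score.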

Basically, the idea is that for a relatively obscure blogger who has, say, 40 people currently linking to his blog, getting 4 or 5 new blogs linking to him can have the same effect as an a-list blogger getting 40 or 50 new links.
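A worked example of that comparison, assuming for illustration that the a-list blogger has 400 inbound blogs, ten times the smaller blogger's 40. The quadratic score depends only on the ratio n/(c + n), so scaling n and c together leaves it unchanged; the cubic score instead grows by the same factor of ten.

```latex
\[
\frac{4^2}{(40+4)^2} = \left(\frac{4}{44}\right)^2 = \frac{1}{121}
  = \left(\frac{40}{440}\right)^2 = \frac{40^2}{(400+40)^2}
\]
\[
\frac{4^3}{(40+4)^2} = \frac{64}{1936} = \frac{4}{121},
\qquad
\frac{40^3}{(400+40)^2} = \frac{64000}{193600} = \frac{40}{121}
\]
```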

Intuitively, we know that this is right.  After all, it's very easy for Doc Searls to get 20 new links; he has such a large readership.  But for a smaller blogger to get a bunch of new links, he must have posted something really interesting that day.

One more point: why does c have to be greater than 30 or 40?  Well, there are two reasons for this.  First, these equations don't work well when the number of current incoming blogs is very small: someone who has only one person linking to him can jump right to the top of the rankings if he gets one or two new links, and that's not very interesting for us.  The second reason to set the bar at a certain level is to ensure that the blogger in question actually has an audience.  The audience may be small, but at least some people are linking to him, which is a good way to knock out the cruft at the real tail of the power curve.
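A small Python sketch of the first reason, using hypothetical numbers: without a floor on c, a blog with one existing inbound link that picks up two more outscores an established blog having a strong day, under both formulas. The `passes_floor` helper is a hypothetical name for the c > 30 check.

```python
def cubic_score(n, c):
    return n ** 3 / (c + n) ** 2


def quadratic_score(n, c):
    return n ** 2 / (c + n) ** 2


def passes_floor(c, floor=30):
    """True if the blog has more than `floor` existing inbound blogs."""
    return c > floor


tiny = (2, 1)             # 2 new links, 1 existing inbound blog
established = (20, 100)   # 20 new links, 100 existing inbound blogs

for score in (cubic_score, quadratic_score):
    print(score.__name__, score(*tiny), score(*established))
print(passes_floor(tiny[1]), passes_floor(established[1]))
```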

These equations probably aren't perfect.  I haven't done any curve fitting or formal statistical analysis to make sure that the equations are correct; I'm just using my holistic "feels good" barometer.  The power law may not be a quadratic or cubic relationship; it could be of a different power, but the quadratic and cubic relationships give a decent spread of both a-list and unknown interesting bloggers in the Technorati Interesting Recent Blogs and Interesting Newcomers lists.  For the Interesting Newcomers list, I simply cut out all of the bloggers who already have an audience, so you won't see any a-list bloggers on that list, at least not once they become a-list. :-)

This is interesting research for me, but the most satisfying thing about it is that I've found a way to identify interesting new writers and add them to my blogroll: people I would never have found out about otherwise.  I can also use the other Technorati tools, like the link cosmos, to find out who is linking to them, which gives me a quick feeling for who is in their community.

Let me know your thoughts: do the new rankings look reasonable to you?  Are you finding new and interesting blogs?  More of the same old same old, or just boring crap?  I know I've already found one new blog I'd never heard of before: Exploding Cigar, currently number 4 on the Interesting Newcomers list.  Very funny, great blog.


UPDATE: Jason Kottke has done the analysis and came up with the following formula:

y = 5989.8x^{-0.8309}
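The post doesn't say how this fit was produced. A common way to fit a curve of this shape is ordinary least squares on log-transformed data, since y = ax^k becomes log y = log a + k log x, a straight line. This Python sketch assumes that method and checks it against points generated from the formula above (no real link data is used).

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**k by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    m = len(lx)
    mean_x = sum(lx) / m
    mean_y = sum(ly) / m
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    k = sxy / sxx                        # slope in log-log space is the exponent
    a = math.exp(mean_y - k * mean_x)    # intercept gives log a
    return a, k


# Synthetic points generated from y = 5989.8 * x**-0.8309.
xs = list(range(1, 101))
ys = [5989.8 * x ** -0.8309 for x in xs]
a, k = fit_power_law(xs, ys)
print(f"a = {a:.1f}, k = {k:.4f}")
```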

The important part of that equation is the exponent, -0.8309.  To counteract that effect, we need to invert it (hope I'm getting my math right), which would make the power needed to counteract the power law approximately x^{1.2035}, since 1/0.8309 is about 1.2035.  That about matches up with the formula I spelled out for the Interesting blog list, which approximates an x^{1.5} relationship for reasonable values of c.  Note: This is something I just did on the back of a napkin with only 4 hours of sleep and no coffee, so I may be way off, but if some kind mathematician can check the work and comment, I'd be much obliged.
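For anyone checking the napkin math: if y = Ax^K, solving for x gives x = (y/A)^(1/K), so inverting the relationship turns the exponent into its reciprocal, and 1/0.8309 is about 1.2035. This Python sketch assumes that reciprocal is the inversion meant above; the names `forward` and `inverse` are hypothetical.

```python
A, K = 5989.8, -0.8309  # coefficients from the fit quoted above


def forward(x):
    """y = A * x**K."""
    return A * x ** K


def inverse(y):
    """Solve y = A * x**K for x: x = (y / A) ** (1 / K)."""
    return (y / A) ** (1 / K)


print(f"1/|K| = {1 / abs(K):.4f}")

# Round trip: inverting the forward curve recovers the original x.
for x in (1.0, 10.0, 100.0):
    assert abs(inverse(forward(x)) - x) < 1e-9 * x
```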