Breaking the (power) law

Clay Shirky has gotten a lot of people talking about power laws and how they relate to the blogosphere.  Dave Winer and others disagree, and there are a bunch of other interesting conversations going on as well.  What’s interesting is that both Clay and Dave are right, depending on how you look at things.  Dave sees the world from the microeconomic point of view – it is really easy to create new communities with blogs, and there is no scarcity of links.  Clay looks at things from the macroeconomic view, seeing overall patterns of thought leaders evolving as people look for editors and subject matter experts to help guide them through all the links.  Clay is also right in his observation that blog linking across the blogosphere will tend to follow a power law – that is, a small proportion of bloggers will get a huge number of incoming links, as the Technorati Top 100 will attest.  However, what Clay doesn’t emphasize is that blogging communities, even though they have some lightposts, tend to form into small open communities.  That’s why my blogroll looks very different from Glenn Reynolds’, or Doc Searls’, or Joel Spolsky’s.  Even though we might all have a few bloggers in common, most of the links are different.  In other words, the blogging space has a high degree of dimensionality.



I thought about the problem that this presented to a traditional link engine.  When you rank bloggers simply by the number of people who link to them, you get a very static list of "a-list" bloggers, as shown by the Technorati Top 100.  What I wanted to do was to break that power law, and give more exposure to the lesser known, but still interesting bloggers, especially on days when they stand out and do something interesting.



I think I’ve found a way to do that, and it all boils down to the fact that Clay described a power law.



The Technorati Top 100 ranks based on a linear relationship of the incoming links to a blog.  A linear equation looks like this:



y = ax + b



As we all know, that leads to a boring top 100 page.



So, I started playing around with the ranking algorithm.  Now, a power law looks something like this:



y = ax^2 + bx + c



Or, a more skewed graph looks like:



y = ax^3 + bx^2 + cx + d



Remember high school algebra?  I’m sorry to make your brain hurt.  The key point to remember is that equations that follow a power law start to get really big really fast as you increase x.  You start to get a graph that looks like a parabola.



What I wanted to do was to give some of the lesser-known bloggers some visibility.  The way I did this was to invert the power law when I did my rankings. I looked at two variables:  The number of new inbound links to a blog, and the current number of total blogs already linking to it.



In order to reverse the power law, I used the following as the ranking algorithm for the Interesting Newcomers page:



n = Number of new inbound links

c = Current number of inbound blogs (as of the day before)



(n^3)/(c+n)^2  where c > 30



And I’m using a quadratic equation for the “Interesting Recent Blogs” page:



(n^2)/(c+n)^2  where c > 40
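
The two formulas above can be sketched in Python as follows. This is only an illustration, not Technorati's actual code: the function names are made up, and returning None for a blog at or below the cutoff is an assumption about what "where c > 30" and "where c > 40" mean in practice.

```python
from typing import Optional


def newcomer_score(n: int, c: int, min_inbound: int = 30) -> Optional[float]:
    """Cubic score n^3 / (c + n)^2 (Interesting Newcomers formula).

    n: number of new inbound links
    c: current number of inbound blogs (as of the day before)
    Assumption: a blog with c <= min_inbound is excluded, signalled by None.
    """
    if c <= min_inbound:
        return None
    return n ** 3 / (c + n) ** 2


def recent_score(n: int, c: int, min_inbound: int = 40) -> Optional[float]:
    """Quadratic score n^2 / (c + n)^2 (Interesting Recent Blogs formula).

    Same assumption as above: c <= min_inbound means "not ranked".
    """
    if c <= min_inbound:
        return None
    return n ** 2 / (c + n) ** 2


if __name__ == "__main__":
    # Hypothetical blog: 5 new links on top of 60 existing inbound blogs.
    print(newcomer_score(5, 60))
    print(recent_score(5, 60))
```

In both cases the (c + n)^2 in the denominator is what makes each new link count for less as the existing audience grows.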



The results are very interesting.



What the ranking algorithms described above do is make it progressively harder to move up in ranking as the number of current inbound blogs increases.  This effectively negates the power law that Clay describes, and gives us a way of measuring apples to apples.



Basically, the idea is that for a relatively obscure blogger who has, say, 40 people currently linking to his blog, getting 4 or 5 new blogs linking to him can have the same effect as an a-list blogger getting 40 or 50 new links.
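
As a rough check of that claim, here is the quadratic formula applied to both cases, ignoring the cutoff on c. The a-list figure of 400 existing inbound blogs is a made-up number chosen for illustration; the real a-list counts are not given here.

```python
def quadratic_score(n, c):
    """Quadratic score n^2 / (c + n)^2, without the cutoff on c."""
    return n ** 2 / (c + n) ** 2


# Obscure blog: 4 new links on top of 40 existing inbound blogs.
small = quadratic_score(n=4, c=40)

# Hypothetical a-list blog: 40 new links on top of 400 existing inbound blogs.
large = quadratic_score(n=40, c=400)

print(small, large)
```

Both scores come out the same, because the quadratic form only depends on the ratio n / (c + n). The cubic form, n^3 / (c + n)^2, is that same ratio squared and then multiplied by n, so at an equal ratio it still gives the blog with more new links the higher score.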



Intuitively, we know that this is right – after all, it’s very easy for Doc Searls to get 20 new links – he has such a large readership.  But for a smaller blogger to get a bunch of new links, he must have posted something really interesting that day.



One more point – why does c have to be greater than 30 or 40?  Well, there are two reasons for this.  First, this equation doesn’t work well when the number of current incoming blogs is very small – someone who has only one person linking to him can jump right to the top of the ranking if he gets one or two new links, and that’s not very interesting for us.  The second reason to set the bar at a certain level is to ensure that the blogger in question actually has an audience.  The audience may be small, but at least some people are linking to him, which is a good way to knock out the cruft at the real tail of the power curve.
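
The small-c problem is easy to see with a toy ranking. The sketch below is an illustration only: the blog names and link counts are invented, and the helper rank_by_quadratic is hypothetical, not part of any Technorati tool.

```python
def rank_by_quadratic(blogs, min_inbound=0):
    """Rank (name, n, c) tuples by n^2 / (c + n)^2, highest first.

    Blogs with c <= min_inbound are dropped before ranking.
    """
    eligible = [blog for blog in blogs if blog[2] > min_inbound]
    return sorted(
        eligible,
        key=lambda blog: blog[1] ** 2 / (blog[2] + blog[1]) ** 2,
        reverse=True,
    )


# Invented data: (name, new inbound links n, current inbound blogs c)
blogs = [
    ("tiny", 2, 1),       # one existing linker, two new links
    ("small", 6, 45),
    ("a-list", 50, 400),
]

no_cutoff = [name for name, _, _ in rank_by_quadratic(blogs)]
with_cutoff = [name for name, _, _ in rank_by_quadratic(blogs, min_inbound=40)]

print(no_cutoff)
print(with_cutoff)
```

Without a cutoff, the blog with a single existing linker lands on top purely because its denominator is so small; with the cutoff at 40 it drops out and the remaining blogs are compared on more even terms.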



These equations probably aren’t perfect – I haven’t done any curve fitting or formal statistical analysis to make sure that the equations are correct; I’m just using my holistic "feels good" barometer.  The power law may not be a quadratic or cubic relationship – it could be of a different power, but the quadratic and cubic relationships give a decent spread of both a-list and unknown interesting bloggers in the Technorati Interesting Recent Blogs and Interesting Newcomers lists.  For the Interesting Newcomers list, I simply cut out all of the bloggers who already have an audience – so you won’t see any a-list bloggers on that list, at least not once they become a-list. :-)



This is interesting research for me, but the most satisfying thing about it is that I’ve found a way to identify interesting new writers and add them to my blogroll – people I would never have found out about otherwise.  I can also use the other Technorati tools, like the link cosmos, to find out who is linking to them – which gives me a quick feeling for who is in their community.



Let me know your thoughts – do the new rankings look reasonable to you?  Are you finding new and interesting blogs?  More of the same old same old, or just boring crap?  I know I’ve already found one new blog I’d never heard about before – Exploding Cigar, currently number 4 on the Interesting Newcomers list.  Very funny, great blog.





UPDATE: Jason Kottke has done the analysis, and comes up with the following formula:



y = 5989.8x^(-0.8309)


The important part of that equation is the power degree, -0.8309. To counteract that effect, we need to invert that (hope I’m getting my math right), which would make the power needed to counteract the power law approximately x^1.2038. That about matches up with the formula I spelled out for the Interesting blog list, which approximates an x^1.5 relationship for reasonable values of c. Note: This is something I just did on the back of a napkin with only 4 hours of sleep, and no coffee, so I may be way off, but if some kind mathematician can check the work and comment, I’d be much obliged.
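
For anyone who wants to check the napkin math, here is a small sketch. It assumes that "invert" means taking the reciprocal of the exponent's magnitude; the variable and function names are made up, and only the two constants come from the fitted formula.

```python
# Coefficient and exponent from the fitted formula y = 5989.8x^(-0.8309).
COEFFICIENT = 5989.8
EXPONENT = -0.8309


def fitted(x):
    """Evaluate the fitted curve at x (x must be positive)."""
    return COEFFICIENT * x ** EXPONENT


# Assumption: "inverting" the power means taking 1 / |exponent|.
counter_exponent = 1 / abs(EXPONENT)

print(round(counter_exponent, 4))
print(fitted(1), fitted(10), fitted(100))
```

The reciprocal works out to roughly 1.2035, which is close to the figure quoted above, and the fitted values shrink as x grows, as expected for a negative exponent.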


Sifry's Alerts
About The Author
I'm founder of a number of companies, including Offbeat Guides and Technorati. I was the cofounder and CTO of Sputnik and Linuxcare, founding board member of Linux International, and a WEF Technology Pioneer. I've been around the block a few times. Some might call me a serial entrepreneur. You can contact me at david-blog@sifry.com.

12 Comments:


  • By xian 12 Feb 2003

    When you say “I simply cut out all the bloggers who already have an audience,” what is your metric for that? I notice Shelley Powers mentioned that she appears on all three of your lists.
    It’s interesting to note that RFB (my blog) was on the interesting newcomers list a few days ago and got about 12-16 hits over a day or so, which is a lot of traffic for one link. Then it was off the list yesterday, so it’s nice to see there’s some churn there. I suppose not that many people added links to me on Monday. I think I may be back on today, as I saw a link in my referrers this morning once again from the newcomers list.
    I imagine you’ll keep tweaking the algorithms.

  • By James Joyner 12 Feb 2003

    Well, you make a good case that these laws aren’t immutable. But the point that Shirky makes–or, that I get from him–is that being first has huge advantages because it’s easier to get noticed when there are a handful of blogs than thousands and it’s therefore more likely that the early bloggers get on people’s “favorite blog” lists and therefore be read by new users who will, in turn, be more likely to add them to their lists than, say, a blog they haven’t read. So, yes, a group of people can get together and make a newcomer’s blog more popular; but, even then, it’s more likely that it’ll be a newcomer that started yesterday than one that starts tomorrow.

  • By Wonder 12 Feb 2003

    Power to the blogger!
    I did not double check your maths because math ain’t my bag, but I like the idea.
    It is an excellent thing to gain sources of new information and perspective. After all, the internet and blogging is for free market thought.
    Thank you for illustrating how that might work. I’ll link to you!

  • By Richard Bennett 13 Feb 2003

    I like what you’ve done here, and would like to suggest a further refinement. The best way to drive up traffic (and links probably follow traffic) is to post a lot. It doesn’t matter so much what you post, since every time you post and ping weblogs you get more visitors.
    But not all posts are interesting, and not all the big posting volume sites are worth reading because they have so much junk on them: Instapundit is the classic example.
    So the way I would measure the quality of a blog is by the number of links to specific posts divided by the total number of posts, and then run all that through your weighting factor that looks for percent change to the installed base of linkers or whatever.
    The reasoning is that a blog with light posting volume that consistently has posts that are linked by many others is a good one to read in terms of the ratio of what you get out of it vs. what you put into it.
    So when you’ve got nothing better to do, etc.

  • By Nrrrdboy 13 Feb 2003

    Given the “A-list” and power law discussion and such, you can actually find an analysis of mobility within the myelin blogging ecosystem (for the past 6 months). In brief, growth and movement are a definite, though I very much like your “flattening” proposal.
    http://goatee.net/2003/02#_12we

  • By Steve 13 Feb 2003

    You’re doing discovery based on at-large changes in popularity. Since you’ve got the cosmos data, why not personalize the recommendations? For instance,
    blog X is the most similar to your blog for inbound links (the same people link to both of you). Or
    blog Y is the most similar to your blog in the outbound links (the stuff you link to). All the things that blog Y links to that you do not are things you are highly likely to enjoy.
    Either form of discovery will fight the power law trend.

  • By Kevin Marks 14 Feb 2003

    Can I just point out the irony that Dave Sifry’s blogs are ranked at numbers 1 & 2 on ‘Interesting Newcomers’ and 4&5 on ‘Interesting Recent Blogs’… no wonder he likes these new metrics.
    One thing I haven’t seen anyone mention is that in general power law relationships are subject to rapid flux – the field is sometimes known as ‘catastrophe theory’ for this reason – that upheavals follow a power law too.
    More is made of this aspect in ‘Ubiquity’ than in ‘Emergence’, but they are two halves of the same phenomenon as far as I can see.

  • By Taran 08 Oct 2003

    I’m a newcomer in looking at all of this, and I am certain other minds have thought more about this… but…
    If I link something, I may link it for a completely different reason than someone else may. These are aspects; I may link to something because it served as a muse for an entry whereas someone else may have linked to it because they are directly involved.
    In my book – and it’s a new book with fresh blank pages, so please be patient – if link relevance AND your algorithm worked together, wouldn’t that better define how relevant the blogger’s links are?
    Only this evening have I come across your blog (an interesting journey), and at Technorati I noted that a lot of the links to the popular blogs are semi-static or static. Is this what you are trying to achieve, or is this artifact?

  • By http://www.juniors.net 14 Oct 2003

    that’s a gud one! Weblogs are actually becoming powerful.

  • By phemy 20 Oct 2003

    I’m reiterating for the faction who wish to be heard – Is this a popularity contest, to have your content listed? Okay, so it might be on the hitlist, Technorati’s Top 100, but how/why should it be such for the newcomers?
    I’ve only been doing this seven days…I haven’t even been indexed by the search engines yet, so how can I have any links without being seen?
    More importantly–are your criteria for interesting based on someone else’s opinion? Though I like input from others, how will the really interesting stuff get seen? Why not use the content-based criteria suggestion (keywords)?
