Word Bursts and Trend Spotting

Math geekout time:

There's an interesting article in New Scientist about how tracking changes in word frequency can point to emerging trends.  For those of you with a mathematical bent, this is a rough approximation of what LSI (Latent Semantic Indexing) does when applied to a set of documents over time.  LSI lets you "reduce the dimensionality" of the word frequency lists by taking advantage of the fact that some words and phrases are synonyms, or are otherwise related to each other.

The big problem with LSI over large data sets (like the web) is that the calculation it depends on, a singular value decomposition (SVD), gets expensive to compute numerically as the document set grows.
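
To make the dimensionality reduction a bit more concrete, here's a toy sketch in plain Python. It is not a real LSI implementation: it gets the top singular directions by running power iteration on the document-by-document Gram matrix (whose eigenvectors are the right singular vectors of the term-document matrix), which is fine for three tiny documents and hopeless for the web. The function names (`top_eigenpairs`, `lsi_doc_coords`) and the sample documents are made up for illustration; for anything real you'd reach for a proper SVD routine.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        # Fixed, non-random starting vector (an arbitrary choice for this toy).
        v = [1.0 / (i + 1) for i in range(n)]
        scale = math.sqrt(dot(v, v))
        v = [x / scale for x in v]
        for _step in range(iters):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in work])
        pairs.append((lam, v))
        # Deflate: remove the component we just found before looking for the next.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsi_doc_coords(docs, k):
    """Coordinates of each document in a k-dimensional latent space."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vecs = [[d.lower().split().count(w) for w in vocab] for d in docs]
    gram = [[dot(a, b) for b in vecs] for a in vecs]  # A^T A, documents x documents
    pairs = top_eigenpairs(gram, k)
    # Singular value = sqrt(eigenvalue); doc i gets coordinate sigma_j * v_j[i].
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(docs))]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

docs = ["car engine", "automobile engine", "flower petal"]
coords = lsi_doc_coords(docs, k=2)
sim_car_auto = cosine(coords[0], coords[1])
sim_car_flower = cosine(coords[0], coords[2])
```

On raw word counts, the first two documents have a cosine similarity of only 0.5, since they share "engine" but not "car" or "automobile". After squashing down to two latent dimensions they land on essentially the same point, while the flower document stays off on its own axis.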

The "word burst" idea gets around all of that because it just follows frequency trends for individual words or phrases, with no matrix decomposition needed.  It's an interesting idea, and something that would be cool to implement...  But not for now.  Right now, other tasks have higher priority.  Could be a fun weekend project, though...
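
For what it's worth, a first stab at that weekend project might look something like the sketch below. It's just a simple ratio heuristic for illustration, not necessarily the method the article describes: bucket documents by time period, compute each word's share of all words in each period, and flag words whose share in the latest period is well above their average share in earlier periods. The names `find_bursts`, `min_ratio` and `smoothing`, and the sample data, are all hypothetical.

```python
from collections import Counter

def relative_frequencies(docs):
    """Return {word: share of all tokens} for a list of document strings."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}

def find_bursts(periods, min_ratio=3.0, smoothing=1e-3):
    """Flag words whose share in the last period is at least min_ratio times
    their average share over the earlier periods.

    periods is a list of time buckets, oldest first; each bucket is a list
    of document strings. smoothing keeps brand-new words from dividing by zero.
    """
    if len(periods) < 2:
        raise ValueError("need at least one earlier period to compare against")
    *history, latest = [relative_frequencies(docs) for docs in periods]
    bursts = {}
    for word, share in latest.items():
        baseline = sum(p.get(word, 0.0) for p in history) / len(history)
        ratio = (share + smoothing) / (baseline + smoothing)
        if ratio >= min_ratio:
            bursts[word] = ratio
    # Strongest bursts first.
    return sorted(bursts.items(), key=lambda kv: kv[1], reverse=True)

periods = [
    ["the cat sat on the mat", "the dog ate the bone"],
    ["the cat ate the fish", "the dog sat on the mat"],
    ["the blog wrote about the blog", "the blog sat on the cat"],
]
bursting = find_bursts(periods)
```

With this made-up data, "blog" comes out on top, with "wrote" and "about" also flagged because they never appeared before; common words like "the" stay quiet because their share barely moves. A real version would want proper tokenizing, phrases as well as single words, and a minimum count so one-off typos don't register as trends.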