Playing Around With Google Ngram  

Posted by T. Greer

Google Labs has released a cool new tool: Google Ngram Viewer. CNET News explains how it works:


Google's Ngram Viewer: A time machine for wordplay
Lance Whitney. CNET News. 17 December 2010.


Courtesy of the folks at Google Labs, Ngram Viewer can work its analysis as a result of Google's sometimes contentious digitization of vast quantities of books--more than 15 million since the project began in 2004. The Ngram tool draws on what the company calls "a subset of that corpus" totaling more than 5 million books, around 4 percent of all the books ever published. By tracing the 500 billion or so unique words that show up in those 5 million books, the tool can offer a glimpse into their history and popularity over the years.


Ngram Viewer works rather simply. After you enter a word or phrase (up to five words), the tool displays a graph charting how frequently your term has appeared in books over that half a millennium. By default, the Ngram Viewer taps into books written in English. But you can change that to a different "corpus" or category of books, such as American English, British English, English Fiction, Chinese, French, German, Russian, or Spanish.


You can vary the years tracked, all the way from 1500 to 2008 or anywhere in between. Providing a wide range of years gives you more of an overview, while narrowing the years lets the tool graph a word's usage in a more granular fashion year by year.


You can enter multiple terms to compare their popularity. For example, typing the two terms "frankfurter" and "hot dog" shows that frankfurter's usage has remained steady over the years, but the hot dog has continued to jump in popularity since the early 1920s.

This has the potential to be a great quantitative research tool. Here are some of the more interesting graphs I've developed after fiddling with Ngram for the better part of an hour:


International Affairs


Berlin, Moscow, Tokyo, and Beijing, 1900-2008

Communism, 1900-2008
Deterrence and Detente, 1950-2008 


Social Change


Negro and African American, 1900-2008

'hat', 1900-2008


Business


IBM and Microsoft, 1940-2008


Language Use


Thee, Thou, Ye, and You, 1550-1700


Intellectual Change


Clausewitz, Sun Tzu, On War, and the Art of War, 1830-2008





I encourage my readers to play around with Ngram for a bit and report any interesting findings.
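
For readers curious about what a graph like these measures, here is a minimal Python sketch of the underlying idea: for each year, count how often a phrase appears and divide by the total number of same-length word sequences published that year. This is only an illustration under stated assumptions. The toy corpus and the names `TOY_CORPUS`, `ngrams`, and `yearly_frequency` are hypothetical, and it does not call Google's tool or reproduce its actual data or smoothing.

```python
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs standing in for scanned books.
TOY_CORPUS = [
    (1920, "the frankfurter stand sold one frankfurter"),
    (1920, "a hot dog and a frankfurter"),
    (1950, "hot dog after hot dog at the game"),
    (1950, "the hot dog vendor"),
]


def ngrams(words, n):
    """Return every run of n consecutive words as a space-joined string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_frequency(corpus, phrase):
    """For each year, the share of same-length n-grams that equal the phrase."""
    target = phrase.lower()
    n = len(target.split())
    hits, totals = Counter(), Counter()
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Skip years with no n-grams of this length to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


if __name__ == "__main__":
    for term in ("frankfurter", "hot dog"):
        print(term, yearly_frequency(TOY_CORPUS, term))
```

In this made-up data, "hot dog" climbs from a 0.1 share of two-word sequences in 1920 to 0.3 in 1950, while "frankfurter" falls from 0.25 of single words to zero. Dividing by the yearly total matters because far more books were printed in later years, so raw counts alone would make almost every word look as if it were growing in popularity.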

This entry was posted on 17 December, 2010 at 11:33 AM.
