Sunday, June 19, 2011

Billions and billions of words served

Mark Davies, Professor of Corpus Linguistics at Brigham Young University, has created several searchable databases (corpora) of words drawn from a variety of sources. The newest is drawn from the Google Books database and contains 155 billion(!) words. A comparison chart of three different databases highlights the possibilities afforded by searchable corpora.

As the comparison chart notes, the BYU version of the Google Books database and Google Books’s own database are identical, but the BYU version provides more analytical capabilities. Even so, Google Labs’s Books Ngram Viewer allows case-sensitive searches that yield stunning visual representations of the rise and fall of words and phrases over time, as illustrated by the ngram for “Legal Writing” and “legal writing”. Google Labs provides a webpage explaining the Ngram Viewer; the page includes a link to tips for using the Google Books corpus for scholarly research. For other helpful explanations of the Google Books Ngram Viewer (including the meaning of ngram), visit here, here, here, here, and here.
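For readers curious about what an ngram is, here is a minimal Python sketch. It treats an ngram as a run of n consecutive words and counts two-word ngrams in a made-up sample sentence, first case-sensitively and then with case ignored. The function name `ngrams` and the sample text are illustrative assumptions; this is not how Google or BYU actually build or query their corpora.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, as tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Made-up sample sentence, used only for illustration.
sample = "Legal Writing courses teach legal writing and legal writing skills"

# Case-sensitive counts: "Legal Writing" and "legal writing" are different ngrams.
case_sensitive = Counter(ngrams(sample, 2))

# Case-insensitive counts: lowercase the text first so both forms merge.
case_insensitive = Counter(ngrams(sample.lower(), 2))

print(case_sensitive[("Legal", "Writing")])
print(case_sensitive[("legal", "writing")])
print(case_insensitive[("legal", "writing")])
```

A case-sensitive search keeps the capitalized and lowercase phrases apart, which is why the two can be charted as separate lines over time.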

All in all, these resources offer opportunities for fun and scholarship — or, perhaps best of all, scholarly fun.

(h/t June 19, 2011, broadcast of A Way with Words, “public radio’s lively show about words, language, and how we use them”)

