November 4, 2010
How Does Google Books Search Work?
On the web, Google uses the links between pages to decide what lands at the top of the hit list. It can't do that with the book scanning project, because there are no links between the books to use. A recent post in The Atlantic reviews how Google "tunes" its search for books. According to the Nov. 1 post:
The system they've come up with has become increasingly sophisticated, as highlighted by their latest tweak, Rich Results, which begins rolling out this afternoon. The feature selectively presents you with one extra-large result when it detects that you're probably searching for an individual title and not a specific mote of information or general topic.
Rich Results is the latest in a series of smaller front-end tweaks that have been matched by backend improvements. Now, the book search algorithm takes into account more than 100 "signals," individual data categories that Google statistically integrates to rank your results. When you search for a book, Google Books doesn't just look at word frequency or how closely your query matches the title of a book. They now take into account web search frequency, recent book sales, the number of libraries that hold the title, and how often an older book has been reprinted.
So, if you search "Help" now, you get a big blow-up of Kathryn Stockett's 2009 book, not one of the dozens of other books with the same title. Or if you search "dragon tattoo," you get Stieg Larsson's blockbuster, not the 2008 children's book actually called Dragon Tattoo.
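To make the idea of "statistically integrating" signals a little more concrete, here is a rough sketch in Python. It is only an illustration, not Google's algorithm: the post names the signals (title match, web search frequency, recent sales, library holdings, reprints), but the weights, the log scaling, the `Book` fields, and every number in the sample data are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    web_searches: int      # hypothetical count of web searches for the book
    recent_sales: int      # hypothetical recent sales figure
    library_holdings: int  # hypothetical number of libraries holding the title
    reprints: int          # hypothetical number of reprintings


# Assumed weights, chosen only for illustration.
WEIGHTS = {
    "title_match": 5.0,
    "web_searches": 1.0,
    "recent_sales": 1.0,
    "library_holdings": 0.5,
    "reprints": 0.5,
}


def title_match(query: str, title: str) -> float:
    """Fraction of query words that appear in the title (0.0 to 1.0)."""
    query_words = set(query.lower().split())
    title_words = set(title.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & title_words) / len(query_words)


def score(query: str, book: Book) -> float:
    """Combine the signals into one number with a weighted sum."""
    signals = {
        "title_match": title_match(query, book.title),
        # log1p dampens large counts so one huge number cannot swamp the rest
        "web_searches": math.log1p(book.web_searches),
        "recent_sales": math.log1p(book.recent_sales),
        "library_holdings": math.log1p(book.library_holdings),
        "reprints": math.log1p(book.reprints),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def rank(query: str, books: list[Book]) -> list[Book]:
    """Return the books ordered from highest to lowest score."""
    return sorted(books, key=lambda b: score(query, b), reverse=True)


# Made-up sample data: both titles contain every word of the query.
SAMPLE_BOOKS = [
    Book("Dragon Tattoo", 200, 50, 30, 0),
    Book("The Girl with the Dragon Tattoo", 900_000, 400_000, 5_000, 12),
]

for book in rank("dragon tattoo", SAMPLE_BOOKS):
    print(book.title)
```

Both sample titles match the query "dragon tattoo" equally well on words alone, so in this sketch it is the popularity signals that push the blockbuster above the book whose title matches exactly, which is the behavior the post describes.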
Some of the other whiz-bang features pointed out in the post include sorting by date, refining by subject, and a book-specific suggest feature, the one that tries to guess what you are looking for as you type. Before this summer, it was word-specific rather than book-specific, so it made ridiculous suggestions. Sort of like the legal spelling corrector in the III OPAC that never gets anything right.
This is pretty cool stuff for a free search engine. Even if you aren't a fan of the Google book scan project, you have to admire the work accomplished. For 15 million books!
Of course, one of the reasons they can add cool features like Rich Results, the post goes on to say, is that the data for books is much more structured (thank you, cataloging librarians). I wonder why our ILSs can't do cool things like this in a more cohesive way. Google is using the same data that we are. Could it be that our ILS vendors are just not bold enough, creative enough, or interested enough to get it all together? I'm not sure, but it is an interesting post and you should read it. (VS)