August 9, 2010
More on That Google Book Count
Ars Technica is reporting that Google's estimate of books in the world is "bunk." Why? Because Google used notoriously bad metadata in distinguishing between titles, editions, and other variations on a work. Google blames the libraries from which the metadata is sourced. Not so fast, critics say. Bibliographic records for older works may have incomplete descriptions, but they aren't populated with data that's plain wrong. There are examples, available here and here. Google, for its part, opened up about how it acquires and processes metadata at an ALA Forum held last January.
Google, though, still has problems when it gets inconsistent data from libraries and other sources. That's not as surprising as it sounds, given that Google relies more on machines than on people to make sense of data. It was when Google announced Legal Opinions and Journals as part of Google Scholar that it became known that as few as 3 people actually run that operation. Add library records that have legitimate variations in metadata and other bibliographic descriptions, and it's no wonder that Google metadata can be all over the place. There are no teams of catalogers poring over book records in Mountain View.
So, when considering the actual number of unique titles/editions in the world, Google's precise figure of 129,864,880 books as of August 5th, 2010 will settle drunken bar bets, but not much else. I'll add a question: is this number even worth knowing? [MG]