October 29, 2007
Third Option for Libraries Interested in Scanning Projects
Computerworld has a good story on the efforts of Google and Microsoft to scan books into digital libraries. While both efforts are worthy, libraries pay a price for working with either company. The price is not cash but restrictions on how the scanned output is distributed. Google has a restrictive policy regarding what happens once a book is scanned. Google covers the complete cost of scanning a title. The digital file is given back to the library to use with its faculty, staff, and students. It cannot be shared with other search engines and cannot be distributed to other universities. In essence, Google controls the destiny of the digitized book. Microsoft is less restrictive. It bars the file only from use by commercial competitors, not by other academic institutions.
For these reasons, a number of libraries aren't going with either company to scan content. Some see the restrictions as not in keeping with the purpose of a library when it comes to sharing content. That's why they are willing to pay to have content scanned by the Open Content Alliance. The OCA has some pretty hefty collections as contributors, including the British Library, Columbia University, the University of Chicago, the University of Texas, the University of North Carolina at Chapel Hill, the University of Illinois at Urbana-Champaign, and a host of other heavyweights. The complete list of participating libraries is here.
The OCA says that it respects copyright and wants to work with copyright holders in determining what rights the public should have to a particular work. The public domain material would be free for anyone to use. One of the problems with any of the book scanning projects is the lack of government-authored material, which is essentially in the public domain. There are commercial efforts to scan the back content of the United States Serial Set. Congressional reports and documents since 1994 are available on the GPO Access web site in text and PDF formats. There really isn't much question about the status of these texts. Oddly, though, Google and Microsoft both treat them as copyrighted.
Take, for example, a House hearing held in 2005. The House Ways and Means Committee held a hearing entitled "Long Term Health Care" on April 19, 2005. GPO Access offers a free download of the item. Google Books puts it in Full View without a download option. Obvious searches in Microsoft's service do not turn up the text at all. Older federal government documents show up in snippet view or without a preview at all in Google. Microsoft has extremely old government documents available for view and download, but little to nothing of any substance for more modern materials. A whole class of documents that would be the easiest to make completely available to the public is either missing or offered with limited access through the major library digitization projects. It makes no sense.
It's true that some congressional materials occasionally reprint copyrighted materials in appendices. The search engines may have legal reasons to restrict these kinds of documents. There are, however, entirely government-authored materials that are restricted for no apparent reason. Again, it makes no sense. Google and Microsoft may be responding to the efforts by Lexis and others to digitize the complete United States Serial Set. A free version would seriously diminish the need for a paid database. But that's the risk when someone wants to place a value on a collection of public domain materials. Some day there will be an organized open access database of government materials. The OCA may be the best bet so far for making that happen.