April 6, 2006
D-Lib Examines the Million (Digital) Book Library
The March 2006 issue of D-Lib is a special issue devoted to a single topic: the evolution of the digital library. See the Table of Contents.
Of particular interest is Daniel Cohen's contribution, Data Mining Large Digital Collections. Cohen writes that his experience building APIs has taught him three lessons, lessons "that are not entirely in accord with key premises of some of those working in the world of digital libraries." They are:
- More emphasis needs to be placed on creating APIs for digital collections.
- Resources that are free to use in any way, even if they are imperfect, are more valuable than those that are gated or use-restricted, even if those resources are qualitatively better.
- Quantity may make up for a lack of quality.
Most controversial, of course, is Cohen's claim that quantity may trump quality. He explains:
High-quality digitization and thorough text markup may be attractive for those creating digital collections, but a familiarity with information theory and data-mining techniques makes one realize that it may be more worthwhile to digitize a greater number of books or documents at a lower standard for the same cost. ... In our libraries – once analog and now digital – we come across countless similar phrases, wordings, and facts in numerous books, and many books refer to each other through footnotes and bibliographies. This repetition and cross-referencing should allow us to create tools for mining the vast information and knowledge that lies within the nearly limitless digital collections we are about to encounter.
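The intuition that repetition can offset sloppy digitization can be illustrated with a toy example. The sketch below is not from Cohen's article; it is a minimal illustration under loudly stated assumptions: several books contain the same phrase, each uncorrected transcription has already been aligned to the same length, and the only errors are single-character substitutions. Under those assumptions a per-position majority vote (the hypothetical `majority_vote` function) recovers the phrase even though no single copy is error-free.

```python
from collections import Counter


def majority_vote(copies: list[str]) -> str:
    """Reconstruct a string from several equal-length noisy transcriptions.

    Hypothetical helper: assumes the copies are already aligned character
    by character and differ only by substitution errors, a crude stand-in
    for low-quality OCR of the same passage found in different books.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    # For each character position, keep the most common character across copies.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*copies)
    )


# Hypothetical noisy transcriptions of one phrase; every copy has one error,
# but the errors fall in different positions.
noisy = [
    "origin of specie5",
    "orlgin of species",
    "origin 0f species",
    "origin of specles",
    "0rigin of species",
]

print(majority_vote(noisy))  # origin of species
```

Real OCR output would also need an alignment step to handle insertions and deletions, which this sketch deliberately ignores; the point is only that many cheap, imperfect copies can together carry a reliable signal.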