April 28, 2011
The Chesapeake Digital Preservation Group's Fourth Annual Analysis Finds the Pace of Link Rot May Be Slowing Down but ...
Link rot is still present in more than 30% of the URLs in the Group's sample of URLs originally collected in 2007 and 2008. Do note that the sample includes URLs primarily from state government (.state.__.us), government (.gov), and organization (.org) top-level domains.
The Chesapeake Group conducted its first link rot assessment at the program's one-year mark in 2008. During the program’s first year, 1,266 online titles were harvested and preserved within the digital archive. A random sample of 579 titles from the archive was generated for the link rot study, ensuring results at a 95 percent confidence level and confidence interval of +/- 3. When this sample was first analyzed in March 2008, link rot was found to be present in 48 of 579 URLs, or 8.3 percent.
One year later, in 2009, the sample was analyzed a second time as part of the program's second-year evaluation. The second analysis demonstrated that link rot was present in 83 out of the original sample of 579 URLs. In other words, 14.3 percent of the archived titles had disappeared from their original URLs within 12 to 24 months of harvest.
By March 2010, the prevalence of link rot had increased to 160 out of 579 URLs. Within two to three years of harvest, link rot among the sample URLs had increased to 27.9 percent, compared to 14.3 percent in 2009 and 8.3 percent in 2008.
The current March 2011 analysis shows that 176 URLs have succumbed to link rot within a period of 12 to 48 months. This means that 30.4 percent, or nearly one-third, of the archived titles have disappeared from their original URLs. Although this figure is significant, it represents only an additional 2.5 percent of URLs lost to link rot within the past year.
Whereas the prevalence of link rot among URLs in the sample nearly doubled every year during the first three years of the study, it slowed significantly in the fourth year.
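As a quick check on the arithmetic in the snip above, here is a minimal Python sketch that recomputes the yearly link rot rates from the quoted counts. The names `SAMPLE_SIZE`, `ROTTED_BY_YEAR`, `rot_rates`, and `yearly_gains` are my own and are not from the Report. Note that the quoted count of 160 out of 579 works out to about 27.6 percent rather than the 27.9 percent stated for 2010; the sketch only reports what the counts imply and does not try to explain the difference.

```python
# Counts of sample URLs with link rot, as quoted from the Report.
SAMPLE_SIZE = 579
ROTTED_BY_YEAR = {2008: 48, 2009: 83, 2010: 160, 2011: 176}


def rot_rates(counts, sample_size):
    """Return the percentage of the sample lost to link rot for each year."""
    return {year: round(100 * n / sample_size, 1) for year, n in counts.items()}


def yearly_gains(counts, sample_size):
    """Return the percentage-point increase over the preceding year."""
    years = sorted(counts)
    return {
        later: round(100 * (counts[later] - counts[earlier]) / sample_size, 1)
        for earlier, later in zip(years, years[1:])
    }


if __name__ == "__main__":
    rates = rot_rates(ROTTED_BY_YEAR, SAMPLE_SIZE)
    gains = yearly_gains(ROTTED_BY_YEAR, SAMPLE_SIZE)
    for year in sorted(rates):
        gain = gains.get(year)
        suffix = "" if gain is None else f" (+{gain} points)"
        print(f"{year}: {rates[year]}%{suffix}")
```

The year-over-year gains make the slowdown easy to see: the increase from 2010 to 2011 is far smaller than the increase in either of the two preceding years.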
Another snip from this very informative Report:
In the original 2008 analysis, link rot was present in 10.8 percent of URLs with state top-level domains, 10 percent of URLs with government top-level domains, and 8.3 percent of URLs with organization top-level domains. Education (.edu) and commercial (.com) URLs were found to have relatively high inactivity levels of 11.8 and 15.4 percent in 2008, respectively.
In 2009, the prevalence of link rot increased among URLs with state, government, organization, education, network (.net), military (.mil), and information-oriented (.info) top-level domains. URLs with education top-level domains increased significantly in 2009, to 35.3 percent from 11.8 percent in 2008, while no increase in link rot among commercial URLs was observed.
The 2010 analysis of the sample showed link rot to be present in more than 32 percent, nearly one-third, of the URLs with a state-government top-level domain. Link rot was found in more than 22 percent of URLs with an organization top-level domain and in 25 percent of government URLs. Commercial and network URLs both experienced a jump in link rot to nearly 30 percent among .com domains, and to more than 27 percent among .net domains. The single IP address and .uk top-level domain in the sample also succumbed to link rot in 2010.
New and interesting patterns among top-level domains emerged in 2011. While .org and .gov URLs continued to demonstrate an increase in link rot, link rot among state government and academic URLs actually began to reverse.
Link Rot and the Digital Archive Today. Also note that "[f]or the present analysis, a new, separate sample was generated representing all of the content in the archive at the time of the program’s fourth anniversary. In the four years since the program began, 3,246 born-digital online titles were harvested from the Web and preserved within the digital archive. A random sample of 803 titles was selected for the link rot study, ensuring results at a 95 percent confidence level and confidence interval of +/- 3."
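For readers wondering where sample sizes like 579 and 803 come from, here is a minimal Python sketch. The Report does not say which formula or calculator the Group used, so this is an assumption on my part: Cochran's sample size formula with a finite population correction, a z-score of 1.96 for the 95 percent confidence level, a margin of error of 0.03, the most conservative proportion of 0.5, and rounding to the nearest whole title. The function name `sample_size` is hypothetical.

```python
def sample_size(population, margin=0.03, z=1.96, p=0.5):
    """Cochran's sample size with a finite population correction.

    population: number of titles in the archive
    margin:     confidence interval half-width (0.03 for +/- 3)
    z:          z-score for the confidence level (1.96 for 95 percent)
    p:          assumed proportion; 0.5 gives the largest sample
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite size of the archive.
    return n0 / (1 + (n0 - 1) / population)


if __name__ == "__main__":
    # Archive sizes quoted from the Report: first year and fourth anniversary.
    for population in (1266, 3246):
        print(f"{population} titles -> sample of {round(sample_size(population))}")
```

Under those assumptions the calculation yields 579 for the 1,266-title archive and 803 for the 3,246-title archive, which matches the sample sizes quoted from the Report.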
For a detailed analysis, see "Link Rot" and Legal Resources on the Web: A 2011 Analysis by the Chesapeake Digital Preservation Group. Highly recommended.
Endnote. Hat tip to Sarah Rhodes, Digital Collections Librarian, Georgetown Law Library, for the heads-up. Participants in the Chesapeake Group include the Georgetown and Harvard Law Libraries and the State Law Libraries of Maryland and Virginia. Professionally speaking, I think we are all indebted to the law librarians who have dedicated their time and effort over the last four years to carrying out this continuing series, which provides an empirically sound analysis of link rot. As noted in the Group's announcement of its latest findings, this is National Preservation Week 2011, and their work product is also a valuable contribution in that context.
The Chesapeake Group is a founding member of the Legal Information Preservation Alliance (LIPA) Legal Information Archive, a collaborative digital preservation program for the law library community. For more information, visit the LIPA Web site or the Chesapeake Group website. [JH]