June 16, 2011
Converting and Correcting Bulk-Distributed State Code Text into Well-Formed HTML: Hershowitz on His California State Code Project
Ari Hershowitz discusses the process he went through to turn California's statutory code text into structured HTML with hyperlinked internal references after downloading the code titles the State makes available via FTP. He notes that some of the text errors he found were probably produced by the State's own conversion of the text from print to electronic format. "With almost all legal research now being done electronically, I think it's reasonable to expect official government electronic sources that can be relied upon." Quoting from Cleaning Up California Law: Errors in online sections.
That, however, would require that "primary legal materials, and the methods used to access them, should be authenticated so people can trust in the integrity of these materials." Plus, the work Hershowitz did to convert the distributed text into well-formed HTML would not have been such a strenuous effort had technical standards for document structure, identifiers, and metadata been implemented. See LAW.GOV's Principles and Declarations.
Hershowitz has written about the conversion process he used in the following very interesting Tabulaw blog posts:
- How to Convert All Files in a Directory: CA Legislation.
- How to Convert Citations to Hyperlinks: CA Laws.
- How to: Convert Sections Into Hyperlink Targets.
- How to convert Text to HTML: Using txt2html Perl Module.
- California Laws: Converting Plain Text to HTML.
- California Law: Recovering Meaning and Metadata with RegEx.
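For readers who want a feel for what the citation and section steps involve, here is a minimal sketch in Python. It is not Hershowitz's code (his posts describe a Perl workflow built around txt2html). The two regular expressions, the "sec-" anchor naming, and the function names are all assumptions made for illustration, and real code text has many more citation forms than the single "Section N" pattern handled here.

```python
import html
import re

# Assumed citation form: "Section 1234" or "Section 1234.5".
CITATION = re.compile(r"\bSection (\d+(?:\.\d+)?)")

# Assumed section opener: a line that starts with "1234." or "1234.5."
# followed by the section text.
SECTION_START = re.compile(r"^(\d+(?:\.\d+)?)\.\s+(.*)$")


def link_citations(text: str) -> str:
    """Wrap each "Section N" citation in a link to the anchor #sec-N."""
    return CITATION.sub(r'<a href="#sec-\1">Section \1</a>', text)


def convert_line(line: str) -> str:
    """Turn one plain-text line into an HTML paragraph.

    A line that opens a section gets an id so citations can point at it;
    every line has its internal citations hyperlinked.
    """
    # Escape &, < and > first so the source text cannot break the markup.
    escaped = html.escape(line, quote=False)
    match = SECTION_START.match(escaped)
    if match:
        number, body = match.groups()
        return f'<p id="sec-{number}"><b>{number}.</b> {link_citations(body)}</p>'
    return f"<p>{link_citations(escaped)}</p>"


if __name__ == "__main__":
    print(convert_line("1234.  As used in Section 1235.5, the term includes a firm."))
```

Using the section number as both the link target and the element id keeps the two steps independent: targets can be generated in one pass and citations linked in another, in either order.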
Hat tip to Free Government Information blog. [JH]