June 15, 2009

Are Lawyers Competent to Construct Keyword Searches?

This issue raises its head every now and then in the context of electronic discovery.  One of the latest opinions on it comes from Magistrate Judge Andrew J. Peck in the Southern District of New York.  The case is William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134 (S.D.N.Y. Mar. 19, 2009).  It involves delays in the construction of the Bronx Criminal Court Complex.  The Dormitory Authority of the State of New York (DASNY) owns the project and agreed to produce emails from non-party Hill International, the project construction manager, as part of discovery.  The parties could not agree on keywords for locating the relevant emails.  DASNY proposed a set of terms that were specific to the project, but possibly too narrow to give comprehensive results.  The other side proposed terms that were broad, and so generic that they would sweep in nearly the entire email archive.

The opinion, and others quoted in it, suggest that the problem here is more than simply a dispute between litigants.  Judge Peck thinks it may be that lawyers simply don't have a handle on what they are trying to find.  That involves more than the information in the relevant electronic archive.  The context here is an archive of emails.  Searching it should take into account how individual emails were created, their purpose, how they are stored, and the form of the documents.  Practically every vendor of e-discovery systems offers contextual search using Boolean-style connectors.  One would think that, after some 30 or more years of using a similar and sophisticated search strategy with Westlaw and Lexis, constructing keyword searches in document sets wouldn't be that much of a problem.  Apparently it is.

Westlaw and Lexis are really misleading in this regard, and it's not their fault.  With practice and experience, it's easy to extract relevant documents from Westlaw and Lexis once an individual masters the search strategy.  That strategy combines knowing the terminology of the legal subject, knowing what kinds of documents are in an individual database, and giving some thought to how these legal concepts appear in text.  The misleading part is that these massive collections of documents have similar structure.  Cases have captions, docket numbers, counsel lists, authors, and a stylized language that uses consistent terms of art.  Statutes, law review articles, long-form commentary such as treatises, and even newspaper articles have enough of a regular structure to make searching within them relatively easy for an experienced searcher.  An archive of emails or irregular documents is another matter entirely.  Westlaw and Lexis carefully run their additions through editors to eliminate misspellings and other typographic issues.  Even then problems crop up.  Raw archives present problems with typos, short form language (LMAO anyone?), incomplete sentences and other off the cuff communication syntax.
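To make the typo problem concrete, here is a minimal sketch in Python.  It is not how any e-discovery product or the parties in the case actually searched.  The sample messages, the function names, and the 0.8 similarity cutoff are all made up for illustration.  It compares an exact whole-word search with a loose "close spelling" search built on the standard library's difflib module.

```python
import re
from difflib import get_close_matches

def exact_hits(keyword, messages):
    """Indexes of messages that contain the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [i for i, text in enumerate(messages) if pattern.search(text)]

def fuzzy_hits(keyword, messages, cutoff=0.8):
    """Indexes of messages with at least one word spelled close to the keyword.

    The cutoff is an illustrative assumption; a lower value catches more
    typos but also more unrelated words.
    """
    hits = []
    for i, text in enumerate(messages):
        words = re.findall(r"[a-z']+", text.lower())
        if get_close_matches(keyword.lower(), words, n=1, cutoff=cutoff):
            hits.append(i)
    return hits

# Hypothetical messages, invented for this example.
messages = [
    "The contractor missed the deadline again.",
    "contracter says steel is late, lmao",
    "Lunch on Friday?",
]

print(exact_hits("contractor", messages))  # misses the misspelled message
print(fuzzy_hits("contractor", messages))  # also picks up "contracter"
```

Close-spelling matching of this kind only addresses typos.  It does nothing for short form language or project nicknames, which is one reason talking to the people who wrote the emails matters.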

Judge Peck bluntly lays it out:

This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails.

He quotes from Judge Grimm in the Victor Stanley case:

While keyword searches have long been recognized as appropriate and helpful for ESI search and retrieval, there are well-known limitations and risks associated with them, and proper selection and implementation obviously involves technical, if not scientific knowledge.

* * *

Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology. The implementation of the methodology selected should be tested for quality assurance; and the party selecting the methodology must be prepared to explain the rationale for the method chosen to the court, demonstrate that it is appropriate for the task, and show that it was properly implemented.   Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251, 260, 262 (D.Md. May 29, 2008) (Grimm, M.J.).

And from Judge Facciola in the O'Keefe case:

Whether search terms or “keywords” will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics and linguistics. Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread. This topic is clearly beyond the ken of a layman and requires that any such conclusion be based on evidence that, for example, meets the criteria of Rule 702 of the Federal Rules of Evidence.  United States v. O'Keefe, 537 F.Supp.2d 14, 24 (D.D.C.2008) (Facciola, M.J.).

Judge Peck suggests that counsel think carefully about how they construct searches, and that they cooperate in doing so.  He also suggests using sampling techniques to see what's there and then refining the search.  This is a technique that librarians and other information professionals use all the time.  Sometimes one has to get a sense of what the archive contains before constructing specialized searches within it.  That means finding out about what is being searched.  Competent information management in discovery may mean that lawyers need to hire an expert.  Judge Peck noted as much in passing, through the other cases he cited.  He also endorsed the principles of the Sedona Conference, available on the Internet.  Ironically, the site address in the opinion was malformed.  The correct address is http://www.thesedonaconference.org.  (Judge Peck, or at least the editors of F.R.D., used spaces to separate elements of the domain name.)

From the site description:

The Sedona Conference® is a nonprofit, 501(c)(3) research and educational institute dedicated to the advanced study of law and policy in the areas of antitrust law, complex litigation, and intellectual property rights. Through a combination of Conferences, Working Groups, and the "magic" of dialogue, The Sedona Conference® seeks to move the law forward in a reasoned and just way. The Sedona Conference® succeeds through the generous contributions of time by its faculties and Working Group members, and is able to fund its operations primarily through the financial support of its members, conference registrants, and sponsorships. See "About Us," "Working Group Series," and "Sponsorships" for further details.

Electronic discovery and the best practices associated with it are among the areas covered in detail at the site.  Much of the information there is free with registration.
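For readers who want to see what the sampling idea Judge Peck mentions might look like in practice, here is a rough sketch in Python.  It is only an illustration under stated assumptions, not the method used in the case or endorsed by the court.  The function names, the sample messages, the relevance labels, and both term lists are hypothetical.  The idea is to pull a random sample from the archive, have a person mark each sampled message as relevant or not, and then measure how a proposed term list performs on that labeled sample before running it against everything.

```python
import random

def draw_sample(archive, size, seed=0):
    """Pick a random subset of the archive for a person to review and label."""
    rng = random.Random(seed)
    return rng.sample(archive, min(size, len(archive)))

def matches(text, terms):
    """True if any term appears in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

def evaluate_terms(labeled_sample, terms):
    """Return (precision, recall) of a term list on a human-labeled sample.

    labeled_sample is a list of (text, is_relevant) pairs.
    Precision: share of the messages the terms hit that were relevant.
    Recall: share of the relevant messages that the terms hit.
    """
    hits = sum(1 for text, _ in labeled_sample if matches(text, terms))
    relevant = sum(1 for _, rel in labeled_sample if rel)
    hit_relevant = sum(
        1 for text, rel in labeled_sample if rel and matches(text, terms)
    )
    precision = hit_relevant / hits if hits else 0.0
    recall = hit_relevant / relevant if relevant else 0.0
    return precision, recall

# Hypothetical sample, already labeled by a reviewer (True = relevant).
labeled = [
    ("Bronx Criminal Court steel delivery slipped two weeks", True),
    ("BCCC schedule delay: curtain wall change order", True),
    ("Court date for my parking ticket is Tuesday", False),
    ("Lunch schedule for the office party", False),
    ("Change order on the courthouse job pushes completion", True),
]

narrow_terms = ["Bronx Criminal Court"]  # project-specific, hypothetical
broad_terms = ["court", "schedule"]      # generic, hypothetical

print("narrow:", evaluate_terms(labeled, narrow_terms))
print("broad: ", evaluate_terms(labeled, broad_terms))
```

On this made-up sample the narrow list hits only relevant messages but misses most of them, while the broad list finds every relevant message along with the irrelevant ones.  That mirrors the dispute in the case, and it gives both sides numbers to refine against instead of guesses.  A real sample would need to be far larger for the numbers to mean much.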

