Wednesday, February 28, 2018

Troubling Study: Results may vary in legal research databases

How consistent are search results across the various legal research databases? The answer is that they can vary considerably, which is very troubling for the reliability of online legal research.

ABA Journal (Susan Nevelow Mart), Results may vary in legal research databases.

"In a comparison of six legal databases—Casetext, Fastcase, Google Scholar, Lexis Advance, Ravel and Westlaw—when researchers entered the identical search in the same jurisdictional database of reported cases, there was hardly any overlap in the top 10 cases returned in the results. Only 7 percent of the cases were in all six databases, and 40 percent of the cases each database returned in the results set were unique to that database. It turns out that when you give six groups of humans the same problem to solve, the results are a testament to the variability of human problem-solving. If your starting point for research is a keyword search, the divergent results in each of these six databases will frame the rest of your research in a very different way."

"The study also looked at the age of cases that were returned in each search. Overall, the oldest cases dominated Google Scholar’s results. Almost 20 percent of the results from Google Scholar were from 1921 to 1978. The highest percentage (about 67 percent) of newer cases were returned by Fastcase and Westlaw. Ravel and Lexis Advance had an average of 56 percent newer cases."

"Another area of diversity was the number of cases each database returned. The median number of cases returned in response to the same search varied from 1,000 for Lexis Advance to 70 for Fastcase. Casetext, Ravel and Westlaw each returned 180 results at the 50th percentile and Google Scholar returned 180. Each algorithm is set to determine what is responsive to the same search terms in vastly different ways."

"For the most part, these algorithms are black boxes—you can see the input and the output. What happens in the middle is unknown, and users have no idea how the results are generated."

In sum, this study shows that a researcher cannot rely on just one legal database or one approach to research. Most of us have known this for a long time. Maybe this study will help us convince our students.

(Scott Fruehwald)
