October 12, 2011
Traffickers of Very Expensive Online Legal Search: How do we use and teach today's legal search services when we don't know how the search engines work?
One can go back to the late 1970s, when computer-assisted legal research (CALR) was first widely debated, and find what some might characterize as a "luddite" response to the advent of very expensive online legal search. But many of those articles sounded in the Great Unknowing of the time. Their authors simply were not accustomed to thinking of legal research in terms of selecting databases, using logical operators, overriding a search engine's (SE's) predefined order of logical operators, and performing segment searches, or to teaching others how to do the same. We are repeating history with a new Great Unknowing in very expensive online legal search. Unlike in the past, however, this one exists because today's advances in SE programming are metadata-rich proprietary information. This is not a luddite reaction calling for a return to terms and connectors. The problem is that our vendors are not providing sufficiently detailed information to replace our old Venn diagram understanding of yesterday's commercial legal search engines with something that maps today's more advanced SE programming.
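For anyone who never learned terms-and-connectors searching, here is a minimal Python sketch of why the old Venn diagram model worked: each logical operator is a plain set operation over the documents that contain each term, so the result set is fully predictable. The four-document corpus and the helper name `docs_containing` are hypothetical, invented only for illustration; no vendor's actual engine is this simple.

```python
# Hypothetical toy corpus: document IDs mapped to their text (invented for illustration).
DOCS = {
    1: "landlord liable for negligence in repair of premises",
    2: "tenant sued landlord for breach of warranty of habitability",
    3: "negligence claim against contractor for defective repair",
    4: "landlord tenant dispute over security deposit",
}


def docs_containing(term: str) -> set[int]:
    """Return the IDs of documents whose text contains the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


landlord = docs_containing("landlord")      # {1, 2, 4}
negligence = docs_containing("negligence")  # {1, 3}

# Each connector is an ordinary set operation, i.e., a region of a Venn diagram.
and_results = landlord & negligence      # landlord AND negligence
or_results = landlord | negligence       # landlord OR negligence
but_not_results = landlord - negligence  # landlord BUT NOT negligence

print("AND:", sorted(and_results))          # → AND: [1]
print("OR:", sorted(or_results))            # → OR: [1, 2, 3, 4]
print("BUT NOT:", sorted(but_not_results))  # → BUT NOT: [2, 4]
```

Because every document in the output is there for a reason the researcher can state, an unexpected result can be traced back to the query and fixed.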
Take, for example, WestSearch. Does any practicing law librarian or legal research instructor really know how WestSearch's algorithms actually work? Experienced legal researchers (that would be law librarians) have spotted a fair number of specific WestSearch anomalies and published them in the blogosphere. You'll find some if you look for posts published 12-18 months ago, when law librarians were given trial WestlawNext (WLN) accounts, as well as posts published when academic law librarians were preparing to start teaching WLN to students. More recently, Anon has tested TR Legal's marketing claims for WestSearch in The WestSearch Straitjacket For Legal Research - Thinking Beyond The Keyword, Part I and Part II. Ron Wheeler has closely examined some issues through trial-and-error testing, particularly the crowdsourcing "usage pattern" facet of WestSearch, in Does WestlawNext Really Change Everything?: The Implications of WestlawNext on Legal Research, 103 LLJ 359 (2011). Do note that Wheeler never got a definitive answer from TR Legal's WLN developers to the question of whether crowdsourcing will miss what he calls "esoteric" materials. In other words, we have no idea whether WestSearch will produce truly comprehensive search output for the diligent legal researcher. Nor do I rest any easier after learning from Mike Dahn, in his interview with Jason Wilson on rethinc.k, that the contribution of the "usage patterns" component to WestSearch's algorithm was reduced after testing.
None of those law librarian reports inspires the kind of confidence one would hope for from a very expensive 21st-century search engine like WestSearch. All one can say is that this sort of trial-and-error process keeps prompting the question: why did I get this WLN output in this search results display? This Great Unknowing also raises another important question: how does one teach WLN to students and to members of the bench and bar? Ah yes, all this is "proprietary," so much so that our vendors' reps and the sales managers trying to sell these services don't really know how WestSearch works.
TR Legal is a trafficker in online legal search using algorithms we know very little about, except that some metadata is baked in and that crowdsourcing (oh, my bad, "usage patterns") is a factor in WestSearch's algorithms. We have no clue what every factor is or how each factor is ranked, because this is "proprietary." Therein lies the real problem with 21st-century very expensive online legal search.
For me, the issue is much broader than "usage patterns." What exactly are all the elements of the WestSearch algorithm, and how is each one ranked? How does each factor contribute to WLN search results? We don't know. Perhaps it wouldn't be an issue if long-time Classic Westlaw users didn't experience WLN search-results display shock and then click a button to switch back to Classic Westlaw. But what happens when Classic Westlaw disappears? Trial and error just isn't going to cut it. Hell, trial-and-error WLN research can be damn expensive in the private sector.
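To make the point about unknown factors and weights concrete, here is a toy Python sketch of a ranking built from a weighted sum of factors. Everything in it is hypothetical: the factor names, the per-document scores, and both weightings are invented for illustration, and this is not WestSearch's algorithm, which is precisely what we don't know. It shows only that demoting one factor (here a stand-in for "usage patterns") can reorder the very same result set.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    # Hypothetical per-document factor scores in [0, 1]; the values are invented.
    keyword: float    # stand-in for plain term matching
    editorial: float  # stand-in for editorial enhancements (headnotes, topics, etc.)
    usage: float      # stand-in for "usage patterns"


def rank(docs: list[Doc], weights: dict[str, float]) -> list[str]:
    """Order documents by the weighted sum of their factor scores, highest first."""
    def score(d: Doc) -> float:
        return (weights["keyword"] * d.keyword
                + weights["editorial"] * d.editorial
                + weights["usage"] * d.usage)
    return [d.name for d in sorted(docs, key=score, reverse=True)]


DOCS = [
    Doc("popular case", keyword=0.5, editorial=0.6, usage=0.9),
    Doc("esoteric case", keyword=0.9, editorial=0.7, usage=0.1),
    Doc("statute", keyword=0.6, editorial=0.2, usage=0.5),
]

# Two hypothetical weightings over the same three factors.
usage_heavy = {"keyword": 0.3, "editorial": 0.3, "usage": 0.4}
usage_demoted = {"keyword": 0.45, "editorial": 0.45, "usage": 0.1}

print(rank(DOCS, usage_heavy))    # → ['popular case', 'esoteric case', 'statute']
print(rank(DOCS, usage_demoted))  # → ['esoteric case', 'popular case', 'statute']
```

Same documents, same factors, different order: knowing the list of ingredients without their relative weights tells the researcher very little about why a given result lands where it does.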
This is hardly a new issue.
More than two decades ago, Robert Berring, speaking of electronic databases, wrote:
The danger of the high-end products is that each step in the research process that is carried out automatically by the front end system, is a step taken away from the purview of the researcher. Each decision that is built into the system makes the human who is doing the search one level further removed from the process.
Berring’s words should serve as a reminder to librarians and teachers of legal research. We must strive to understand as much of the research process as possible, even the steps carried out by online algorithms, so that we can develop and teach effective strategies for achieving our research goals.
(Citation omitted; emphasis added.) Quoted from the last paragraph of Ron Wheeler's LLJ article.
While it is not a new issue, I think it has become an increasingly important one as greater complexity is baked into today's very expensive legal search engines.
If we look at TR Legal's patent application, we can see the big picture. If I were still teaching legal research, I would make it required reading for a class on WestSearch, and I would use the graphics published in it, like the one displayed at right, as the 21st-century, but substantially less instructive, version of Venn diagrams.
I'm very willing to embrace new search engines, but the devil is in the details, and those details are represented by the "lightning bolts" in some of the patent application's diagrams. According to Symbolism Wiki, a bolt of lightning "is a symbol of loss of ignorance. It also represents the punishment of humans from the Gods." When an experienced legal researcher has an all-too-human WTF reaction to WLN search results, it may be punishment from the gods of programming, delivered through their equally all-too-human software routines. Our ignorance of WestSearch stems from not knowing how the algorithm (technically, algorithms) works. Mike Dahn's description in his rethinc.k interview, quoted below, does nothing to give our knowledge the specificity it needs:
To dramatically improve search beyond what standard keyword based search engines can do, our WestSearch algorithms primarily rely on our editorial enhancements, things like the Key Number System, KeyCite, Headnotes, Statutes Notes of Decision, and the language correlations we have in our proprietary indices – like “see also” references. We’ve literally been building up this collection of editorial enhancements for over a hundred years, and it provides both extraordinary search results and a significant competitive advantage over what others can do in the marketplace.
Describing WestSearch's objectives during its development stage, Dahn explains:
One of our concerns was about user experience – we wanted researchers to get very noticeably better results – better enough to pay a premium for our new product. It couldn’t just be arguably better – it had to be noticeably better. Our other concern was a competitive one. We were investing a lot of money in WestlawNext, and in the search engine specifically. If employing usage data drove most of the benefits in terms of precision and recall, then our competitors could turn around quickly and do similar things. We needed to find out what mattered most and why.
Note that Dahn recognizes that upgrading very expensive online legal search is required to stay competitive. I personally don't believe a subscriber base should have to cover the corporate cost of staying competitive by paying a premium, even if the search results were "noticeably better." Upgrading is how corporations maintain their competitiveness, even in a duopoly. With TR Legal's profit margin plummeting from 33% to 25-26%, one might think the pricing gurus would consider eliminating the WLN premium, since "noticeably better" results might shore up its subscriber base, or even increase it.
But in this case, it is debatable whether WestSearch results are even "arguably better" than Classic Westlaw's, except perhaps in caselaw research. Experienced legal researchers (ah, that would be us) have reported some nasty results in federal and state statutory and regulatory research, as well as an apparent bias against secondary sources. We also know that West's traditional topical analysis (e.g., the West Key Number System), now incorporated as metadata, is top-heavy with caselaw. Perhaps that is why some WLN users find WestSearch "good" for caselaw but not so good for statutory and regulatory law. Of course, we are merely speculating. And, of course, those reports are products of trial-and-error research by law librarians.
TR Legal has had its WestSearch staff tackle some of those "anomalies," as well as the WLN searches law librarians performed to test marketing claims. When the staff come up with results that contradict the law librarians' findings, should we simply assume TR Legal is correct because TR Legal says so? I think it would be very interesting to see proof of those claims in ALR assignment fashion, namely the entire research log.
If we assume TR Legal's claims are true for the sake of argument (or they are proven true by law librarians replicating the WestSearch staff's research logs), what does that tell you? It tells me that the WestSearch staff know how the SE algorithms work, unlike us, their customers, who lack the same detailed information.
Comprehending how legal search engines work must go far beyond trial and error. Law librarians are legal research specialists working in real time. Cost aside, we can perform online legal research with any SE, as long as a clear understanding of how the SE works lets us interpret its output. Knowing how the SE actually works is also what lets us refine our searches. The Great Unknowing is that today we don't know how 21st-century SE algorithms work, and TR Legal is not responding to this need in any substantive way. Want to sell me on the benefits of WLN? Then you had better provide more detailed information on how the algorithms produce the results I get (that is to say, got when I gave it a test drive).
Law librarians and other legal information professionals are focusing on WestSearch, but WLN isn't the only 21st-century search engine that presents these issues. Bloomberg Law's SE "learns"? OK, how? Good luck trying to get an answer to that question. I guess we will have to wait and see whether Lexis Advance learns from WLN's mistakes. It might be time to do some patent research.
Mike, buddy, no one is asking for the exact recipe of the secret sauce in WestSearch. The only people who would understand it are search programming brainiacs employed by other very expensive online legal search vendors, and they probably have been, or still are, reverse engineering WestSearch to dissect it. But simply saying that X, Y, and Z are part of the equation, and that the "usage pattern" factor is part of it but was demoted in priority during testing, isn't good enough. Every professional law librarian, including a former law librarian, knows that. [JH]
October 12, 2011 in Electronic Resource, Legal Research, Legal Research Instruction, Products & Services, Publishing Industry