ContractsProf Blog

Editor: Jeremy Telman
Oklahoma City University
School of Law

Friday, August 18, 2023

Will Artificial Intelligence Make Contract Interpretation Easier?

In case you've been on a Barbenheimer binge for the last few weeks and missed it, Yonathan Arbel (below, left) and David Hoffman (below, right) have posted Generative Interpretation on SSRN, and the early reviews, e.g. here, are glowing.  A new land speed record has been established in that Jeff Lipshaw has already published his review on SSRN.

The thesis is that large language models (LLMs) can help us resolve the meaning of ambiguous contract phrases.  The Authors start with the much-cited case arising out of flooding caused by levee breaches during Hurricane Katrina.  The issue was whether the insurer's exclusion for damages caused by "floods" covered floods that resulted from human negligence.  The Fifth Circuit determined that it did.  It did so, say the Authors, employing "the most artisanal and articulated form of textualism available in late-stage Capitalism," a hodge-podge of dictionaries, encyclopedias, treatises, and caselaw.  That sounds bad, but the alternative, say the Authors, was "kitchen-sink contextualism," which, they say, has a foul odor.  That odor is the stench of illegitimacy generated by suspicions that existing interpretive methods merely provide cover for motivated reasoning.

Well, the Authors offer an appropriate cleanser to address bad kitchen odors.  It produces the reliability that purports to come with textualism's faux objectivism and the situation-specific accuracy that is squishy-minded contextualism's raison d'être.  Generative interpretation can be used to establish not only what the words most likely meant in context but also what the parties most likely meant.  They deploy generative interpretation to show that "caused by flood" very reasonably could mean "caused by a flood that resulted from a levee break."  So the Fifth Circuit got it right.  But the model also usefully shows that Louisiana courts err in refusing to allow the flood exemption to cover floods that result from a failed water main, because ordinary language does connect floods with failed water mains, and so the exemption arguably unambiguously exempts damage from such floods from coverage.  Courts would be correct, however, to reject any argument that the exclusion covers floods caused by joy, language models tell us, should any such case arise.

The authors concede that even homely twentieth-century style textualism gets the right result with respect to flood exemptions.  But LLMs do so more convincingly and more unassailably because of the vast amounts of data they incorporate.  In this respect, they are like corpus linguistics, which has been with us for a while, but has not really changed the landscape much. 

In a presentation that I witnessed and that made a lasting impression on me, Stanley Fish voiced his powerful skepticism with respect to the corpus linguistics approach here.  He poses for himself the rhetorical question: when will database-based approaches like corpus linguistics provide us with definitive answers to interpretive questions?  He answers, quoting King Lear contemplating the body of his "poor fool," Cordelia, "Never, never, never, never, never!"  This is so because such an approach usually tells us, as in the "flood" example, things we already know.  The rest of the time, it tells us nothing definitive, because, while it can tell us both the ordinary-language meaning of a term and the technical meaning of the term, we have no way of knowing whether the speaker intended the words in their ordinary sense or their technical sense or in some idiosyncratic sense that we can only learn, if at all, from a deposition.

But modern AI takes us a step beyond mere corpus linguistics, the Authors tell us, because LLMs have the ability to become context-sensitive.  They can consider not only the contract text but also relevant extrinsic evidence.  In addition, the Authors make two claims on behalf of the use of LLMs for interpretation.  First, because this technology makes textualism cheap, it provides access to justice to low-income litigants who might not otherwise have the resources to engage in the artisanal textualism of late-stage capitalism.  Second, LLMs can cut through the opposition between textualism and contextualism.  Parties inclined to trust the objectivity of textual approaches might well prefer contextualism when it comes with the data-driven precision of generative interpretation.  And of course, the Authors remind us, generative interpretation is not robot judging; it is a tool that serves as an aid to interpretation for real, flesh-and-blood attorneys to deploy in their arguments before real, flesh-and-blood adjudicators.

Contracts interpretation is about prediction.  Yogi Berra is reputed to have said that it is hard to make predictions -- especially about the future.  But contracts interpretation is about reconstructing what the parties would have said at the time of contracting about the meaning of a provision.  Yogi Berra was wrong.  What's really hard is making predictions about the past!

In Part I, the Authors paint portraits of textualism and contextualism, warts and all.  They then discuss corpus linguistics as well as Omri Ben-Shahar and Lior Strahilevitz's proposal that courts supplement traditional textualist approaches with survey data to add context.  Corpus linguistics is limited because it can only analyze small snippets of text; the value of survey data seems limited, at least thus far, to the trademark context.  

In Part II, the Authors test-drive their LLM models, applying them to some real-world contracts and situations.  Those models provide some pretty clear indications of the most likely meanings of contested contracts.  The Authors claim to have established that LLMs can provide a stronger, cheaper, and more robust form of textualism.  I think that sounds about right, given my view of the limitations of textualism.

Let me push back a bit.  The Authors are cautious, as my phrases in the previous paragraph, "pretty clear" and "most likely," suggest.  So, if LLMs tell us that there is an 80% probability that the language has the insurer's meaning, does that mean the contract is unambiguous in the contra proferentem context?  Is 85% enough?  90%?  It seems like LLMs are most useful in establishing ambiguity in the face of an overconfident court (we're looking at you, NY Court of Appeals!), but establishing the lack of ambiguity may not be LLMs' strong suit.  They look at everything -- they are bound to find outlier usages -- and so we still would need to supplement LLMs with more conventional discovery tools.

As to cheaper, I think the Authors, who have great facility with the technology at issue, either underestimate the costs of designing, running, and explicating to a trier of fact the sorts of experiments involved in their work, or they tragically undervalue their unique talents.  Let's assume that the cost of actually using LLMs is very low.  Still, someone has to design the inputs, so the basis of the work will be an expert report.  It will be interesting to see whether parties end up coming to court with dueling versions of what the LLMs say the contract means.  For example, when the Authors introduce extrinsic evidence to show the LLMs grappling with context, one model finds that a phone call moves the likelihood of a certain interpretation of the contract from 10% to 20%.  The other finds that the same call moves the likelihood of the same interpretation from 10% to 75%.  Results like this suggest that, at least with respect to some interpretation issues, investing in LLMs will yield nothing definitive.

The experts who generate findings using LLMs will have to explain those findings to the adjudicators in ways that make the material comprehensible.  The Authors write with both clarity and flair, but I have to admit that when they start talking about the "temperatures" of the various models, I cannot summon up a notion that maps onto how LLMs work.  So there are challenges of translation when people who work with LLMs have to explain their findings to fact finders who don't.  I concede that this is a better version of textualism, but is it cheaper than a judge, four dictionaries, the briefs, and prior case law construing the terms?
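For what it's worth, "temperature" is a knob on the sampling step: it rescales the model's raw scores before the model picks its next word, so a low temperature makes the model commit hard to its top-ranked choice, while a high temperature lets lower-ranked choices through more often.  A minimal sketch of that rescaling (a generic illustration, not the Authors' code, with made-up scores for three hypothetical candidate readings of a clause):

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw model scores into probabilities.
    Lower temperature sharpens the distribution toward the
    top-scored option; higher temperature flattens it."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate readings of a contract clause
scores = [2.0, 1.0, 0.5]
print(softmax_with_temperature(scores, 0.5))  # sharp: top reading dominates
print(softmax_with_temperature(scores, 2.0))  # flat: readings look closer
```

The point for litigators is that the same model, asked the same question at different temperatures, can report different degrees of confidence -- which is exactly the sort of thing an expert would need to explain to a fact finder.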

Part III begins by acknowledging that courts may already be making use of ChatGPT to assist them on interpretive issues, just as they may have used Google before.  That fact does not fill me with confidence that courts will produce better works of contracts interpretation.  As the Authors know, attorneys make unsupervised use of the technology at their peril.  Courts do so at our peril, but they may also be overturned by less adventurous courts of appeal.  Still, the Authors are no doubt right that the technology will inevitably become yet another interpretive tool, and their work is highly suggestive of how it can be well used.

But in Part III, the Authors also make their case that ease of access to LLMs addresses access to justice problems.  They think that judges can now use generative interpretation to level the playing field in asymmetrical litigation.  They also think that more cases will settle because LLMs will give the parties a better sense of the likely outcome of litigation.  With admirable thoroughness and thoughtfulness, the Authors then lay out the roadblocks ahead -- all of the ways in which use of LLMs has gone wrong and may continue to go wrong.  

But in the end, they are still quite optimistic about the future of generative interpretation.  It has all of the advantages of old-time textualism and the added advantages of contextualism, without contextualism's main disadvantage, which is expense.

Criticisms aside, this is a great article.  The premise is so obvious that one smacks one's head and exclaims, why didn't I think of that?!?  But it would be hard to top the execution.  The Authors deploy their understanding of LLMs so that they can not only assert but demonstrate their usefulness.  They then deftly illustrate that understanding with well-chosen examples and hypotheticals presented with scholarly rigor and narrative flair.



In fairness, Jeremy, like most reviews, it uses their wonderful piece as a springboard for the reviewer's own ideas, which (as here) often don't really have a lot to do with what the book or article had to say. And I had been thinking about, and even writing and storing, some of this for a long time. David and Yonathan simply jogged me out of my semi-retired inertia.

Posted by: Jeff Lipshaw | Aug 18, 2023 11:28:37 AM