## August 26, 2006

### Helping Legal Actors with Bayes’ Theorem

The importance of understanding base rates and Bayes’ Theorem cannot be overstressed, particularly in the case of many types of medical and scientific testimony. The following problem shows why base rates matter: A disease occurs in 1% of the population, and a test has been developed which has an 80% accuracy rate (i.e., if you have the disease, there is an 80% chance the test will pick it up), and a 9.6% false positive rate (i.e., if you don’t have the disease, there is a 9.6% chance of getting a positive result anyway). Sam tests positive for the disease. What is the probability that Sam has the disease?

The general inclination is perhaps to say 90.4%, because the false positive rate is 9.6%. This conclusion, however, is wrong because it does not account for the rarity of the disease in the general population. (As doctors are often trained to think, if you hear hoofbeats, think horses, not zebras.) Using Bayes’ Theorem (and here I will spare the reader the mathematical details), one can show that the probability that Sam has the disease is 7.8%. Intuitively, this is because given the rarity of the disease, it is more likely that Sam is actually one of the false positives than one of the people with the disease. For anyone short of a math genius, however, crunching the numbers intuitively is extremely difficult, and merely plugging values into Bayes’ formula has a certain mystical quality that might make jurors (or judges) skeptical.

Psychological research by Gigerenzer & Hoffrage, however, suggests that people find analyzing the problem from a frequentist perspective far easier than from the probabilistic perspective shown above. We can see this by transforming the example above into a series of frequencies: A disease afflicts 10 out of 1000 people in the population. For people with the disease, 8 out of 10 will have positive test results. For people without the disease, the test will still (erroneously) yield a positive result 95 out of 990 times. Sam tests positive for the disease. What is the probability that Sam has the disease?

The answer follows far more simply. Out of a population of 1000 people, 8+95=103 people will test positive. And of these 103 people, only 8 actually have the disease, so the probability that Sam has the disease is 8/103 = 7.8%.
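For readers who want to check the counts, here is a minimal sketch of how the percentages turn into whole-number frequencies. The function name `natural_frequencies`, the choice of a 1000-person reference population, and the rounding to whole people are assumptions made for illustration; they are not taken from Gigerenzer & Hoffrage.

```python
from fractions import Fraction

def natural_frequencies(population, prevalence, sensitivity, false_positive_rate):
    """Turn rates into whole-number counts for a reference population.

    Hypothetical helper: counts are rounded to the nearest whole person.
    """
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_positives = round(diseased * sensitivity)
    false_positives = round(healthy * false_positive_rate)
    return diseased, healthy, true_positives, false_positives

# Rates from the problem: 1% prevalence, 80% detection, 9.6% false positives.
diseased, healthy, true_pos, false_pos = natural_frequencies(
    1000, Fraction(1, 100), Fraction(8, 10), Fraction(96, 1000)
)

# Share of all positive tests that come from people who have the disease.
posterior = Fraction(true_pos, true_pos + false_pos)

print(diseased, healthy, true_pos, false_pos)  # 10 990 8 95
print(posterior, round(float(posterior), 3))   # 8/103 0.078
```

The 95 false positives come from rounding 990 × 9.6% = 95.04 down to a whole number of people.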

Gigerenzer and Hoffrage write that only 16% of people presented with the percentages got the correct answer, compared to 46% of those presented with the frequencies.

This research is just one example of the many ways that psychology can help jurors and other legal actors grapple with scientific data and make better decisions. The underlying math is the same, but as we lawyers know, presentation is often half the battle (if not more). I’m indebted to Michael Bishop and J.D. Trout’s fascinating book, Epistemology and the Psychology of Human Judgment (Oxford 2005), for calling my attention to this work and providing the above example. In their book, Bishop & Trout defend a new interdisciplinary approach to epistemology that takes advantage of current psychological research, and they argue that epistemology should be reconceived to help people make better everyday decisions.

--EKC

I believe that there is a simpler way to understand Bayes' Theorem. You simply need to fill out the 2 X 2 covariation table from the evidence and then read the conditional P(A|B) as the number of A's in the column/row named 'B', divided by that column's or row's total.

This should be a standard exercise in high school algebra courses, also.
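As a rough sketch of that exercise, the table below uses the counts per 1000 people from the post. The two "negative" cells (2 and 895) are not stated in the post; they are derived here as 10 - 8 and 990 - 95. The helper name `conditional` is made up for illustration.

```python
from fractions import Fraction

# 2 x 2 covariation table, counts per 1000 people.
# Rows: true condition. Columns: test result.
table = {
    "disease":    {"positive": 8,  "negative": 2},
    "no disease": {"positive": 95, "negative": 895},
}

def conditional(table, row, column):
    """P(row | column): the row's count in that column over the column total."""
    column_total = sum(cells[column] for cells in table.values())
    return Fraction(table[row][column], column_total)

p_disease_given_positive = conditional(table, "disease", "positive")
p_disease_given_negative = conditional(table, "disease", "negative")

print(p_disease_given_positive)  # 8/103
print(p_disease_given_negative)  # 2/897
```

Reading down the "positive" column gives the same 8/103 as the post; the "negative" column shows how rarely a negative result hides a real case under these counts.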

Posted by: Michael Webster | Aug 28, 2006 11:43:21 AM

What is really going on here? Which of the following formulations of the problem are being compared, and what assumptions are being smuggled in?

Question 1. A disease occurs in 1% of the population. A test has a sensitivity of 80% (i.e., if you have the disease, there is an 80% chance the test says you do), and a 9.6% false positive rate (i.e., if you don’t have the disease, there is a 9.6% chance of getting a positive result anyway). Sam tests positive for the disease. What is the probability that Sam has the disease?

The Bayesian answer: The probability that Sam has the disease is 7.8%. There are some problems here, though, that become clear when one works through the solution. First, is Sam a random draw from the population? Nothing in the statement of the problem said that he was. Let's assume he is selected for testing at random, although, for all we know, he came to see the doctor because he was feeling sick, and people who feel sick are more likely to have a disease than perfectly healthy people. Assuming random selection, the prior probability that he has the disease is P(D) = 1/100.

Now, if Sam has the disease, the probability the test will say so is P(+|D) = 0.80. (This requires a further assumption -- that the test works the same for Sam as for the group on which it was calibrated.) If Sam does not have the disease, the probability of a positive result is P(+|ND) = 0.096. (The same assumption is being made here.)

There are two ways to end up with a positive test result. (A) It occurs when Sam has the disease. The probability of this sequence of events is the chance that Sam has the disease times the chance that the test says he does when he does: P(D)P(+|D) = (1/100)(8/10) = 8/1000 = 800/100000. (B) The positive result occurs when Sam does not have the disease. The probability of this sequence is P(ND)P(+|ND) = (99/100)(96/1000) = 9504/100000.

The probability that Sam has the disease given the positive test is the probability for (A), that he has the disease and tests positive, divided by the probability for (A) or (B), that he tests positive whether or not he has the disease: 800 / (800 + 9504) = 800/10304 = 0.07764, or approximately 0.078, as claimed.
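The arithmetic for Question 1 can be checked with exact fractions. This is only a sketch under the two assumptions already stated (Sam is a random draw, and the test behaves for him as it did in calibration); the variable names are illustrative.

```python
from fractions import Fraction

p_d = Fraction(1, 100)               # P(D), prior under random selection
p_pos_given_d = Fraction(8, 10)      # P(+|D), sensitivity
p_pos_given_nd = Fraction(96, 1000)  # P(+|ND), false positive rate

# Path (A): Sam has the disease and tests positive.
path_a = p_d * p_pos_given_d
# Path (B): Sam does not have the disease and still tests positive.
path_b = (1 - p_d) * p_pos_given_nd

# Bayes' theorem: P(D|+) = (A) / ((A) + (B)).
posterior = path_a / (path_a + path_b)

print(path_a, path_b)    # 1/125 297/3125
print(posterior)         # 25/322
print(float(posterior))
```

The reduced fraction 25/322 is the same number as 800/10304.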

Question 2. A disease afflicts 10 out of 1000 people in the population. For people with the disease, 8 out of 10 will have positive test results. For people without the disease, the test will still (erroneously) yield a positive result 95 out of 990 times. Sam tests positive for the disease. What is the probability that Sam has the disease?

Answer (as stated by EKC). "Out of a population of 1000 people, 8+95=103 people will test positive. And of these 103 people, only 8 actually have the disease, so the probability that Sam has the disease is 8/103 = 7.8%."

Again, there are problems. What "population of 1000" has exactly 10 people with the disease? One would think that the population that the prevalence statistic applies to is larger. If so, how was the sample of 1000 obtained? What guarantees that it has exactly 10 cases of the disease? Random sampling does not guarantee this.
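To put a number on that last point, here is a small sketch assuming the 1000 people are an independent random sample from a much larger population with 1% prevalence, so the count of cases is binomial. The function name `binomial_pmf` is chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k cases among n independent draws with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance that a random sample of 1000 contains exactly 10 cases.
p_exactly_10 = binomial_pmf(10, 1000, 0.01)
print(round(p_exactly_10, 3))
```

Under that assumption, a sample with exactly 10 cases turns up only about one time in eight.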

Notice also that the false positive rate of 95/990 differs from the one in Question 1. The rate in Question 1 was 9.6%, which is 96/1000, not 95/990 as in Question 2. My point is not that this is a big difference or that the posterior probabilities are very different in the two problems. Indeed, one might be tempted to dismiss these minor differences as round-off error. But as slight as they are, they reveal that G&H are doing a lot more than switching from percentages to "natural frequencies." They also are choosing specific denominators and rounding off the numerator, making the arithmetic far easier. In other words, they are presenting the question in a way that suggests a partial and informal solution of Bayes's theorem.

Let's try a third formulation of the problem that is a faithful translation of the problem from the percentages in Question 1 to "natural frequencies":

Question 3. A disease occurs in 1 in a hundred people in the population. A test has a sensitivity of 8 out of 10 (i.e., if you have the disease, then 8 times out of 10, the test will say you do), and a false positive rate of 96 per 1,000 (i.e., if you don’t have the disease, then 96 times out of 1,000, you will get a positive result anyway). Sam tests positive for the disease. What is the probability that Sam has the disease?

I seriously doubt that 46% of the population will get 7.8% for the answer to Question 3 as they did for Question 2. The contrast in the responses to Questions 1 and 2 is almost certainly due to something more than changing percentages to frequencies.

This does not undermine the broader point that how statistics or probabilities are presented can affect the impression they create. For instance, Jay Koehler has done some interesting work that has immediate implications for describing the import of trace evidence in some situations. See, e.g., J.J. Koehler, The Psychology of Numbers in the Courtroom: How to Make DNA Match Statistics Seem Impressive or Insufficient, 74 S. Cal. L. Rev. 1275-1306 (2001); J.J. Koehler, When Are People Persuaded By DNA Match Statistics?, 25 L. & Hum. Behav. 493-513 (2001) (both available at http://www.mccombs.utexas.edu/faculty/jonathan.koehler/articles.asp).

Posted by: DH Kaye | Aug 28, 2006 1:55:22 PM