## August 11, 2008

### The Birthday Problem in Las Vegas

The other week, an editorial in the Las Vegas Review-Journal misconstrued the now infamous 2001 findings of partial matches in the Arizona DNA database. The study was discussed on our blog on July 20, and I won't repeat the explanation of the birthday problem. You might think that people in Las Vegas would know more about winning combinations, but the editor presented the Arizona findings as proof that hundreds of thousands of Americans would be falsely incriminated by DNA profiling in criminal cases. His conclusion: "the odds of a 'coincidental match' with an innocent party -- the realistic odds, based on searches such as [the one in the Arizona database], not something out of astronomy book -- should be carefully explained."

The last remark got me thinking. Could the results of the Arizona database study yield an estimate of a random-match probability?

Mathematician Charles Brenner did some simplified calculations that can be adapted to this end. A standard DNA profile consists of 13 pairs of numbers. The numbers have to do with the lengths of various fragments of DNA at particular points (loci) on certain chromosomes. If a suspect and a crime-scene sample have the same fragment lengths at all 13 loci, then the match is strong evidence that the suspect (or an identical twin) is the source of the DNA. A partial match excludes the suspect as the source, and there are many ways for two 13-locus profiles to match in part but not in full. For example, Brenner pointed out that there are 715 ways to select 9 loci from the full 13. In the Arizona study, the analyst looked at all distinct pairings of the 65,493 people in the 2001 Arizona database. (This is where the birthday problem, with its combinatorial explosion, comes in.) Brenner reported that $65{,}493 \times 65{,}492 / 2$, or approximately 2,140,000,000 pairs, were compared. Since each pair of genotypes was checked for all 715 ways to get a 9-locus partial match, some $715 \times 2.14 \times 10^{9} \approx 1.5 \times 10^{12}$ nine-locus comparisons were made. Only 122 yielded matches. The empirically determined proportion is therefore about $8 \times 10^{-11}$, or roughly 1 in 12 trillion.
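For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the counts from the figures quoted above. The function and constant names are my own, chosen for illustration; they do not come from Brenner's page or from the Arizona study.

```python
from math import comb

# Figures quoted in the text above.
DATABASE_SIZE = 65_493    # profiles in the 2001 Arizona database
TOTAL_LOCI = 13           # loci in a standard profile
MATCHED_LOCI = 9          # loci that must agree for a nine-locus match
OBSERVED_MATCHES = 122    # nine-locus matches reported


def count_pairs(n_profiles: int) -> int:
    """Number of distinct pairs among n_profiles, i.e. n * (n - 1) / 2."""
    return comb(n_profiles, 2)


def count_locus_subsets(total_loci: int, matched_loci: int) -> int:
    """Number of ways to choose which loci are the matching ones."""
    return comb(total_loci, matched_loci)


def empirical_proportion(observed: int, n_profiles: int,
                         total_loci: int, matched_loci: int) -> float:
    """Observed matches divided by the number of nine-locus comparisons."""
    comparisons = (count_pairs(n_profiles)
                   * count_locus_subsets(total_loci, matched_loci))
    return observed / comparisons


if __name__ == "__main__":
    pairs = count_pairs(DATABASE_SIZE)
    subsets = count_locus_subsets(TOTAL_LOCI, MATCHED_LOCI)
    proportion = empirical_proportion(
        OBSERVED_MATCHES, DATABASE_SIZE, TOTAL_LOCI, MATCHED_LOCI)
    print(f"pairs compared:         {pairs:,}")
    print(f"ways to pick 9 of 13:   {subsets}")
    print(f"nine-locus comparisons: {pairs * subsets:.2e}")
    print(f"empirical proportion:   {proportion:.1e}")
```

The pair count grows with the square of the database size, which is the birthday-problem effect: a database of 65,493 profiles generates more than two billion pairwise comparisons.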

Let's compare this number with a theoretical estimate of the random-match probability -- one that assumes statistical independence of DNA alleles and loci. Brenner presents 1/13.66 as the probability of a random match at a single locus. Assuming independence, the probability of a match at 9 specified loci would be $(1/13.66)^{9} \approx 6.0 \times 10^{-11}$, or about $4.5 \times 10^{-11}$ once the factor $(1 - 1/13.66)^{4}$ for mismatches at the other 4 loci is included.* Either way, this agrees rather well with the empirical value of $8 \times 10^{-11}$ in Arizona.
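The independence calculation can be sketched in a few lines of Python. This is only an illustration under the stated assumption that every locus has the same match probability of 1/13.66; the function names are hypothetical. It evaluates the product both with and without a factor for mismatching at the remaining four loci, and compares each to the empirical proportion.

```python
# Single-locus random-match probability quoted from Brenner in the text.
PER_LOCUS_MATCH = 1 / 13.66
# Empirical proportion of nine-locus matches quoted in the text.
EMPIRICAL_PROPORTION = 8e-11


def match_at_specified_loci(per_locus: float, n_loci: int) -> float:
    """Probability of matching at n_loci given loci, assuming independence."""
    return per_locus ** n_loci


def match_and_mismatch(per_locus: float, n_match: int, n_mismatch: int) -> float:
    """Probability of matching at n_match given loci and mismatching at
    n_mismatch other loci, assuming independence across loci."""
    return per_locus ** n_match * (1 - per_locus) ** n_mismatch


if __name__ == "__main__":
    nine_only = match_at_specified_loci(PER_LOCUS_MATCH, 9)
    nine_and_four = match_and_mismatch(PER_LOCUS_MATCH, 9, 4)
    print(f"match at 9 specified loci:       {nine_only:.2e}")
    print(f"...and mismatch at the other 4:  {nine_and_four:.2e}")
    print(f"empirical / theoretical ratios:  "
          f"{EMPIRICAL_PROPORTION / nine_only:.2f}, "
          f"{EMPIRICAL_PROPORTION / nine_and_four:.2f}")
```

Both theoretical figures land within a factor of two of the empirical proportion, which is the sense in which the two "agree rather well."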

If this numerical exercise is any indication, the approach favored by the Las Vegas editor would not change much. The "realistic" probabilities that can be quoted in court on the basis of the number of 9-locus matches will still be astronomically small.

--DHK

Note

* The probability of a partial match, that is, of a match at 9 loci and a mismatch at the remaining 4 loci, would be $715 \times (1/13.66)^{9} \times (1 - 1/13.66)^{4} \approx 1/(3.1 \times 10^{7})$. Of course, nobody would introduce this number in a real case because such a partial match excludes the suspect as the source of the crime-scene DNA. Partial matches like the ones in the Arizona database are not used to convict anyone. Rather, they are of interest because they raise a question as to whether there is an excess of partial matches compared to the numbers that would be expected if the usual random-match probabilities are accurate. If there is a surprising excess -- something that is not yet clear -- then perhaps the standard calculation of random-match probabilities needs to be altered.
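The partial-match figure in this note is a binomial probability, and it can be checked with a short Python sketch. The names below are my own, and the expected-count function relies on a strong simplifying assumption: one average match probability (1/13.66) applied to every locus and every pair. It is a rough illustration of how an expected number could be computed, not a test of whether an excess of partial matches actually exists.

```python
from math import comb

PER_LOCUS_MATCH = 1 / 13.66   # single-locus figure quoted from Brenner
TOTAL_LOCI = 13
MATCHED_LOCI = 9
DATABASE_SIZE = 65_493        # profiles in the 2001 Arizona database


def partial_match_probability(per_locus: float, total_loci: int,
                              matched_loci: int) -> float:
    """Binomial probability that two profiles match at exactly
    matched_loci of total_loci, assuming independent, identical loci."""
    mismatched = total_loci - matched_loci
    return (comb(total_loci, matched_loci)
            * per_locus ** matched_loci
            * (1 - per_locus) ** mismatched)


def expected_partial_matches(n_profiles: int, pair_probability: float) -> float:
    """Expected number of partially matching pairs among all distinct pairs,
    under the simplified model only."""
    return comb(n_profiles, 2) * pair_probability


if __name__ == "__main__":
    p = partial_match_probability(PER_LOCUS_MATCH, TOTAL_LOCI, MATCHED_LOCI)
    print(f"partial-match probability: about 1 in {1 / p:.2e}")
    print(f"expected pairs (simplified model): "
          f"{expected_partial_matches(DATABASE_SIZE, p):.0f}")
```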

References:

Charles Brenner, Arizona DNA Database Matches, Jan. 8, 2007, http://dna-view.com/ArizonaMatch.htm.

Editorial, DNA Evidence: What Are the Real Chances of Mistakes?, Las Vegas Review-Journal, Jul. 29, 2008, available at http://www.lvrj.com/opinion/26025944.html

D.H. Kaye, Letter, The Math Behind DNA Matching, Las Vegas Review-Journal, Aug. 01, 2008, available at http://www.lvrj.com/opinion/26171924.html