November 30, 2007

"Predictive Medical Information" in DNA Databases for Law Enforcement

One objection to amassing databases of DNA profiles for law enforcement purposes is that the profiles themselves contain "predictive medical information." E.g., Joh (2006). The average person might take this to mean that the profiles used for identification also reveal whether a person is at risk for particular diseases. Yet no one knows how to make such predictions, and no physician would be interested in the allegedly "predictive medical information." Nevertheless, Professor Simon Cole (2007) of the University of California at Irvine suggests that such statements do not contradict the claim of forensic scientists that no one can use the set of numbers in the standard CODIS profiles to predict whether anyone represented in the database will develop any disease. In Cole's view, the phrase is just a way of saying that someday, somehow, meaningful predictive value might be discovered.

So construed, the assertion of "predictive information" is irrefutable. To give the claim some real meaning, however, one must show that the possibility that the identifying features will turn out to be predictive of disease is more than idle speculation. In this regard, Cole makes the following argument:

Presumably, Professor Kaye would respond that his extrapolation of the future is more defensible than others because it is “based on current knowledge and practice.” It may be more defensible, but that does not mean it is any more likely to be correct. Would the current capability of genetics have been predictable from the state of knowledge and practice in 1960? If not, there is no reason to assume that the capabilities of genetics in 2050—when the law enforcement DNA databases we are building today will likely still be in place and encompass a large portion of the population—must be wholly predictable from the current state of theory and knowledge.

Cole is referring to a paper (Kaye 2007a) that explains why, in light of basic principles of statistics and genetics that date back to Bayes and Galton, the alleged predictive power of the profiles is likely to remain too slight to permit useful inferences about disease status or propensity. To reach this conclusion, the paper discusses all the known ways in which the profiles in a database might be used to predict future diseases.

This is, of course, quite different from blithely assuming that the future will resemble the past. And I think it is better than assuming that just because we know more about molecular biology and medical genetics than we did in 1960, we will be able to accomplish this particular feat by 2050 (Kaye 2007b). Perhaps we will -- such a development would not violate any known laws of physics. But do any readers have a more specific reason to suspect that the CODIS STR profiles will turn out to be powerful predictors of any medical conditions?

--DH Kaye

* On CODIS and STR profiles, see, for example, the FBI brochure, the NIJ webpage, and the NIST technical information.

