May 17, 2008

Reproducible Analyses

John Cook has an interesting discussion of the problem of reproducing statistical analyses. (Thanks to Andrew Gelman for the link.)

The problem is this: even with the same dataset, statistical analyses are often difficult to replicate, for a variety of reasons. The process of analysis involves many data manipulations, even if each is perfectly legitimate, and it is nearly impossible to document all of them in an expert report. The opposing party then has to guess at what the expert did, an inefficient and imperfect exercise in reverse engineering. This practice arguably exacerbates the "battle of the experts" problem. Not only do the opposing experts reach different conclusions, but the reasons they do so are unnecessarily hidden from view. I recall seeing this problem regularly with economic analyses when I clerked: it was nearly impossible to reconstruct how the expert reached his or her conclusions.

To the extent that Daubert emphasizes transparency and reasoned decision-making, it seems appropriate to make the disclosure of scripts and other code part of the process. That way, opposing experts can zero in on methodological differences. This transparency by no means guarantees that legal actors will understand those differences, but it's a start.

Cook notes that the Sweave software helps address this issue by letting users mix text (in LaTeX) and code (in R) in a single document. Unfortunately, most attorneys have never worked with LaTeX or R, which suggests that we have a long way to go on this issue.


It is important that the processes behind decisions be transparent and accountable. Otherwise, abuses can and will occur. Independent investigation (as in investigative journalism or by private investigators) is another way to ensure sound reasoning behind decisions.

Posted by: piworkshop | May 22, 2008 12:48:53 PM

