Thursday, September 11, 2014

Barocas and Selbst on Big Data's Disparate Impact

Solon Barocas and Andrew D. Selbst have posted "Big Data's Disparate Impact," which focuses on the relationship between big data mining and Title VII. Here's the abstract:

Big Data promises to replace faulty intuitions with facts, granting employers, advertisers, manufacturers, and scientists access to richer, more informed, and less biased decisionmaking processes. But where data mining is used to aid decisions, it has the potential to reproduce existing patterns of discrimination, inherit the prejudice of prior decisionmakers, or simply reflect the widespread biases that persist in society. Sorting and selecting for the best or most profitable candidates means generating a model with winners and losers. If data miners are not careful, that sorting can create disproportionately adverse results concentrated within historically disadvantaged groups in ways that look a lot like discrimination.

This Article examines the operation of anti-discrimination law in the realm of data mining and the resulting implications for the law itself. First, the Article steps through the technical process of mining data and points to different places where a disproportionately adverse impact on protected classes may result from what may seem like innocent choices on the part of the data miner. Decisions such as how to transform a problem into one that a computer can solve, how much data to collect and where to collect it, and how to label examples of "good" and "bad" outcomes, are all decisions that can render data mining more or less discriminatory. Alternatively, in a hypothetical case of perfectly executed data mining, enough information will be revealed so as to accurately sort according to pre-existing inequities in society. A disparate impact resulting from this second option would merely reflect an unequal distribution of the sought-after traits in the world as it stands as of the time of data collection.

From there, the Article analyzes the disproportionate impacts due both to errors and reflections of the status quo under Title VII. The Article concludes both that Title VII is largely ill equipped to address the discrimination that results from data mining. It further finds that, due to problems in the internal logic of data mining as well as political and constitutional constraints, there is no clear way to reform Title VII to fix these inadequacies. The article focuses on Title VII because it is the most well developed anti-discrimination doctrine, but the conclusions apply more broadly as they are based on our society’s overall approach to anti-discrimination.

A related working paper by Barocas, "Data Mining and the Discourse of Discrimination," is available here.


--Sachin Pandya

