The DESI VII Workshop, titled “Using Advanced Data Analysis in eDiscovery & Related Disciplines to Identify and Protect Sensitive Information in Large Collections,” was held on the Strand Campus of King’s College London on June 12, 2017. DESI VII focused particularly on privacy and featured numerous papers that examined emerging protocols and novel techniques for identifying and protecting sensitive information in large collections of data, with specific reference to the following four areas:

  • E-Discovery.
  • EU Privacy Policies and the “right to be forgotten.”
  • Audits and Investigations.
  • Public Access Requests.

As part of the proceedings, a presentation and supporting article titled “When is a Chair not a Chair? Big Data Algorithms, Disparate Impact, and Considerations of Modular Programming” focused on the rapid growth in predictive algorithms based on “real world experience” data. The article and presentation also examined the challenges posed by algorithms that worked as intended but, in doing so, demonstrated the law of unintended (and unwanted) consequences. These consequences had serious legal and regulatory repercussions, as well as repercussions in the court of public opinion, all of which the workshop then discussed in detail.

The article first addressed the unfettered use of high-quality, real-world data for algorithm construction, especially when such algorithms are developed without significant human direction or oversight. When real-world data sets are used in solutions developed through cognitive computing, the results are often very good, particularly for the issues the developers had in mind. But commercially satisfactory solutions based on “automated, algorithm-driven decision-making” can also lead to “digital redlining,” the continuation of past practices that negatively affected minorities and other protected classes. This concern was discussed in the context of certain types of mortgages that were defined by racial guidelines, of prospective minority homeowners who were steered away from certain neighborhoods, and of instances where past decision-making shaped the automated development of the algorithm.
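One common way to check an automated decision process for the kind of disparate impact described above is to compare approval rates across groups. The sketch below is illustrative only: the group labels, the sample decisions, and the helper names are hypothetical, and the 0.8 cutoff (the "four-fifths" rule of thumb from U.S. employment-selection guidance) is an assumption that does not come from the article.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the approval rate per group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions, protected, reference):
    """Ratio of the protected group's approval rate to the reference group's.

    Assumes the reference group has at least one approval.
    """
    rates = selection_rates(decisions)
    return rates[protected] / rates[reference]


# Hypothetical loan decisions: group "A" is approved 60% of the time,
# group "B" only 30% of the time.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

ratio = disparate_impact_ratio(decisions, protected="B", reference="A")
flagged = ratio < 0.8  # assumed rule-of-thumb threshold, not from the article
```

A ratio well below the chosen threshold does not by itself prove discrimination, but it is the kind of signal that would prompt a closer look at how past decisions shaped the algorithm.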

That automated algorithm development also often ignored differential privacy or the so-called mosaic effect, in which separate data sources are woven into a comprehensive whole that pieces together a picture of an otherwise anonymous individual, unmasking people unintentionally (or otherwise) through the automated process. Digital redlining and the mosaic effect both, in turn, presented a challenge under new privacy regimes, including the General Data Protection Regulation (GDPR), where algorithms incorporating this data would encounter fatal challenges when asked to execute an individual’s “right to be forgotten.”
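The mosaic effect can be illustrated with a small record-linkage sketch. Two data sets that each look harmless on their own are joined on shared quasi-identifiers, and any record with exactly one match is effectively re-identified. All field names, records, and the `link_records` helper here are hypothetical and serve only to show the mechanism.

```python
def link_records(anonymized, public, keys):
    """Attach a name to each anonymized row that matches exactly one public row."""
    # Index the public data set by its quasi-identifier values.
    index = {}
    for row in public:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match unmasks the individual
            matches.append({**row, "name": candidates[0]["name"]})
    return matches


# Hypothetical "anonymized" records: no names, only quasi-identifiers.
health = [
    {"zip": "02138", "birth_year": 1960, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public list that carries names alongside the same fields.
voters = [
    {"name": "Resident One", "zip": "02138", "birth_year": 1960, "sex": "F"},
    {"name": "Resident Two", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"name": "Resident Three", "zip": "02139", "birth_year": 1985, "sex": "M"},
]

reidentified = link_records(health, voters, keys=("zip", "birth_year", "sex"))
```

In this toy data the first health record matches a single voter and is unmasked, while the second matches two voters and stays ambiguous. Neither data set contains a name paired with a diagnosis, which is why the risk is easy to overlook when sources are assessed one at a time.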

In response, the article and the DESI VII discussion questioned those current practices of algorithm development that seemed to “carve a chair out of a single piece of wood,” creating a rigid product without the ability to tweak or modify the final output. In contrast, the article and the presentation promoted the use of modular programming to separate the steps of an algorithm’s development in order to better understand improper or unwanted outcomes.
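A minimal sketch of what such modular separation might look like follows. Each stage is its own small function, and the pipeline keeps the intermediate output of every stage so that an unwanted outcome can be traced to the step that produced it. The stage names, the scoring formula, the threshold, and the sample applicants are all hypothetical and are not taken from the article.

```python
def drop_fields(fields):
    """Build a stage that removes the given fields (e.g. suspected proxies)."""
    def step(records):
        return [{k: v for k, v in r.items() if k not in fields} for r in records]
    return step


def score(records):
    """Toy scoring stage: the score is income in thousands (an assumption)."""
    return [{**r, "score": r["income"] / 1000} for r in records]


def decide(threshold):
    """Build a stage that approves records scoring at or above the threshold."""
    def step(records):
        return [{**r, "approved": r["score"] >= threshold} for r in records]
    return step


def run_pipeline(records, steps):
    """Run the named stages in order, recording each stage's output."""
    trace = []
    for name, step in steps:
        records = step(records)
        trace.append((name, records))
    return records, trace


# Hypothetical applicants; "neighborhood" stands in for a possible proxy field.
applicants = [
    {"id": 1, "income": 52000, "neighborhood": "X"},
    {"id": 2, "income": 30000, "neighborhood": "Y"},
]

steps = [
    ("drop_proxy", drop_fields({"neighborhood"})),
    ("score", score),
    ("decide", decide(40)),
]

results, trace = run_pipeline(applicants, steps)
```

Because each stage is separate, a reviewer can inspect `trace` after any step, swap out a single stage (for example, a different scoring rule), or remove a field upstream without rebuilding the whole process, which is the opposite of the single carved chair.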