Practical approaches to big data privacy over time

Bibliographic Details
Main Authors: Altman, Micah (Author), Wood, Alexandra (Author), O'Brien, David R (Author), Gasser, Urs (Author)
Other Authors: Massachusetts Institute of Technology. Libraries (Contributor)
Format: Article
Language: English
Published: Oxford University Press (OUP), 2020-05-22T15:22:31Z.
Description
Summary: Governments and businesses are increasingly collecting, analysing, and sharing detailed information about individuals over long periods of time. Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behaviour, and relationships and advance the state of science, public policy, and innovation. The collection and use of fine-grained personal data over time, at the same time, is associated with significant risks to individuals, groups, and society at large. This article examines a range of long-term research studies in order to identify the characteristics that drive their unique sets of risks and benefits and the practices established to protect research data subjects from long-term privacy risks. We find that many big data activities in government and industry settings have characteristics and risks similar to those of long-term research studies, but are subject to less oversight and control. We argue that the risks posed by big data over time can best be understood as a function of temporal factors comprising age, period, and frequency and non-temporal factors such as population diversity, sample size, dimensionality, and intended analytic use. Increasing complexity in any of these factors, individually or in combination, creates heightened risks that are not readily addressable through traditional de-identification and process controls. We provide practical recommendations for big data privacy controls based on the risk factors present in a specific case and informed by recent insights from the state of the art and practice.
National Science Foundation (Grant 1237235)