Linking Sensitive Data – Applications, Techniques, and Challenges

Introduction The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the li...

Bibliographic Details
Main Authors: Peter Christen, Thilina Ranbaduge, Rainer Schnell
Format: Article
Language: English
Published: Swansea University 2020-12-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/1475
id doaj-6b82ebd00160482b85ac9b7ae6d3e2f1
record_format Article
spelling doaj-6b82ebd00160482b85ac9b7ae6d3e2f1; 2021-02-10T16:43:03Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2020-12-01; 5; 5; 10.23889/ijpds.v5i5.1475; Linking Sensitive Data – Applications, Techniques, and Challenges; Peter Christen 0; Thilina Ranbaduge 1; Rainer Schnell 2; Research School of Computer Science, Australian National University, Canberra, Australia; Research School of Computer Science, Australian National University, Canberra, Australia; Research Methodology Group, University Duisburg-Essen, Duisburg, Germany
collection DOAJ
language English
format Article
sources DOAJ
author Peter Christen
Thilina Ranbaduge
Rainer Schnell
title Linking Sensitive Data – Applications, Techniques, and Challenges
publisher Swansea University
series International Journal of Population Data Science
issn 2399-4908
publishDate 2020-12-01
description Introduction: The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the linking of sensitive databases while at the same time preserving the privacy of individuals represented in these databases.
Objectives and approach: We present several case studies where the privacy-preserving linking of sensitive databases is crucial, and then discuss the advantages and limitations of existing algorithms and techniques to link sensitive databases. We discuss privacy techniques such as Bloom filter encoding, hashing, and secure multi-party computation, from the point of view of a linkage practitioner. We highlight those aspects that are important when selecting or implementing a privacy-preserving linkage technique within practical applications.
Results: Conceptually, linkage techniques can be evaluated across three main dimensions: linkage quality, scalability to linking large or multiple databases, and the privacy protection provided by a technique. From a practical perspective, however, several other dimensions are crucial, including the availability of software or ease of implementation, technical knowledge available in an organisation, and the suitability of techniques for a given linkage scenario. Our analysis of a diverse range of linkage techniques has shown that currently no technique provides an adequate solution along all conceptual as well as all practical dimensions.
Conclusions: More research is required to develop novel techniques that facilitate the privacy-preserving linkage of large sensitive databases across organisations, including new encoding methods and cryptanalysis attacks (where until now most attacks have neglected the attack vectors that are likely to occur in practice), and novel evaluation measures to assess the privacy provided by linkage techniques. We encourage practitioners to be aware of the identified limitations – as well as the opportunities – of existing privacy-preserving linkage techniques and carefully assess the technical and organisational requirements of such techniques within their institution.
url https://ijpds.org/article/view/1475