Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro

The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly contai...

Bibliographic Details
Main Authors: Matthias Templ, Alexander Kowarik, Bernhard Meindl
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2015-10-01
Series: Journal of Statistical Software
Subjects: confidentiality, micro-data, statistical disclosure control, R
Online Access: https://www.jstatsoft.org/index.php/jss/article/view/2377
id doaj-970f214ef55545c5bf7bb309491b824a
record_format Article
spelling doaj-970f214ef55545c5bf7bb309491b824a | 2020-11-24T23:46:53Z | eng | Foundation for Open Access Statistics | Journal of Statistical Software | 1548-7660 | 2015-10-01 | 67 | 1 | 1 | 36 | 10.18637/jss.v067.i04 | 934 | Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro | Matthias Templ 0 | Alexander Kowarik 1 | Bernhard Meindl 2 | Vienna University of Technology | Statistics Austria | Statistics Austria | The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data in order to decrease the disclosure risk of data. The R package sdcMicro serves as an easy-to-handle, object-oriented S4 class implementation of SDC methods to evaluate and anonymize confidential micro-data sets. It includes all popular disclosure risk and perturbation methods. The package performs automated recalculation of frequency counts, individual and global risk measures, information loss and data utility statistics after each anonymization step. All methods are highly optimized in terms of computational costs to be able to work with large data sets. Reporting facilities that summarize the anonymization process can also be easily used by practitioners. We describe the package and demonstrate its functionality with a complex household survey test data set that has been distributed by the International Household Survey Network. | https://www.jstatsoft.org/index.php/jss/article/view/2377 | confidentiality, micro-data, statistical disclosure control, R
collection DOAJ
language English
format Article
sources DOAJ
author Matthias Templ
Alexander Kowarik
Bernhard Meindl
title Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro
publisher Foundation for Open Access Statistics
series Journal of Statistical Software
issn 1548-7660
publishDate 2015-10-01
description The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data in order to decrease the disclosure risk of data. The R package sdcMicro serves as an easy-to-handle, object-oriented S4 class implementation of SDC methods to evaluate and anonymize confidential micro-data sets. It includes all popular disclosure risk and perturbation methods. The package performs automated recalculation of frequency counts, individual and global risk measures, information loss and data utility statistics after each anonymization step. All methods are highly optimized in terms of computational costs to be able to work with large data sets. Reporting facilities that summarize the anonymization process can also be easily used by practitioners. We describe the package and demonstrate its functionality with a complex household survey test data set that has been distributed by the International Household Survey Network.
topic confidentiality, micro-data, statistical disclosure control, R
url https://www.jstatsoft.org/index.php/jss/article/view/2377