Using the Bootstrap to Account for Linkage Errors when Analysing Probabilistically Linked Categorical Data

Record linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. Record linkage is not an error-free process and can lead to linking a pair of records that do not belong to the same unit. This occurs because linkin...

Bibliographic Details
Main Authors: Chipperfield James O., Chambers Raymond L.
Format: Article
Language: English
Published: Sciendo 2015-09-01
Series: Journal of Official Statistics
Subjects: record linkage, measurement error, parametric bootstrap
Online Access: https://doi.org/10.1515/jos-2015-0024
id doaj-99954f282d0e4de281a1a02b51d3214c
record_format Article
spelling doaj-99954f282d0e4de281a1a02b51d3214c; 2021-09-06T19:40:51Z; eng; Sciendo; Journal of Official Statistics; ISSN 2001-7367; 2015-09-01; vol. 31, no. 3, pp. 397-414; 10.1515/jos-2015-0024; jos-2015-0024; Using the Bootstrap to Account for Linkage Errors when Analysing Probabilistically Linked Categorical Data; Chipperfield James O. (0), Chambers Raymond L. (1); (0) Australian Bureau of Statistics, Methodology Division, P O Box 10, Belconnen, Australian Capital Territory 2616 Australia; (1) University of Wollongong, National Institute for Applied Statistics Research, Northfields Avenue Wollongong, New South Wales, 2500 Australia; https://doi.org/10.1515/jos-2015-0024; record linkage; measurement error; parametric bootstrap
collection DOAJ
language English
format Article
sources DOAJ
author Chipperfield James O.
Chambers Raymond L.
title Using the Bootstrap to Account for Linkage Errors when Analysing Probabilistically Linked Categorical Data
publisher Sciendo
series Journal of Official Statistics
issn 2001-7367
publishDate 2015-09-01
description Record linkage is the act of bringing together records that are believed to belong to the same unit (e.g., person or business) from two or more files. Record linkage is not an error-free process and can lead to linking a pair of records that do not belong to the same unit. This occurs because linking fields on the files, which ideally would uniquely identify each unit, are often imperfect. There has been an explosion of record linkage applications, particularly involving government agencies and in the field of health, yet there has been little work on making correct inference using such linked files. Naively treating a linked file as if it were linked without errors can lead to biased inferences. This article develops a method of making inferences for cross tabulated variables when record linkage is not an error-free process. In particular, it develops a parametric bootstrap approach to estimation which can accommodate the sophisticated probabilistic record linkage techniques that are widely used in practice (e.g., 1-1 linkage). The article demonstrates the effectiveness of this method in a simulation and in a real application.
topic record linkage
measurement error
parametric bootstrap
url https://doi.org/10.1515/jos-2015-0024