Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania

Abstract. Background: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of sub...

Bibliographic Details
Main Authors: Christopher T. Rentsch, Katie Harron, Mark Urassa, Jim Todd, Georges Reniers, Basia Zaba
Format: Article
Language: English
Published: BMC 2018-12-01
Series: BMC Medical Research Methodology
Subjects:
HIV
Online Access: http://link.springer.com/article/10.1186/s12874-018-0632-5
id doaj-bf413ebd458e4208a81e82f335fe041b
record_format Article
spelling doaj-bf413ebd458e4208a81e82f335fe041b
2020-11-25T02:53:07Z
eng
BMC
BMC Medical Research Methodology
1471-2288
2018-12-01
18119
10.1186/s12874-018-0632-5
Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
Christopher T. Rentsch (0), Katie Harron (1), Mark Urassa (2), Jim Todd (3), Georges Reniers (4), Basia Zaba (5)
(0) Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine
(1) UCL GOS Institute of Child Health
(2) The TAZAMA Project, National Institute for Medical Research
(3) Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine
(4) Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine
(5) Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine
http://link.springer.com/article/10.1186/s12874-018-0632-5
Record linkage
Linkage error
Bias
Data accuracy
HIV
Sub-Saharan Africa
collection DOAJ
language English
format Article
sources DOAJ
author Christopher T. Rentsch
Katie Harron
Mark Urassa
Jim Todd
Georges Reniers
Basia Zaba
author_sort Christopher T. Rentsch
title Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2018-12-01
description Abstract
Background: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania.
Methods: Semi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates.
Results: Between 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of coefficient estimate (R² = 0.97; p = 0.03).
Conclusions: Similar to studies with more negligible levels of linkage errors, false matches in this setting reduced the magnitude of the association; missed matches reduced precision. Adjusting for these biases could provide more robust results using data with considerable linkage errors.
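The counts reported in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes that PPV = true matches / (true matches + false matches) and that the 405 gold standard links are the denominator for sensitivity. The function name `linkage_metrics` and its parameters are hypothetical. Because the published PPVs are rounded, the implied false-match counts are approximate.

```python
def linkage_metrics(true_matches: int, ppv: float, gold_standard_total: int) -> dict:
    """Derive approximate linkage-error counts from a reported PPV.

    Assumes PPV = true / (true + false); the result is approximate
    because the reported PPV is rounded to whole percentages.
    """
    total_links = round(true_matches / ppv)          # implied links made by the algorithm
    false_matches = total_links - true_matches       # links joining two different people
    missed_matches = gold_standard_total - true_matches  # gold standard links not recovered
    sensitivity = true_matches / gold_standard_total
    return {
        "total_links": total_links,
        "false_matches": false_matches,
        "missed_matches": missed_matches,
        "sensitivity": sensitivity,
    }


if __name__ == "__main__":
    # Figures taken from the abstract: 405 gold standard links,
    # 233 correct links at PPV 65% (low threshold),
    # 95 correct links at PPV 90% (high threshold).
    for label, true_matches, ppv in [("low", 233, 0.65), ("high", 95, 0.90)]:
        print(label, linkage_metrics(true_matches, ppv, 405))
```

Under these assumptions, raising the threshold trades false matches for missed matches, which is the pattern the abstract links to attenuated hazard ratios and reduced precision respectively.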
topic Record linkage
Linkage error
Bias
Data accuracy
HIV
Sub-Saharan Africa
url http://link.springer.com/article/10.1186/s12874-018-0632-5
_version_ 1724726594081128448