Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil

Abstract Background Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), an...

Bibliographic Details
Main Authors: Claudia Medina Coeli, Valeria Saraceni, Paulo Mota Medeiros, Helena Pereira da Silva Santos, Luis Carlos Torres Guillen, Luís Guilherme Santos Buteri Alves, Thomas Hone, Christopher Millett, Anete Trajman, Betina Durovni
Format: Article
Language: English
Published: BMC 2021-06-01
Series: BMC Medical Informatics and Decision Making
Subjects: Medical record linkage; Data accuracy; Brazil; Primary healthcare
Online Access: https://doi.org/10.1186/s12911-021-01550-6
id doaj-cb8d082346114d41be078d06cfbd77f9
record_format Article
spelling doaj-cb8d082346114d41be078d06cfbd77f9
2021-06-20T11:44:07Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2021-06-01
Volume 21, issue 1, pages 1-13
10.1186/s12911-021-01550-6
Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil
Claudia Medina Coeli (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro)
Valeria Saraceni (Secretaria Municipal de Saúde do Rio de Janeiro)
Paulo Mota Medeiros (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro)
Helena Pereira da Silva Santos (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro)
Luis Carlos Torres Guillen (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro)
Luís Guilherme Santos Buteri Alves (Instituto de Estudos em Saúde Coletiva, Universidade Federal do Rio de Janeiro)
Thomas Hone (Public Health Policy Evaluation Unit, Imperial College London)
Christopher Millett (Public Health Policy Evaluation Unit, Imperial College London)
Anete Trajman (Programa de Pós-Graduação em Clínica Médica e Mestrado Profissional em Atenção Primária à Saúde, Federal University of Rio de Janeiro)
Betina Durovni (Centro de Estudos Estratégicos, Fundação Oswaldo Cruz)
Abstract
Background: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some characteristics of Brazilian names that make the linkage process prone to errors. This study aims to describe and evaluate the quality of the processes used to create an individual-linked database for data-intensive research on the impacts on health indicators of the expansion of primary care in Rio de Janeiro City, Brazil.
Methods: We created an individual-level dataset linking social benefits recipients, primary health care, hospital admission and mortality data. The databases were pre-processed, and we adopted a multiple approach strategy combining deterministic and probabilistic record linkage techniques, and an extensive clerical review of the potential matches. Relying on manual review as the gold standard, we estimated the false match (false-positive) proportion of each approach (deterministic, probabilistic, clerical review) and the missed match proportion (false-negative) of the clerical review approach. To assess the sensitivity (recall) to identifying social benefits recipients’ deaths, we used their vital status registered on the primary care database as the gold standard.
Results: In all linkage processes, the deterministic approach identified most of the matches. However, the proportion of matches identified in each approach varied. The false match proportion was around 1% or less in almost all approaches. The missed match proportion in the clerical review approach of all linkage processes was under 3%. We estimated a recall of 93.6% (95% CI 92.8–94.3) for the linkage between social benefits recipients and mortality data.
Conclusion: The adoption of a linkage strategy combining pre-processing routines, deterministic, and probabilistic strategies, as well as an extensive clerical review approach minimized linkage errors in the context of suboptimal data quality.
https://doi.org/10.1186/s12911-021-01550-6
Medical record linkage
Data accuracy
Brazil
Primary healthcare
collection DOAJ
language English
format Article
sources DOAJ
author Claudia Medina Coeli
Valeria Saraceni
Paulo Mota Medeiros
Helena Pereira da Silva Santos
Luis Carlos Torres Guillen
Luís Guilherme Santos Buteri Alves
Thomas Hone
Christopher Millett
Anete Trajman
Betina Durovni
title Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2021-06-01
description Abstract
Background: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some characteristics of Brazilian names that make the linkage process prone to errors. This study aims to describe and evaluate the quality of the processes used to create an individual-linked database for data-intensive research on the impacts on health indicators of the expansion of primary care in Rio de Janeiro City, Brazil.
Methods: We created an individual-level dataset linking social benefits recipients, primary health care, hospital admission and mortality data. The databases were pre-processed, and we adopted a multiple approach strategy combining deterministic and probabilistic record linkage techniques, and an extensive clerical review of the potential matches. Relying on manual review as the gold standard, we estimated the false match (false-positive) proportion of each approach (deterministic, probabilistic, clerical review) and the missed match proportion (false-negative) of the clerical review approach. To assess the sensitivity (recall) to identifying social benefits recipients’ deaths, we used their vital status registered on the primary care database as the gold standard.
Results: In all linkage processes, the deterministic approach identified most of the matches. However, the proportion of matches identified in each approach varied. The false match proportion was around 1% or less in almost all approaches. The missed match proportion in the clerical review approach of all linkage processes was under 3%. We estimated a recall of 93.6% (95% CI 92.8–94.3) for the linkage between social benefits recipients and mortality data.
Conclusion: The adoption of a linkage strategy combining pre-processing routines, deterministic, and probabilistic strategies, as well as an extensive clerical review approach minimized linkage errors in the context of suboptimal data quality.
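The multiple-approach strategy summarized in the abstract (pre-processing, a deterministic pass, then a score-based pass whose candidates go to clerical review) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The field names (name, mother, dob), the sample records, the 0.85 threshold, and the use of difflib.SequenceMatcher as a simple stand-in for a probabilistic match score are all assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Pre-processing: strip accents, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.upper().split())


def similarity(x, y):
    """String similarity in [0, 1]; a stand-in for a probabilistic score."""
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio()


def link(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b, stage) tuples.

    Stage "deterministic": exact agreement on normalized name,
    normalized mother's name and date of birth (hypothetical key).
    Stage "candidate": best-scoring record with the same date of birth
    (blocking), kept for clerical review if its score >= threshold.
    """
    exact_index, by_dob = {}, {}
    for b in records_b:
        key = (normalize(b["name"]), normalize(b["mother"]), b["dob"])
        exact_index.setdefault(key, []).append(b)
        by_dob.setdefault(b["dob"], []).append(b)

    links = []
    for a in records_a:
        key = (normalize(a["name"]), normalize(a["mother"]), a["dob"])
        exact = exact_index.get(key, [])
        if len(exact) == 1:  # unambiguous exact agreement
            links.append((a["id"], exact[0]["id"], "deterministic"))
            continue
        best, best_score = None, 0.0
        for b in by_dob.get(a["dob"], []):
            score = (similarity(a["name"], b["name"])
                     + similarity(a["mother"], b["mother"])) / 2
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            links.append((a["id"], best["id"], "candidate"))
    return links


# Hypothetical sample records, invented for this sketch.
benefits = [
    {"id": "a1", "name": "José da Silva", "mother": "Maria da Silva", "dob": "1980-01-02"},
    {"id": "a2", "name": "Ana Paula Souza", "mother": "Rita Souza", "dob": "1975-05-10"},
    {"id": "a3", "name": "Carlos Lima", "mother": "Helena Lima", "dob": "1990-03-03"},
]
deaths = [
    {"id": "b1", "name": "JOSE DA SILVA", "mother": "MARIA DA SILVA", "dob": "1980-01-02"},
    {"id": "b2", "name": "Ana Paula Sousa", "mother": "Rita Sousa", "dob": "1975-05-10"},
]
links = link(benefits, deaths)
print(links)
```

In this sketch the accent-insensitive exact key resolves the first pair, the spelling variant Souza/Sousa is caught only by the score-based pass, and the third record stays unlinked. Pairs labelled "candidate" are the ones a clerical reviewer would confirm or reject.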
topic Medical record linkage
Data accuracy
Brazil
Primary healthcare
url https://doi.org/10.1186/s12911-021-01550-6