Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city

Abstract Background: Electronic Health Records (EHR) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods to estimate the prevalence of health indicators usi...


Bibliographic Details
Main Authors: Ryung S. Kim, Viswanathan Shankar
Format: Article
Language: English
Published: BMC 2020-04-01
Series: BMC Medical Research Methodology
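Subjects: Big data; Electronic health records; Multiple imputations; Measurement error; Selection bias; Population health surveillance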
Online Access: http://link.springer.com/article/10.1186/s12874-020-00956-6
id doaj-5a47df556a87421eb221b29f21e624e3
record_format Article
spelling doaj-5a47df556a87421eb221b29f21e624e3
2020-11-25T02:58:38Z
eng
BMC
BMC Medical Research Methodology
1471-2288
2020-04-01
201110
10.1186/s12874-020-00956-6
Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city
Ryung S. Kim 0
Viswanathan Shankar 1
Department of Epidemiology and Population Health, Albert Einstein College of Medicine
Department of Epidemiology and Population Health, Albert Einstein College of Medicine
Abstract Background: Electronic Health Records (EHR) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods to estimate the prevalence of health indicators using large, real-time EHR while correcting for potential bias. Methods: We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller’s method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. Then, we adopted the method of Schenker et al., which uses multiple imputations of subject-level health outcomes that are missing for the subjects in EHR. This procedure requires information to link some subjects between the two sources, a model of the misclassification mechanism in EHR, and models of the inclusion probabilities for both sources. Results: In a simulation study, both estimators yielded negligible bias even when EHR was biased. They performed as well as the health survey estimator when EHR bias was large and better than the health survey estimator when EHR bias was moderate. It may be challenging to model the misclassification mechanism in real data for the subject-level imputation estimator. We illustrated the methods by analyzing six health indicators from the 2013-14 NYC HANES, the 2013 NYC Macroscope, and a study that linked some subjects in both data sources. Conclusions: When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.
http://link.springer.com/article/10.1186/s12874-020-00956-6
Big data
Electronic health records
Multiple imputations
Measurement error
Selection bias
Population health surveillance
collection DOAJ
language English
format Article
sources DOAJ
author Ryung S. Kim
Viswanathan Shankar
spellingShingle Ryung S. Kim
Viswanathan Shankar
Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city
BMC Medical Research Methodology
Big data
Electronic health records
Multiple imputations
Measurement error
Selection bias
Population health surveillance
author_facet Ryung S. Kim
Viswanathan Shankar
author_sort Ryung S. Kim
title Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city
title_sort prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in new york city
publisher BMC
series BMC Medical Research Methodology
issn 1471-2288
publishDate 2020-04-01
description Abstract Background: Electronic Health Records (EHR) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods to estimate the prevalence of health indicators using large, real-time EHR while correcting for potential bias. Methods: We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller’s method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. Then, we adopted the method of Schenker et al., which uses multiple imputations of subject-level health outcomes that are missing for the subjects in EHR. This procedure requires information to link some subjects between the two sources, a model of the misclassification mechanism in EHR, and models of the inclusion probabilities for both sources. Results: In a simulation study, both estimators yielded negligible bias even when EHR was biased. They performed as well as the health survey estimator when EHR bias was large and better than the health survey estimator when EHR bias was moderate. It may be challenging to model the misclassification mechanism in real data for the subject-level imputation estimator. We illustrated the methods by analyzing six health indicators from the 2013-14 NYC HANES, the 2013 NYC Macroscope, and a study that linked some subjects in both data sources. Conclusions: When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.
topic Big data
Electronic health records
Multiple imputations
Measurement error
Selection bias
Population health surveillance
url http://link.springer.com/article/10.1186/s12874-020-00956-6
work_keys_str_mv AT ryungskim prevalenceestimationbyjointuseofbigdataandhealthsurveyademonstrationstudyusingelectronichealthrecordsinnewyorkcity
AT viswanathanshankar prevalenceestimationbyjointuseofbigdataandhealthsurveyademonstrationstudyusingelectronichealthrecordsinnewyorkcity
_version_ 1724705867866046464
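
Illustration (not part of the catalogued record): the abstract notes that the pooling approach needs only the two prevalence estimates and their standard errors. The sketch below shows one way such pooling can work; it is not the authors' code, and the paper's exact estimator may differ. It assumes the two estimates are independent, the survey estimate is unbiased, and the unknown squared EHR bias is replaced by the plug-in value (p_ehr - p_survey)^2. Under those assumptions, the weight on the EHR estimate that minimizes mean squared error is var_survey / (var_survey + var_ehr + bias^2). The function name and all numbers are hypothetical.

```python
def pool_estimates(p_survey, se_survey, p_ehr, se_ehr):
    """Pool an unbiased survey estimate with a possibly biased EHR estimate.

    Returns (pooled_estimate, weight_on_ehr). Hypothetical helper: the
    plug-in bias term below is an assumption, not taken from the paper.
    """
    var_survey = se_survey ** 2
    var_ehr = se_ehr ** 2
    # Assumption: estimate the squared EHR bias by the squared difference
    # between the two prevalence estimates.
    bias_sq = (p_ehr - p_survey) ** 2
    denom = var_survey + var_ehr + bias_sq
    if denom == 0.0:
        # Degenerate case: both estimates are exact and identical.
        return p_survey, 0.0
    # MSE-minimizing weight on the EHR estimate for independent estimators.
    w_ehr = var_survey / denom
    pooled = w_ehr * p_ehr + (1.0 - w_ehr) * p_survey
    return pooled, w_ehr


# Hypothetical inputs: a small survey (large SE) and a large EHR sample (small SE).
agree = pool_estimates(p_survey=0.30, se_survey=0.02, p_ehr=0.30, se_ehr=0.005)
disagree = pool_estimates(p_survey=0.30, se_survey=0.02, p_ehr=0.50, se_ehr=0.005)
print("sources agree:   ", agree)
print("sources disagree:", disagree)
```

When the two sources agree, most of the weight goes to the more precise EHR estimate; when they disagree strongly, the weight shifts back to the survey, which matches the abstract's description of the survey as a safeguard against EHR bias.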