Variation in model performance by data cleanliness and classification methods in the prediction of 30-day ICU mortality, a US nationwide retrospective cohort and simulation study

Objective: There has been a proliferation of approaches to statistical methods and missing data imputation as electronic health records become more plentiful; however, the relative performance on real-world problems is unclear. Materials and methods: Using 355 823 intensive care unit (ICU) hospitalisat...

Bibliographic Details
Main Authors:	Cheng Ma, Xiao Qing Wang, Sarah Seelye, Ji Zhu
Format:	Article
Language:	English
Published:	BMJ Publishing Group 2020-12-01
Series:	BMJ Open
Online Access:	https://bmjopen.bmj.com/content/10/12/e041421.full

id	doaj-1519d5aa843f4a088dfb1b4d7dc548db
record_format	Article
doi	10.1136/bmjopen-2020-041421
volume	10
issue	12
author_affiliations	Cheng Ma: Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA; Xiao Qing Wang: Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA; Sarah Seelye: VA Center for Clinical Management Research, VA Ann Arbor Healthcare System, Ann Arbor, Michigan, USA; Ji Zhu: Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA
collection	DOAJ
language	English
format	Article
sources	DOAJ
publisher	BMJ Publishing Group
series	BMJ Open
issn	2044-6055
publishDate	2020-12-01
description	Objective: There has been a proliferation of approaches to statistical methods and missing data imputation as electronic health records become more plentiful; however, the relative performance on real-world problems is unclear.
Materials and methods: Using 355 823 intensive care unit (ICU) hospitalisations at over 100 hospitals in the nationwide Veterans Health Administration system (2014–2017), we systematically varied three approaches: how we extracted and cleaned physiologic variables; how we handled missing data (using mean value imputation, random forest, extremely randomised trees (extra-trees regression), ridge regression, normal value imputation and case-wise deletion); and how we computed risk (using logistic regression, random forest and neural networks). We applied these approaches in a 70% development sample and tested the results in an independent 30% testing sample. Area under the receiver operating characteristic curve (AUROC) was used to quantify model discrimination.
Results: In 355 823 ICU stays, there were 34 867 deaths (9.8%) within 30 days of admission. The highest AUROCs obtained for each primary classification method were very similar: 0.83 (95% CI 0.83 to 0.83) to 0.85 (95% CI 0.84 to 0.85). Likewise, there was relatively little variation within classification method by the missing value imputation method used, except when case-wise deletion was applied for missing data.
Conclusion: Variation in discrimination was seen as a function of data cleanliness, with logistic regression suffering the most loss of discrimination in the least clean data. Losses in discrimination were not present in random forest and neural networks even in naively extracted data. Data from a large nationwide health system revealed interactions between missing data imputation techniques, data cleanliness and classification methods for predicting 30-day mortality.
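The evaluation design summarised in the abstract (a 70% development sample, a choice between imputing and deleting missing values, and AUROC measured on the held-out 30%) can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the function names (`auroc`, `split_70_30`, `mean_impute`, `casewise_delete`) and the dictionary-per-row data layout are assumptions made for this example, and no risk model is fitted here.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them 70% / 30%."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]


def mean_impute(dev, test, col):
    """Replace None in `col` with the mean observed in the development
    sample only, so nothing from the testing sample leaks into the fill."""
    fill = mean(r[col] for r in dev if r[col] is not None)

    def fix(rows):
        return [dict(r, **{col: fill if r[col] is None else r[col]}) for r in rows]

    return fix(dev), fix(test)


def casewise_delete(rows, col):
    """Drop every row whose value in `col` is missing."""
    return [r for r in rows if r[col] is not None]


if __name__ == "__main__":
    # Hypothetical toy scores: higher score should mean higher risk of death.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Computing the fill value from the development sample alone mirrors the independent testing sample described in the abstract. Case-wise deletion, by contrast, changes which patients are evaluated at all, which is one plausible reason it behaves differently from the imputation methods.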
url	https://bmjopen.bmj.com/content/10/12/e041421.full

Similar Items