Post-Analysis of Predictive Modeling with an Epidemiological Example

Post-analysis of predictive models fosters their application in practice, as domain experts want to understand the logic behind them. In epidemiology, methods explaining sophisticated models facilitate the usage of up-to-date tools, especially in the high-dimensional predictor space. Investigating h...

Bibliographic Details
Main Authors:	Christina Brester, Ari Voutilainen, Tomi-Pekka Tuomainen, Jussi Kauhanen, Mikko Kolehmainen
Format:	Article
Language:	English
Published:	MDPI AG 2021-06-01
Series:	Healthcare
Subjects:	post-analysis of data-driven models, rule design, multi-objective optimization, model performance, prediction of cardiovascular death
Online Access:	https://www.mdpi.com/2227-9032/9/7/792

id	doaj-c1a69559975c443a946e56e91e5efc9c
record_format	Article
spelling	doaj-c1a69559975c443a946e56e91e5efc9c | 2021-07-23T13:42:22Z | eng | MDPI AG | Healthcare | 2227-9032 | 2021-06-01 | 9 | 7 | 792 | 792 | 10.3390/healthcare9070792 | Post-Analysis of Predictive Modeling with an Epidemiological Example | Christina Brester (0); Ari Voutilainen (1); Tomi-Pekka Tuomainen (2); Jussi Kauhanen (3); Mikko Kolehmainen (4) | (0) Department of Environmental and Biological Sciences, University of Eastern Finland, Yliopistonranta 1 E, P.O. Box 1627, FI-70211 Kuopio, Finland; (1) Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Yliopistonranta 1 C, P.O. Box 1627, FI-70211 Kuopio, Finland; (2) Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Yliopistonranta 1 C, P.O. Box 1627, FI-70211 Kuopio, Finland; (3) Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Yliopistonranta 1 C, P.O. Box 1627, FI-70211 Kuopio, Finland; (4) Department of Environmental and Biological Sciences, University of Eastern Finland, Yliopistonranta 1 E, P.O. Box 1627, FI-70211 Kuopio, Finland | Post-analysis of predictive models fosters their application in practice, as domain experts want to understand the logic behind them. In epidemiology, methods explaining sophisticated models facilitate the usage of up-to-date tools, especially in the high-dimensional predictor space. Investigating how model performance varies for subjects with different conditions is one of the important parts of post-analysis. This paper presents a model-independent approach for post-analysis, aiming to reveal those subjects’ conditions that lead to low or high model performance, compared to the average level on the whole sample. Conditions of interest are presented in the form of rules generated by a multi-objective evolutionary algorithm (MOGA). In this study, Lasso logistic regression (LLR) was trained to predict cardiovascular death by 2016 using the data from the 1984–1989 examination within the Kuopio Ischemic Heart Disease Risk Factor Study (KIHD), which contained 2682 subjects and 950 preselected predictors. After 50 independent runs of five-fold cross-validation, the model performance collected for each subject was used to generate rules describing “easy” and “difficult” cases. LLR with 61 selected predictors, on average, achieved 72.53% accuracy on the whole sample. However, during post-analysis, three categories of subjects were discovered: “Easy” cases with an LLR accuracy of 95.84%, “difficult” cases with an LLR accuracy of 48.11%, and the remaining cases with an LLR accuracy of 71.00%. Moreover, the rule analysis showed that medication was one of the main confusing factors that led to lower model performance. The proposed approach provides insightful information about subjects’ conditions that complicate predictive modeling. | https://www.mdpi.com/2227-9032/9/7/792 | post-analysis of data-driven models; rule design; multi-objective optimization; model performance; prediction of cardiovascular death
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Christina Brester, Ari Voutilainen, Tomi-Pekka Tuomainen, Jussi Kauhanen, Mikko Kolehmainen
spellingShingle	Christina Brester Ari Voutilainen Tomi-Pekka Tuomainen Jussi Kauhanen Mikko Kolehmainen Post-Analysis of Predictive Modeling with an Epidemiological Example Healthcare post-analysis of data-driven models rule design multi-objective optimization model performance prediction of cardiovascular death
author_facet	Christina Brester, Ari Voutilainen, Tomi-Pekka Tuomainen, Jussi Kauhanen, Mikko Kolehmainen
author_sort	Christina Brester
title	Post-Analysis of Predictive Modeling with an Epidemiological Example
title_short	Post-Analysis of Predictive Modeling with an Epidemiological Example
title_full	Post-Analysis of Predictive Modeling with an Epidemiological Example
title_fullStr	Post-Analysis of Predictive Modeling with an Epidemiological Example
title_full_unstemmed	Post-Analysis of Predictive Modeling with an Epidemiological Example
title_sort	post-analysis of predictive modeling with an epidemiological example
publisher	MDPI AG
series	Healthcare
issn	2227-9032
publishDate	2021-06-01
description	Post-analysis of predictive models fosters their application in practice, as domain experts want to understand the logic behind them. In epidemiology, methods explaining sophisticated models facilitate the usage of up-to-date tools, especially in the high-dimensional predictor space. Investigating how model performance varies for subjects with different conditions is one of the important parts of post-analysis. This paper presents a model-independent approach for post-analysis, aiming to reveal those subjects’ conditions that lead to low or high model performance, compared to the average level on the whole sample. Conditions of interest are presented in the form of rules generated by a multi-objective evolutionary algorithm (MOGA). In this study, Lasso logistic regression (LLR) was trained to predict cardiovascular death by 2016 using the data from the 1984–1989 examination within the Kuopio Ischemic Heart Disease Risk Factor Study (KIHD), which contained 2682 subjects and 950 preselected predictors. After 50 independent runs of five-fold cross-validation, the model performance collected for each subject was used to generate rules describing “easy” and “difficult” cases. LLR with 61 selected predictors, on average, achieved 72.53% accuracy on the whole sample. However, during post-analysis, three categories of subjects were discovered: “Easy” cases with an LLR accuracy of 95.84%, “difficult” cases with an LLR accuracy of 48.11%, and the remaining cases with an LLR accuracy of 71.00%. Moreover, the rule analysis showed that medication was one of the main confusing factors that led to lower model performance. The proposed approach provides insightful information about subjects’ conditions that complicate predictive modeling.
topic	post-analysis of data-driven models, rule design, multi-objective optimization, model performance, prediction of cardiovascular death
url	https://www.mdpi.com/2227-9032/9/7/792
work_keys_str_mv	AT christinabrester postanalysisofpredictivemodelingwithanepidemiologicalexample AT arivoutilainen postanalysisofpredictivemodelingwithanepidemiologicalexample AT tomipekkatuomainen postanalysisofpredictivemodelingwithanepidemiologicalexample AT jussikauhanen postanalysisofpredictivemodelingwithanepidemiologicalexample AT mikkokolehmainen postanalysisofpredictivemodelingwithanepidemiologicalexample
_version_	1721288286529388544

Similar Items