Random forest-based prediction of stroke outcome

Abstract We investigate the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients in order to generate a machine learning model that predicts mortality and morbidity 3 months after admission. The dataset consisted of patients with isch...


Bibliographic Details
Main Authors: Carlos Fernandez-Lozano, Pablo Hervella, Virginia Mato-Abad, Manuel Rodríguez-Yáñez, Sonia Suárez-Garaboa, Iria López-Dequidt, Ana Estany-Gestal, Tomás Sobrino, Francisco Campos, José Castillo, Santiago Rodríguez-Yáñez, Ramón Iglesias-Rey
Format: Article
Language: English
Published: Nature Publishing Group 2021-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-021-89434-7
id doaj-4ecffb34346142b8adea9bd5b4185d38
record_format Article
spelling doaj-4ecffb34346142b8adea9bd5b4185d38 2021-05-16T11:26:20Z eng Nature Publishing Group Scientific Reports 2045-2322 2021-05-01 11 1 1 12 10.1038/s41598-021-89434-7 Random forest-based prediction of stroke outcome
Carlos Fernandez-Lozano (0): Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña
Pablo Hervella (1): Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS)
Virginia Mato-Abad (2): Software Engineering Laboratory, Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña
Manuel Rodríguez-Yáñez (3): Stroke Unit, Department of Neurology, Health Research Institute of Santiago de Compostela (IDIS), Hospital Clínico Universitario
Sonia Suárez-Garaboa (4): Software Engineering Laboratory, Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña
Iria López-Dequidt (5): Stroke Unit, Department of Neurology, Health Research Institute of Santiago de Compostela (IDIS), Hospital Clínico Universitario
Ana Estany-Gestal (6): Unit of Methodology of the Research, Health Research Institute of Santiago de Compostela (IDIS)
Tomás Sobrino (7): Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS)
Francisco Campos (8): Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS)
José Castillo (9): Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS)
Santiago Rodríguez-Yáñez (10): Software Engineering Laboratory, Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña
Ramón Iglesias-Rey (11): Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS)
https://doi.org/10.1038/s41598-021-89434-7
collection DOAJ
language English
format Article
sources DOAJ
author Carlos Fernandez-Lozano
Pablo Hervella
Virginia Mato-Abad
Manuel Rodríguez-Yáñez
Sonia Suárez-Garaboa
Iria López-Dequidt
Ana Estany-Gestal
Tomás Sobrino
Francisco Campos
José Castillo
Santiago Rodríguez-Yáñez
Ramón Iglesias-Rey
title Random forest-based prediction of stroke outcome
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2021-05-01
description Abstract: We investigate the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients in order to generate a machine learning model that predicts mortality and morbidity 3 months after admission. The dataset consisted of prospectively registered patients with ischemic stroke (IS) and non-traumatic intracerebral hemorrhage (ICH) admitted to the Stroke Unit of a European tertiary hospital. We identified the main variables for a Random Forest (RF) model that estimates patient mortality/morbidity in the following groups: (1) IS + ICH, (2) IS, and (3) ICH. A total of 6022 patients were included: 4922 (mean age 71.9 ± 13.8 years) with IS and 1100 (mean age 73.3 ± 13.1 years) with ICH. NIHSS at 24 and 48 h and axillary temperature at admission were the most important variables for patient evolution at 3 months. The IS + ICH group was the most stable for mortality prediction [area under the receiver operating characteristic curve (AUC) of 0.904 ± 0.025]. The IS group presented similar results, although variability between experiments was slightly higher (AUC of 0.909 ± 0.032). The ICH group was the one in which RF had the most difficulty making adequate predictions (AUC of 0.9837 vs. 0.7104). There were no major differences between the IS and IS + ICH groups in morbidity prediction (AUC of 0.738 and 0.755). However, a Shapiro-Wilk test rejected the null hypothesis that the data follow a normal distribution (W = 0.93546, p-value < 2.2e−16), so the conditions required for a parametric test do not hold, and we performed a paired Wilcoxon test under the null hypothesis that all groups have the same performance. This null hypothesis was rejected with a p-value < 2.2e−16, so there are statistically significant differences between the IS and ICH groups.
In conclusion, the RF machine learning algorithm can be used effectively in stroke patients for long-term outcome prediction of mortality and morbidity.
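The evaluation steps named in the abstract (AUC per experiment, a mean ± SD summary, and a paired Wilcoxon test between groups) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the per-experiment AUC values are invented toy numbers rather than results from the paper, the normality check is omitted, and the Wilcoxon p-value uses a plain normal approximation without tie or continuity correction, which may differ from the software the authors used.

```python
import math
import statistics


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_wilcoxon(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, no corrections)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = n * (n + 1) / 2 - w_plus
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return min(w_plus, w_minus), p_value


# Hypothetical per-experiment AUCs for two patient groups (toy values, not from the paper).
auc_group_a = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.94, 0.90]
auc_group_b = [0.74, 0.70, 0.78, 0.69, 0.75, 0.71, 0.77, 0.72]

for name, values in (("group A", auc_group_a), ("group B", auc_group_b)):
    print(f"{name}: AUC {statistics.mean(values):.3f} ± {statistics.stdev(values):.3f}")

statistic, p_value = paired_wilcoxon(auc_group_a, auc_group_b)
print(f"paired Wilcoxon: W = {statistic}, p = {p_value:.4f}")
```

The pairing assumes that experiment i in one group corresponds to experiment i in the other, for example the same resampling split. With only a handful of experiments an exact test would be preferable to the normal approximation.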
url https://doi.org/10.1038/s41598-021-89434-7