Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which is man...
Main Authors: | Elzbieta Turska, Szymon Jurga, Jaroslaw Piskorski |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-09-01 |
Series: | Entropy |
Subjects: | classification, classification trees, rpart algorithm, random forests, XGBoost, mood disorders |
Online Access: | https://www.mdpi.com/1099-4300/23/9/1210 |
id |
doaj-556fef2103b14adeabc8f08674a39f0a |
---|---|
record_format |
Article |
spelling |
doaj-556fef2103b14adeabc8f08674a39f0a; 2021-09-26T00:07:05Z; eng; MDPI AG; Entropy; 1099-4300; 2021-09-01; 23; 1210; 1210; 10.3390/e23091210; Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data; Elzbieta Turska (0), Szymon Jurga (1), Jaroslaw Piskorski (2); (0) Institute of Pedagogy, University of Zielona Gora, 65-417 Zielona Gora, Poland; (1) Department of Neurology, Collegium Medicum, University Hospital, University of Zielona Góra, 65-417 Zielona Gora, Poland; (2) Institute of Physics, University of Zielona Gora, 65-417 Zielona Gora, Poland; We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and a heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. The paper concludes that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.; https://www.mdpi.com/1099-4300/23/9/1210; classification; classification trees; rpart algorithm; random forests; XGBoost; mood disorders |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Elzbieta Turska, Szymon Jurga, Jaroslaw Piskorski |
title |
Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data |
publisher |
MDPI AG |
series |
Entropy |
issn |
1099-4300 |
publishDate |
2021-09-01 |
description |
We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests, and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and a heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. The paper concludes that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success. |
topic |
classification, classification trees, rpart algorithm, random forests, XGBoost, mood disorders |
url |
https://www.mdpi.com/1099-4300/23/9/1210 |
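The description separates classifiers that are specific from those that are also sensitive. The distinction matters most on heavily unbalanced data, and a minimal Python sketch can illustrate it. The helper name `sensitivity_specificity` and the toy labels are assumptions made for this example; they are not taken from the article or its dataset.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 marks a case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cases wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: 95 non-cases followed by 5 cases (heavily unbalanced).
y_true = [0] * 95 + [1] * 5

# A classifier that never predicts a case: perfectly specific, not sensitive at all.
y_all_negative = [0] * 100

# A classifier that flags 5 non-cases by mistake but finds 4 of the 5 real cases.
y_mixed = [0] * 90 + [1] * 5 + [1] * 4 + [0] * 1

sens_neg, spec_neg = sensitivity_specificity(y_true, y_all_negative)
sens_mix, spec_mix = sensitivity_specificity(y_true, y_mixed)
print(f"always negative: sensitivity={sens_neg:.2f}, specificity={spec_neg:.2f}")
print(f"mixed:           sensitivity={sens_mix:.2f}, specificity={spec_mix:.2f}")
```

In this toy setting the always-negative classifier is 95% accurate yet finds no cases, which is why sensitivity and specificity are worth reporting separately when positive cases are rare.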