Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data

We apply tree-based classification algorithms, namely the classification trees, with the use of the rpart algorithm, random forests and XGBoost methods to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which is man...


Bibliographic Details
Main Authors: Elzbieta Turska, Szymon Jurga, Jaroslaw Piskorski
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Entropy
Subjects: classification; classification trees; rpart algorithm; random forests; XGBoost; mood disorders
Online Access: https://www.mdpi.com/1099-4300/23/9/1210
id doaj-556fef2103b14adeabc8f08674a39f0a
record_format Article
spelling doaj-556fef2103b14adeabc8f08674a39f0a
2021-09-26T00:07:05Z
eng
MDPI AG
Entropy
1099-4300
2021-09-01
23
1210
1210
10.3390/e23091210
Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
Elzbieta Turska (Institute of Pedagogy, University of Zielona Gora, 65-417 Zielona Gora, Poland)
Szymon Jurga (Department of Neurology, Collegium Medicum, University Hospital, University of Zielona Góra, 65-417 Zielona Gora, Poland)
Jaroslaw Piskorski (Institute of Physics, University of Zielona Gora, 65-417 Zielona Gora, Poland)
We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. We conclude that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.
https://www.mdpi.com/1099-4300/23/9/1210
classification
classification trees
rpart algorithm
random forests
XGBoost
mood disorders
collection DOAJ
language English
format Article
sources DOAJ
author Elzbieta Turska
Szymon Jurga
Jaroslaw Piskorski
spellingShingle Elzbieta Turska
Szymon Jurga
Jaroslaw Piskorski
Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
Entropy
classification
classification trees
rpart algorithm
random forests
XGBoost
mood disorders
author_facet Elzbieta Turska
Szymon Jurga
Jaroslaw Piskorski
author_sort Elzbieta Turska
title Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
title_short Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
title_full Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
title_fullStr Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
title_full_unstemmed Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data
title_sort mood disorder detection in adolescents by classification trees, random forests and xgboost in presence of missing data
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2021-09-01
description We apply tree-based classification algorithms, namely classification trees (using the rpart algorithm), random forests and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. We conclude that this is because the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.
topic classification
classification trees
rpart algorithm
random forests
XGBoost
mood disorders
url https://www.mdpi.com/1099-4300/23/9/1210
_version_ 1717367022936915968
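
note The description above distinguishes classifiers that are specific from classifiers that are sensitive on a heavily unbalanced dataset. The following is a minimal sketch of that distinction, not the authors' code: the function name, the label vectors and the counts (5 cases among 100 students) are hypothetical and chosen only for illustration. It shows that a classifier which never flags anyone can be perfectly specific and still detect no real cases.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = non-case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real cases detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cases wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical unbalanced sample: 5 cases followed by 95 non-cases.
y_true = [1] * 5 + [0] * 95

# Classifier A (hypothetical): never predicts a case.
pred_a = [0] * 100

# Classifier B (hypothetical): finds 4 of the 5 cases, with 10 false alarms.
pred_b = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

sens_a, spec_a = sensitivity_specificity(y_true, pred_a)
sens_b, spec_b = sensitivity_specificity(y_true, pred_b)
accuracy_a = sum(t == p for t, p in zip(y_true, pred_a)) / len(y_true)

print(f"A: sensitivity={sens_a:.2f} specificity={spec_a:.2f} accuracy={accuracy_a:.2f}")
print(f"B: sensitivity={sens_b:.2f} specificity={spec_b:.2f}")
```

In this illustration classifier A reaches 95% accuracy purely because cases are rare, which is why accuracy alone says little on unbalanced data and sensitivity has to be reported separately.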