An application of machine learning with feature selection to improve diagnosis and classification of neurodegenerative disorders

Bibliographic Details
Main Authors: Josefa Díaz Álvarez, Jordi A. Matias-Guiu, María Nieves Cabrera-Martín, José L. Risco-Martín, José L. Ayala
Format: Article
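Language: English
Published: BMC 2019-10-01
Series: BMC Bioinformatics
ISSN: 1471-2105
DOI: 10.1186/s12859-019-3027-7
Subjects: Machine learning; Primary progressive aphasia; Supervised algorithm; Unsupervised algorithm; Clustering analysis
Online Access: http://link.springer.com/article/10.1186/s12859-019-3027-7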
Affiliations:
Josefa Díaz Álvarez: Dep. of Computer Architecture and Communications, Universidad de Extremadura
Jordi A. Matias-Guiu: Dep. of Neurology, Hospital Clinico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense
María Nieves Cabrera-Martín: Dep. of Neurology, Hospital Clinico San Carlos, San Carlos Research Health Institute (IdISSC), Universidad Complutense
José L. Risco-Martín: Dep. of Computer Architecture and Automation, Universidad Complutense
José L. Ayala: Dep. of Computer Architecture and Automation, Universidad Complutense

Abstract

Background: The analysis of health and medical data is crucial for improving diagnostic precision, treatment and prevention, and machine learning techniques play a key role in this field. However, health data acquired from digital machines are high-dimensional, and not all of the acquired data are relevant to a particular disease. Primary Progressive Aphasia (PPA) is a neurodegenerative syndrome that includes several specific diseases, which makes it a good model for machine learning analyses. In this work, we applied five feature selection algorithms to identify the set of relevant features in 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) images of the main areas affected by PPA, taken from patient records. In addition, we ran classification and clustering algorithms before and after the feature selection process and contrasted both sets of results with those obtained in a previous work. Using the WEKA tool, we aimed to find the best classifier and the most relevant features, as a basis for a future framework for automated diagnostic support. The dataset contains 150 FDG-PET imaging studies from 91 patients with a clinical prognosis of PPA, some of whom were examined twice, and 28 controls. Our method comprises six stages: (i) feature extraction, (ii) expert-knowledge supervision, (iii) classification, (iv) comparison of classification results for feature selection, (v) clustering after feature selection, and (vi) comparison of clustering results with those obtained in a previous work.

Results: Experimental tests confirmed the clustering results of the previous work. Although the classification results of some algorithms are not decisive for a precise reduction of the feature set, Principal Component Analysis (PCA) achieved similar or even better performance than using all features.

Conclusions: Although reducing dimensionality does not produce a general improvement, the feature set is almost halved while the results are better or comparable. Finally, these results reveal a finer-grained classification of patients according to the neuroanatomy of their disease.
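
The core comparison described in the abstract, classifying once with all features and again after dimensionality reduction, can be illustrated with a small sketch. This is not the authors' WEKA pipeline: it assumes a hypothetical feature matrix `X` with one row per imaging study, swaps in a 1-nearest-neighbour classifier scored by leave-one-out cross-validation, and implements PCA from scratch with power iteration so that it runs on the Python standard library alone. All function names are hypothetical, and the demo data are synthetic, not FDG-PET measurements.

```python
import math
import random


def mean_center(rows):
    """Subtract each feature's mean so PCA operates on centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centred):
    """Sample covariance matrix (d x d) of already-centred rows."""
    n, d = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Approximate leading k eigenvectors of a symmetric PSD matrix
    by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    mat = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0  # guard against a zero vector
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the converged vector.
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_fit_transform(rows, k):
    """Project rows onto their first k principal components (PCA scores)."""
    centred = mean_center(rows)
    comps = top_components(covariance(centred), k)
    return [[sum(c[j] * r[j] for j in range(len(r))) for c in comps] for r in centred]


def loocv_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean)."""
    correct = 0
    for i, x in enumerate(rows):
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, rows[j])))
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


if __name__ == "__main__":
    # Synthetic toy data, NOT the study's FDG-PET features:
    # 2 class-informative features followed by 6 pure-noise features.
    rng = random.Random(42)
    X, y = [], []
    for label in (0, 1):
        for _ in range(20):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(6)]
            X.append(informative + noise)
            y.append(label)
    print("all 8 features:", loocv_1nn_accuracy(X, y))
    print("first 4 PCs:   ", loocv_1nn_accuracy(pca_fit_transform(X, 4), y))
```

Reducing 8 features to 4 components mirrors the "almost halved" feature set only loosely. Fitting PCA once on all samples is a simplification: PCA ignores the labels, so no class information leaks directly, but a stricter protocol would refit it inside each cross-validation fold.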