Identifying homogeneous subgroups of patients and important features: a topological machine learning approach
Abstract. Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically...
Main Authors: | Ewan Carr, Mathieu Carrière, Bertrand Michel, Frédéric Chazal, Raquel Iniesta |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-09-01 |
Series: | BMC Bioinformatics |
Subjects: | Topological data analysis; Clustering; Machine learning |
Online Access: | https://doi.org/10.1186/s12859-021-04360-9 |
id |
doaj-7987bfc4f24c4d468de684aefd90a4f1 |
---|---|
record_format |
Article |
spelling |
doaj-7987bfc4f24c4d468de684aefd90a4f1; 2021-09-26T11:15:33Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2021-09-01; vol. 22, no. 1, pp. 1-17; 10.1186/s12859-021-04360-9; Identifying homogeneous subgroups of patients and important features: a topological machine learning approach; Ewan Carr (Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London); Mathieu Carrière (Inria Sophia-Antipolis, DataShape Team); Bertrand Michel (Ecole Centrale de Nantes, LMJL – UMR CNRS 6629); Frédéric Chazal (Inria Saclay, Ile-de-France, Alan Turing Building); Raquel Iniesta (Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London); Abstract. Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline; https://doi.org/10.1186/s12859-021-04360-9; Topological data analysis; Clustering; Machine learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ewan Carr, Mathieu Carrière, Bertrand Michel, Frédéric Chazal, Raquel Iniesta |
title |
Identifying homogeneous subgroups of patients and important features: a topological machine learning approach |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2021-09-01 |
description |
Abstract. Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline. |
topic |
Topological data analysis; Clustering; Machine learning |
url |
https://doi.org/10.1186/s12859-021-04360-9 |