Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

Abstract. Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically...

Bibliographic Details
Main Authors: Ewan Carr, Mathieu Carrière, Bertrand Michel, Frédéric Chazal, Raquel Iniesta
Format: Article
Language: English
Published: BMC 2021-09-01
Series: BMC Bioinformatics
Subjects: Topological data analysis; Clustering; Machine learning
Online Access: https://doi.org/10.1186/s12859-021-04360-9
spelling doaj-7987bfc4f24c4d468de684aefd90a4f1; 2021-09-26T11:15:33Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2021-09-01; vol. 22, no. 1, pp. 1-17; 10.1186/s12859-021-04360-9
Ewan Carr: Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London
Mathieu Carrière: Inria Sophia-Antipolis, DataShape Team
Bertrand Michel: Ecole Centrale de Nantes, LMJL – UMR CNRS 6629
Frédéric Chazal: Inria Saclay, Ile-de-France, Alan Turing Building
Raquel Iniesta: Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London
collection DOAJ
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-09-01
description Abstract. Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.
topic Topological data analysis
Clustering
Machine learning
url https://doi.org/10.1186/s12859-021-04360-9
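
The abstract describes Mapper as an algorithm that reduces complex data to a one-dimensional graph. The sketch below is a toy illustration of that general construction only, not the authors' pipeline (which is distributed at the repository named in the abstract). It assumes a one-dimensional filter, an overlapping interval cover, and single-linkage clustering with a fixed distance threshold; the function names `mapper_graph` and `_single_linkage` and the parameters `n_intervals`, `overlap` and `eps` are hypothetical. It omits the bootstrap step and the cluster selection that the abstract mentions.

```python
import math
from collections import defaultdict
from itertools import combinations


def _single_linkage(points, members, eps):
    """Group `members` into connected components of the eps-neighbourhood graph."""
    parent = {i: i for i in members}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(members, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i in members:
        groups[find(i)].add(i)
    return [frozenset(g) for g in groups.values()]


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.5, eps=1.5):
    """Return (nodes, edges) of a simple Mapper graph.

    nodes: list of frozensets of point indices (one per local cluster).
    edges: set of index pairs (a, b) whose nodes share at least one point.
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    # Interval length such that n_intervals overlapping intervals span [lo, hi].
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)

    nodes = []
    for k in range(n_intervals):
        start = lo + k * step
        end = start + length + 1e-12  # small tolerance for rounding at the boundary
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(_single_linkage(points, members, eps))

    edges = {
        (a, b)
        for a, b in combinations(range(len(nodes)), 2)
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Toy data: two well-separated groups of points on a line.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
nodes, edges = mapper_graph(points, filter_fn=lambda p: p[0])
print(len(nodes), sorted(edges))
```

On this toy input the two groups of points end up in separate connected components of the graph, which is the sense in which Mapper output can be read as a clustering. In practice the filter function, the cover and the clustering method are all modelling choices, and results can be sensitive to them.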