Identifying homogeneous subgroups of patients and important features: a topological machine learning approach

Bibliographic Details
Main Authors: Carr, E. (Author), Carrière, M. (Author), Chazal, F. (Author), Iniesta, R. (Author), Michel, B. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher
LEADER 02448nam a2200565Ia 4500
001 10.1186-s12859-021-04360-9
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Identifying homogeneous subgroups of patients and important features: a topological machine learning approach 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04360-9 
520 3 |a Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph. Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper. Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline. © 2021, The Author(s). 
650 0 4 |a adult 
650 0 4 |a algorithm 
650 0 4 |a Algorithms 
650 0 4 |a article 
650 0 4 |a bootstrapping 
650 0 4 |a cluster analysis 
650 0 4 |a Cluster Analysis 
650 0 4 |a Clustering 
650 0 4 |a Clustering algorithms 
650 0 4 |a Clustering process 
650 0 4 |a Complex data 
650 0 4 |a data analysis 
650 0 4 |a Data Analysis 
650 0 4 |a Graph algorithms 
650 0 4 |a human 
650 0 4 |a Humans 
650 0 4 |a Important features 
650 0 4 |a licence 
650 0 4 |a machine learning 
650 0 4 |a Machine learning 
650 0 4 |a Machine Learning 
650 0 4 |a Machine learning approaches 
650 0 4 |a Mixed data types 
650 0 4 |a pipeline 
650 0 4 |a Pipelines 
650 0 4 |a Prior knowledge 
650 0 4 |a Topological data analysis 
650 0 4 |a Topological features 
650 0 4 |a Topology 
700 1 |a Carr, E.  |e author 
700 1 |a Carrière, M.  |e author 
700 1 |a Chazal, F.  |e author 
700 1 |a Iniesta, R.  |e author 
700 1 |a Michel, B.  |e author 
773 |t BMC Bioinformatics