A comparison framework and guideline of clustering methods for mass cytometry data

Bibliographic Details
Main Authors: Xiao Liu, Weichen Song, Brandon Y. Wong, Ting Zhang, Shunying Yu, Guan Ning Lin, Xianting Ding
Format: Article
Language: English
Published: BMC 2019-12-01
Series: Genome Biology
Online Access:https://doi.org/10.1186/s13059-019-1917-7
Affiliations: State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University (Xiao Liu, Brandon Y. Wong, Ting Zhang, Guan Ning Lin, Xianting Ding); Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine (Weichen Song, Shunying Yu)
ISSN: 1474-760X

Abstract

Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations.

Results: To address this issue, we compared nine methods on six independent benchmark datasets using three classes of performance measures: “precision” as external evaluation, “coherence” as internal evaluation, and stability. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification (ACDC) and linear discriminant analysis (LDA)) were tested on the six mass cytometry datasets. We computed and compared all defined performance measures under random subsampling, varying sample sizes, and varying numbers of clusters for each method. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than the other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performance of PhenoGraph, Xshift, and flowMeans is affected by increased sample size, whereas FlowSOM remains relatively stable as sample size increases.

Conclusion: All the evaluations, including precision, coherence, stability, and clustering resolution, should be considered together when choosing an appropriate tool for cytometry data analysis. We therefore provide decision guidelines based on these characteristics to help general readers choose the most suitable clustering tools.

Subjects: Mass cytometry, CyTOF, Cell population, Clustering tools, Comparison
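
The abstract names three classes of measures (precision, coherence, stability) but this record does not say which indices sit behind them. As an illustration only, the Python sketch below assumes common choices: the adjusted Rand index (ARI) against known labels for "precision", the mean silhouette width for "coherence", and the mean ARI between full-data and subsample clusterings for "stability". Plain k-means with farthest-first initialisation stands in for the clustering tools, and synthetic Gaussian blobs with known labels stand in for manually gated CyTOF populations. The helpers `make_blobs`, `kmeans`, `adjusted_rand_index`, `silhouette`, and `subsample_stability` are hypothetical names written for this sketch, not functions from the paper or any cytometry package.

```python
import math
import random
from collections import Counter


def make_blobs(n_per_blob=50, sd=0.5, seed=1):
    """Hypothetical toy data: three well-separated 2-D Gaussian 'populations' with known labels."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_blob):
            points.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            labels.append(label)
    return points, labels


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means with farthest-first initialisation; returns one label per point."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:  # add the point farthest from all centers chosen so far
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Recompute the centroid; keep the old center if the cluster is empty.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    return labels


def _pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(labels_a, labels_b):
    """External agreement between two partitions: 1.0 = identical, about 0 = chance level."""
    sum_cells = sum(_pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(_pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(_pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / _pairs(len(labels_a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def silhouette(points, labels):
    """Internal coherence: mean silhouette width, computed without any reference labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters get a silhouette of 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # p's distance to itself is 0
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def subsample_stability(points, k, n_rounds=5, frac=0.8, seed=0):
    """Stability: mean ARI between full-data clusters and clusters of random subsamples."""
    rng = random.Random(seed)
    reference = kmeans(points, k, seed=seed)
    scores = []
    for r in range(n_rounds):
        idx = rng.sample(range(len(points)), int(frac * len(points)))
        sub_labels = kmeans([points[i] for i in idx], k, seed=seed + r + 1)
        scores.append(adjusted_rand_index([reference[i] for i in idx], sub_labels))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    points, manual = make_blobs()  # 'manual' plays the role of hand-gated labels
    for k in (2, 3, 4):
        pred = kmeans(points, k)
        print(f"k={k}  precision(ARI)={adjusted_rand_index(manual, pred):.3f}  "
              f"coherence(silhouette)={silhouette(points, pred):.3f}  "
              f"stability={subsample_stability(points, k):.3f}")
```

Looping over `k` mirrors, in miniature, the abstract's comparison across numbers of clusters: on these well-separated blobs k = 3 should recover the known labels, while k = 2 necessarily merges two populations and lowers the ARI. Real mass cytometry data are high-dimensional and far less cleanly separated, so numbers from this toy example say nothing about how the tools in the paper rank.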