A comparison framework and guideline of clustering methods for mass cytometry data
Abstract. Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell...
Main Authors: | Xiao Liu, Weichen Song, Brandon Y. Wong, Ting Zhang, Shunying Yu, Guan Ning Lin, Xianting Ding |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-12-01 |
Series: | Genome Biology |
Subjects: | Mass cytometry, CyTOF, Cell population, Clustering tools, Comparison |
Online Access: | https://doi.org/10.1186/s13059-019-1917-7 |
id |
doaj-364c7cf5cc334feaace05582cec5127a |
---|---|
record_format |
Article |
spelling |
doaj-364c7cf5cc334feaace05582cec5127a; 2020-12-27T12:20:15Z; eng; BMC; Genome Biology; 1474-760X; 2019-12-01; vol. 20, no. 1, pp. 1-18; 10.1186/s13059-019-1917-7; A comparison framework and guideline of clustering methods for mass cytometry data.
Authors: Xiao Liu (0), Weichen Song (1), Brandon Y. Wong (2), Ting Zhang (3), Shunying Yu (4), Guan Ning Lin (5), Xianting Ding (6).
Affiliations: (0, 2, 3, 5, 6) State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University; (1, 4) Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine.
Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations.
Results: To address this issue, we compared three classes of performance measures, “precision” as external evaluation, “coherence” as internal evaluation, and stability, of nine methods based on six independent benchmark datasets. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification and linear discriminant analysis (LDA)) are tested on six mass cytometry datasets. We compute and compare all defined performance measures against random subsampling, varying sample sizes, and the number of clusters for each method. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performances of PhenoGraph, Xshift, and flowMeans are impacted by increased sample size, but FlowSOM is relatively stable as sample size increases.
Conclusion: All the evaluations including precision, coherence, stability, and clustering resolution should be taken into synthetic consideration when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics for the general reader to more easily choose the most suitable clustering tools.
https://doi.org/10.1186/s13059-019-1917-7; Mass cytometry, CyTOF, Cell population, Clustering tools, Comparison |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiao Liu, Weichen Song, Brandon Y. Wong, Ting Zhang, Shunying Yu, Guan Ning Lin, Xianting Ding |
title |
A comparison framework and guideline of clustering methods for mass cytometry data |
publisher |
BMC |
series |
Genome Biology |
issn |
1474-760X |
publishDate |
2019-12-01 |
description |
Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations. Results: To address this issue, we compared three classes of performance measures, “precision” as external evaluation, “coherence” as internal evaluation, and stability, of nine methods based on six independent benchmark datasets. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification and linear discriminant analysis (LDA)) are tested on six mass cytometry datasets. We compute and compare all defined performance measures against random subsampling, varying sample sizes, and the number of clusters for each method. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performances of PhenoGraph, Xshift, and flowMeans are impacted by increased sample size, but FlowSOM is relatively stable as sample size increases. Conclusion: All the evaluations including precision, coherence, stability, and clustering resolution should be taken into synthetic consideration when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics for the general reader to more easily choose the most suitable clustering tools. |
topic |
Mass cytometry, CyTOF, Cell population, Clustering tools, Comparison |
url |
https://doi.org/10.1186/s13059-019-1917-7 |