Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction

Abstract Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustnes...


Bibliographic Details
Main Authors: Wenjing Ma, Kenong Su, Hao Wu
Format: Article
Language: English
Published: BMC 2021-09-01
Series: Genome Biology
Subjects: Supervised cell typing; Reference dataset construction; scRNA-seq
Online Access: https://doi.org/10.1186/s13059-021-02480-2
id doaj-1cefabdcd5c84096bd25d56141822cf4
record_format Article
spelling doaj-1cefabdcd5c84096bd25d56141822cf4 2021-09-12T12:03:19Z eng BMC Genome Biology 1474-760X 2021-09-01 221123 10.1186/s13059-021-02480-2 Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction Wenjing Ma 0 Kenong Su 1 Hao Wu 2 Department of Computer Science, Emory University Department of Computer Science, Emory University Department of Computer Science, Emory University Abstract Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. Results: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. Conclusions: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method.
All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping). https://doi.org/10.1186/s13059-021-02480-2 Supervised cell typing Reference dataset construction scRNA-seq
collection DOAJ
language English
format Article
sources DOAJ
author Wenjing Ma
Kenong Su
Hao Wu
title Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: classifier, feature selection, and reference construction
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2021-09-01
description Abstract Background: Cell type identification is one of the most important questions in single-cell RNA sequencing (scRNA-seq) data analysis. With the accumulation of public scRNA-seq data, supervised cell type identification methods have gained increasing popularity due to better accuracy, robustness, and computational performance. Despite all the advantages, the performance of the supervised methods relies heavily on several key factors: feature selection, prediction method, and, most importantly, choice of the reference dataset. Results: In this work, we perform extensive real data analyses to systematically evaluate these strategies in supervised cell identification. We first benchmark nine classifiers along with six feature selection strategies and investigate the impact of reference data size and number of cell types in cell type prediction. Next, we focus on how discrepancies between reference and target datasets and how data preprocessing such as imputation and batch effect correction affect prediction performance. We also investigate the strategies of pooling and purifying reference data. Conclusions: Based on our analysis results, we provide guidelines for using supervised cell typing methods. We suggest combining all individuals from available datasets to construct the reference dataset and using a multi-layer perceptron (MLP) as the classifier, along with the F-test as the feature selection method. All the code used for our analysis is available on GitHub (https://github.com/marvinquiet/RefConstruction_supervisedCelltyping).
topic Supervised cell typing
Reference dataset construction
scRNA-seq
url https://doi.org/10.1186/s13059-021-02480-2