Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis

Abstract Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled i...

Bibliographic Details
Main Authors: Thomas A. Geddes, Taiyun Kim, Lihao Nan, James G. Burchfield, Jean Y. H. Yang, Dacheng Tao, Pengyi Yang
Format: Article
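Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: Autoencoder; Cluster ensemble; Single cells; scRNA-seq; Single-cell transcriptome; Cell type identification
Online Access: https://doi.org/10.1186/s12859-019-3179-5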
id doaj-55a0ffe541ab4e0eaf79be974cf0a9ba
record_format Article
spelling doaj-55a0ffe541ab4e0eaf79be974cf0a9ba 2020-12-27T12:21:31Z eng BMC BMC Bioinformatics 1471-2105 2019-12-01 20 S19 1 11 10.1186/s12859-019-3179-5
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
Thomas A. Geddes 0
Taiyun Kim 1
Lihao Nan 2
James G. Burchfield 3
Jean Y. H. Yang 4
Dacheng Tao 5
Pengyi Yang 6
Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney
Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney
UBTECH Sydney Artificial Intelligence Centre and the School of Computer Science, Faculty of Engineering and Information Technologies, The University of Sydney
Charles Perkins Centre, School of Life and Environmental Sciences, Faculty of Science, The University of Sydney
Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney
UBTECH Sydney Artificial Intelligence Centre and the School of Computer Science, Faculty of Engineering and Information Technologies, The University of Sydney
Charles Perkins Centre, School of Mathematics and Statistics, Faculty of Science, The University of Sydney
Abstract Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
https://doi.org/10.1186/s12859-019-3179-5
Autoencoder
Cluster ensemble
Single cells
scRNA-seq
Single-cell transcriptome
Cell type identification
collection DOAJ
language English
format Article
sources DOAJ
author Thomas A. Geddes
Taiyun Kim
Lihao Nan
James G. Burchfield
Jean Y. H. Yang
Dacheng Tao
Pengyi Yang
spellingShingle Thomas A. Geddes
Taiyun Kim
Lihao Nan
James G. Burchfield
Jean Y. H. Yang
Dacheng Tao
Pengyi Yang
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
BMC Bioinformatics
Autoencoder
Cluster ensemble
Single cells
scRNA-seq
Single-cell transcriptome
Cell type identification
author_facet Thomas A. Geddes
Taiyun Kim
Lihao Nan
James G. Burchfield
Jean Y. H. Yang
Dacheng Tao
Pengyi Yang
author_sort Thomas A. Geddes
title Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
title_short Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
title_full Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
title_fullStr Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
title_full_unstemmed Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
title_sort autoencoder-based cluster ensembles for single-cell rna-seq data analysis
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
topic Autoencoder
Cluster ensemble
Single cells
scRNA-seq
Single-cell transcriptome
Cell type identification
url https://doi.org/10.1186/s12859-019-3179-5
work_keys_str_mv AT thomasageddes autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT taiyunkim autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT lihaonan autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT jamesgburchfield autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT jeanyhyang autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT dachengtao autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
AT pengyiyang autoencoderbasedclusterensemblesforsinglecellrnaseqdataanalysis
_version_ 1724369043066978304
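
Illustrative sketch (not part of the catalog record). The abstract above describes a three-step pipeline: take random subspace projections, compress each with an autoencoder, then apply ensemble clustering across the encoded datasets. The NumPy sketch below shows one way those steps could fit together. It is not the authors' scCCESS code, and several choices are assumptions made only for illustration: "random subspace projection" is read as a random subset of genes, the autoencoder is a single tanh hidden layer trained by plain gradient descent, the base clusterer is a bare-bones k-means, and the consensus step is k-means on a co-association matrix. All function names and parameters (`train_encoder`, `kmeans`, `autoencoder_cluster_ensemble`, `subspace_frac`, `n_members`, `n_hidden`) are hypothetical.

```python
import numpy as np


def train_encoder(X, n_hidden, rng, epochs=200, lr=0.01):
    """Fit a one-hidden-layer tanh autoencoder by gradient descent; return its encoder."""
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encode
        R = H @ W2 + b2                   # decode (reconstruction)
        G = 2.0 * (R - X) / n             # gradient of the squared reconstruction error w.r.t. R
        GH = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1)


def kmeans(X, k, rng, n_iter=50):
    """Bare-bones Lloyd's k-means with random initial centers; returns integer labels."""
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # leave a center in place if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def autoencoder_cluster_ensemble(X, k, n_members=5, subspace_frac=0.5,
                                 n_hidden=4, seed=0):
    """Cluster cells (rows of X) by consensus over autoencoded random gene subsets."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = max(1, int(subspace_frac * d))
    coassoc = np.zeros((n, n))
    for _ in range(n_members):
        genes = rng.choice(d, m, replace=False)             # assumed: random gene subset
        Xs = X[:, genes]
        Xs = (Xs - Xs.mean(axis=0)) / (Xs.std(axis=0) + 1e-8)  # z-score each gene
        encode = train_encoder(Xs, n_hidden, rng)
        labels = kmeans(encode(Xs), k, rng)
        coassoc += labels[:, None] == labels[None, :]       # count co-clustered pairs
    coassoc /= n_members
    # Assumed consensus step: cluster the rows of the co-association matrix.
    return kmeans(coassoc, k, rng), coassoc


# Toy usage: 60 synthetic "cells" x 30 "genes" drawn around three different means.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(c, 0.3, (20, 30)) for c in (0.0, 2.0, 4.0)])
labels, coassoc = autoencoder_cluster_ensemble(X, k=3)
```

Each entry of `coassoc` is the fraction of ensemble members that placed the two cells in the same cluster, so the final clustering depends on agreement across members rather than on any single projection. Real scRNA-seq data would need normalization and far larger settings than this toy example.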