A component overlapping attribute clustering (COAC) algorithm for single-cell RNA sequencing data analysis and potential pathobiological implications.

Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies have suffered from several technical challenges, including low mean expression levels in...

Bibliographic Details
Main Authors: He Peng, Xiangxiang Zeng, Yadi Zhou, Defu Zhang, Ruth Nussinov, Feixiong Cheng
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-02-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1006772
Collection: DOAJ
ISSN: 1553-734X, 1553-7358
Description: Recent advances in next-generation sequencing and computational technologies have enabled routine analysis of large-scale single-cell ribonucleic acid sequencing (scRNA-seq) data. However, scRNA-seq technologies have suffered from several technical challenges, including low mean expression levels in most genes and higher frequencies of missing data than bulk population sequencing technologies. Identifying functional gene sets and their regulatory networks that link specific cell types to human diseases and therapeutics from scRNA-seq profiles is a daunting task. In this study, we developed a Component Overlapping Attribute Clustering (COAC) algorithm to perform localized (cell subpopulation) gene co-expression network analysis from large-scale scRNA-seq profiles. Gene subnetworks that represent specific gene co-expression patterns are inferred from the components of a decomposed matrix of scRNA-seq profiles. We showed that single-cell gene subnetworks identified by COAC from multiple time points within cell phases can be used for cell type identification with high accuracy (83%). In addition, COAC-inferred subnetworks from melanoma patients' scRNA-seq profiles are highly correlated with survival rate from The Cancer Genome Atlas (TCGA). Moreover, the localized gene subnetworks identified by COAC from individual patients' scRNA-seq data can be used as pharmacogenomics biomarkers to predict drug responses (the area under the receiver operating characteristic curve ranges from 0.728 to 0.783) in cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) database. In summary, COAC offers a powerful tool to identify potential network-based diagnostic and pharmacogenomics biomarkers from large-scale scRNA-seq profiles. COAC is freely available at https://github.com/ChengF-Lab/COAC.
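For readers unfamiliar with the term, the idea of a gene co-expression subnetwork can be illustrated with a small sketch. This is not the COAC algorithm, which the description says works from the components of a decomposed expression matrix; it is a generic correlation-threshold approach, shown only to make the concept concrete. The gene names, expression values, function names, and the 0.9 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as uncorrelated.
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def coexpression_modules(profiles, threshold=0.9):
    """Group genes into connected components of a correlation-threshold graph.

    profiles maps a gene name to its expression values across the same cells.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    genes = sorted(profiles)
    neighbors = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(profiles[g], profiles[h])) >= threshold:
                neighbors[g].add(h)
                neighbors[h].add(g)

    # Depth-first search collects each connected component as one module.
    seen, modules = set(), []
    for g in genes:
        if g in seen:
            continue
        seen.add(g)
        stack, component = [g], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nb in neighbors[current]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules


# Hypothetical toy data: six cells, five genes.
profiles = {
    "g1": [1, 5, 1, 5, 1, 5],
    "g2": [2, 10, 2, 10, 2, 10],
    "g3": [4, 4, 0, 0, 2, 2],
    "g4": [8, 8, 1, 1, 4, 4],
    "g5": [1, 2, 3, 4, 5, 6],
}
modules = coexpression_modules(profiles)
print(modules)
```

In this toy input, g1 and g2 rise and fall together, as do g3 and g4, while g5 follows neither pattern, so the sketch reports two two-gene modules and one singleton. A real scRNA-seq analysis would also need to handle the low expression levels and missing data that the description mentions.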