A robust semi-supervised NMF model for single cell RNA-seq data

Background: Single-cell RNA-sequencing (scRNA-seq) technology is a powerful tool to study organisms from a single-cell perspective and to explore the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis; it is key to understanding cell function and constitutes the...

Bibliographic Details
Main Authors: Peng Wu, Mo An, Hai-Ren Zou, Cai-Ying Zhong, Wei Wang, Chang-Peng Wu
Format: Article
Language: English
Published: PeerJ Inc. 2020-10-01
Series: PeerJ
Subjects: Semi-supervised, NMF model, Single cell RNA-seq
Online Access: https://peerj.com/articles/10091.pdf
id doaj-499670f535b34875bd4e6c93500792e7
record_format Article
spelling doaj-499670f535b34875bd4e6c93500792e7
2020-11-25T03:53:44Z
eng
PeerJ Inc.
PeerJ
ISSN 2167-8359
2020-10-01
Volume 8, e10091
DOI 10.7717/peerj.10091
A robust semi-supervised NMF model for single cell RNA-seq data
Peng Wu, Mo An, Hai-Ren Zou, Cai-Ying Zhong, Wei Wang, Chang-Peng Wu
Department of Neurosurgery, The People’s Hospital of Longhua District, Shenzhen, Guangdong Province, China (all authors)
https://peerj.com/articles/10091.pdf
Semi-supervised
NMF model
Single cell RNA-seq
collection DOAJ
language English
format Article
sources DOAJ
author Peng Wu
Mo An
Hai-Ren Zou
Cai-Ying Zhong
Wei Wang
Chang-Peng Wu
author_sort Peng Wu
title A robust semi-supervised NMF model for single cell RNA-seq data
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2020-10-01
description Background: Single-cell RNA-sequencing (scRNA-seq) technology is a powerful tool to study organisms from a single-cell perspective and to explore the heterogeneity between cells. Clustering is a fundamental step in scRNA-seq data analysis: it is key to understanding cell function and forms the basis of other advanced analyses. Nonnegative Matrix Factorization (NMF) has been widely used in clustering analysis of transcriptome data and has achieved good performance. However, the existing NMF model is unsupervised and ignores known gene functions during clustering. Substantial knowledge of cell marker genes (genes expressed only in specific cells) in human and model organisms has accumulated in resources such as the Molecular Signatures Database (MSigDB), and it can be used as prior information in the clustering analysis of scRNA-seq data. Because cells of the same kind are likely to have similar biological functions and specific gene expression patterns, the marker genes of cells can be utilized as prior knowledge in the clustering analysis.
Methods: We propose a robust and semi-supervised NMF (rssNMF) model, which introduces a new variable to absorb noise in the data and incorporates marker genes as prior information through a graph regularization term. We use rssNMF to solve the clustering problem of scRNA-seq data.
Results: Twelve scRNA-seq datasets with true labels are used to test model performance, and the results show that our model outperforms the original NMF and other common methods such as KMeans and Hierarchical Clustering. Biological significance analysis shows that rssNMF can identify key subclasses and latent biological processes. To our knowledge, this is the first method that incorporates prior knowledge into the clustering analysis of scRNA-seq data.
topic Semi-supervised
NMF model
Single cell RNA-seq
url https://peerj.com/articles/10091.pdf
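
Illustrative sketch (not part of the catalog record, and not the authors' implementation). The abstract describes rssNMF only in words: a new variable that absorbs noise, plus a graph regularization term that carries marker-gene prior information. The record does not give the objective function, so the code below assumes one plausible form, ||X - WH - S||_F^2 + lam * tr(H L H^T) + beta * ||S||_1 with W, H >= 0, and uses the standard graph-regularized NMF multiplicative updates together with soft-thresholding for S. The function names `marker_graph` and `robust_graph_nmf`, the cosine-similarity neighbour graph, the parameters `lam` and `beta`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def marker_graph(X, marker_idx, n_neighbors=5):
    """Cell-cell affinity built from marker-gene rows only (assumed construction)."""
    M = X[marker_idx, :]                           # markers x cells
    norms = np.linalg.norm(M, axis=0) + 1e-12
    C = (M.T @ M) / np.outer(norms, norms)         # cosine similarity between cells
    np.fill_diagonal(C, 0.0)                       # no self-loops
    A = np.zeros_like(C)
    nn = np.argsort(-C, axis=1)[:, :n_neighbors]   # most similar cells per row
    rows = np.repeat(np.arange(C.shape[0]), n_neighbors)
    A[rows, nn.ravel()] = C[rows, nn.ravel()]
    return np.maximum(A, A.T)                      # symmetrise the neighbour graph

def robust_graph_nmf(X, A, k, lam=1.0, beta=1.0, n_iter=200, seed=0):
    """Alternating updates for the assumed objective described in the lead-in."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + 0.1             # genes x factors
    H = rng.random((k, n_cells)) + 0.1             # factors x cells
    D = np.diag(A.sum(axis=1))                     # degree matrix
    L = D - A                                      # graph Laplacian
    eps = 1e-10
    losses = []
    for _ in range(n_iter):
        # Noise term: exact minimiser for fixed W, H (soft-thresholding).
        R = X - W @ H
        S = np.sign(R) * np.maximum(np.abs(R) - beta / 2.0, 0.0)
        # X - S is nonnegative analytically; the clip only guards rounding.
        Xc = np.maximum(X - S, 0.0)
        # Multiplicative updates of graph-regularized NMF applied to Xc.
        W *= (Xc @ H.T) / (W @ (H @ H.T) + eps)
        H *= (W.T @ Xc + lam * (H @ A)) / ((W.T @ W) @ H + lam * (H @ D) + eps)
        E = X - W @ H - S
        losses.append((E ** 2).sum()
                      + lam * np.trace(H @ L @ H.T)
                      + beta * np.abs(S).sum())
    return W, H, S, losses

# Synthetic example: 3 cell groups, each with 5 hypothetical marker genes.
rng = np.random.default_rng(1)
n_genes, n_cells, k = 60, 45, 3
truth = np.repeat(np.arange(k), n_cells // k)
X = rng.poisson(1.0, size=(n_genes, n_cells)).astype(float)
for c in range(k):
    X[c * 5:(c + 1) * 5, truth == c] += 10.0       # boost markers in their group
marker_idx = np.arange(15)

A = marker_graph(X, marker_idx)
W, H, S, losses = robust_graph_nmf(X, A, k)
labels = H.argmax(axis=0)                          # cluster = dominant factor per cell
```

Assigning each cell to its largest factor in H is a common way to read clusters from an NMF; whether rssNMF derives its labels this way is not stated in the record.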