Deterministic column subset selection for single-cell RNA-Seq.

Bibliographic Details
Main Authors: Shannon R McCurdy, Vasilis Ntranos, Lior Pachter
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0210571
ISSN: 1932-6203
Citation: PLoS ONE 14(1): e0210571

Abstract
Analysis of single-cell RNA sequencing (scRNA-Seq) data often involves filtering out uninteresting or poorly measured genes and dimensionality reduction to reduce noise and simplify data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods to filter genes avoid those pitfalls, but ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of common thresholding methods and PCA, while avoiding pitfalls from both. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to clusters produced from the full set. The resulting clusters are informative for cell type.
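The abstract names deterministic column subset selection (DCSS) but does not spell out the selection rule. As a rough illustration only, the Python sketch below implements deterministic leverage-score sampling, one well-known DCSS scheme: score each gene (column) by its rank-k leverage, the squared norm of its column in the top-k right singular vectors, then keep the highest-scoring genes until their summed leverage first exceeds a threshold theta. Whether this matches the authors' exact procedure is an assumption, and the names `dcss_select`, `projection_error`, `k`, and `theta` are hypothetical, not taken from the paper's code.

```python
import numpy as np


def dcss_select(X: np.ndarray, k: int, theta: float) -> np.ndarray:
    """Hypothetical helper: pick columns (genes) of X by rank-k leverage scores.

    X     : (cells x genes) expression matrix
    k     : rank of the truncated SVD used to score the columns
    theta : leverage threshold; columns are taken in descending leverage order
            until their summed leverage first exceeds theta (0 < theta < k)
    Returns the indices of the selected columns.
    """
    # Rows of Vt are right singular vectors, i.e. directions in gene space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k]
    # Leverage score of gene j: squared norm of column j of Vk (scores sum to k).
    lev = np.sum(Vk ** 2, axis=0)
    order = np.argsort(-lev, kind="stable")
    cum = np.cumsum(lev[order])
    # Smallest prefix whose cumulative leverage is strictly greater than theta.
    c = min(int(np.searchsorted(cum, theta, side="right")) + 1, X.shape[1])
    return order[:c]


def projection_error(X: np.ndarray, cols: np.ndarray) -> float:
    """Frobenius error of projecting X onto the span of its selected columns."""
    C = X[:, cols]
    return float(np.linalg.norm(X - C @ np.linalg.pinv(C) @ X, "fro"))


if __name__ == "__main__":
    # Toy non-negative "counts" with rank-3 structure plus Poisson noise (made-up data).
    rng = np.random.default_rng(0)
    rates = rng.gamma(2.0, 1.0, (200, 3)) @ rng.gamma(1.0, 1.0, (3, 50))
    X = rng.poisson(rates).astype(float)
    k, theta = 3, 2.5
    cols = dcss_select(X, k, theta)
    s = np.linalg.svd(X, compute_uv=False)
    print(f"selected {len(cols)} of {X.shape[1]} genes")
    print("DCSS projection error:", projection_error(X, cols))
    # Eckart-Young: the best rank-k Frobenius error is the norm of the tail singular values.
    print("best rank-k error:", float(np.sqrt(np.sum(s[k:] ** 2))))
```

Because the result is a subset of the original columns, the reduced matrix `X[:, cols]` keeps the non-negativity and sparsity of the input, and each retained coordinate is still a named gene, which is the contrast with PCA drawn in the abstract. Comparing the printed projection error with the best rank-k error gives a quick sense of what is lost; the toy data are synthetic and not from the paper.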