Joint between-sample normalization and differential expression detection through ℓ 0-regularized regression

Abstract. Background: A fundamental problem in RNA-seq data analysis is to identify, from read counts, genes or exons that are differentially expressed across experimental conditions. Because RNA-seq measurements are relative, between-sample normalization of read counts is an essentia...

Bibliographic Details
Main Authors: Kefei Liu, Li Shen, Hui Jiang
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Subjects: Differential expression; Between-sample normalization; ℓ 0-regularized regression; RNA-seq
Online Access: https://doi.org/10.1186/s12859-019-3070-4
id doaj-544a8b0107044ba58007ab48f434a2f1
record_format Article
spelling doaj-544a8b0107044ba58007ab48f434a2f1 2020-12-06T12:56:41Z eng BMC BMC Bioinformatics 1471-2105 2019-12-01 20S16116 10.1186/s12859-019-3070-4 Joint between-sample normalization and differential expression detection through ℓ 0-regularized regression
Kefei Liu (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania)
Li Shen (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania)
Hui Jiang (Department of Biostatistics, University of Michigan)
collection DOAJ
language English
format Article
sources DOAJ
author Kefei Liu
Li Shen
Hui Jiang
title Joint between-sample normalization and differential expression detection through ℓ 0-regularized regression
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-12-01
description Abstract. Background: A fundamental problem in RNA-seq data analysis is to identify, from read counts, genes or exons that are differentially expressed across experimental conditions. Because RNA-seq measurements are relative, between-sample normalization of read counts is an essential step in differential expression (DE) analysis. In most existing methods, the normalization step is performed prior to the DE analysis. Recently, Jiang and Zhan proposed a statistical method that introduces sample-specific normalization parameters into a joint model, allowing simultaneous normalization and differential expression analysis of log-transformed RNA-seq data. Furthermore, an ℓ 0 penalty is used to yield a sparse solution that selects a subset of DE genes. In their work, the experimental conditions are restricted to be categorical. Results: In this paper, we generalize Jiang and Zhan’s method to handle experimental conditions that are measured as continuous variables. As a result, genes whose expression levels are associated with one or more covariates can be detected. Because the problem is high-dimensional, non-differentiable and non-convex, we develop an efficient algorithm for model fitting. Conclusions: Experiments on synthetic data demonstrate that the proposed method outperforms existing methods in detection accuracy when a large fraction of genes are differentially expressed in an asymmetric manner, and the performance gain becomes more substantial for larger sample sizes. We also apply our method to a real prostate cancer RNA-seq dataset to identify genes associated with pre-operative prostate-specific antigen (PSA) levels in patients.
topic Differential expression
Between-sample normalization
ℓ 0-regularized regression
RNA-seq
url https://doi.org/10.1186/s12859-019-3070-4
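The joint model summarized in the description can be illustrated with a toy sketch. This is not the authors' algorithm, which the record does not specify. It assumes one plausible form of the model, log-expression y[i, j] = alpha[i] + d[j] + beta[i] * x[j] + noise for gene i and sample j with a single continuous covariate x, and fits it by a simple alternating scheme with per-gene hard thresholding. The function names, the median-based initialization, the penalty value `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def _gene_step(y, d, xc, sxx, lam):
    """Given sample effects d, solve each gene's l0-penalized fit exactly."""
    r = y - d                      # remove current normalization effects
    intercept = r.mean(axis=1)     # OLS intercept (covariate is centred)
    slope = (r @ xc) / sxx         # per-gene OLS slope on the covariate
    gain = slope ** 2 * sxx        # drop in residual sum of squares from the slope
    beta = np.where(gain > lam, slope, 0.0)  # keep a slope only if it pays for lam
    return intercept, beta


def fit_joint_l0(y, x, lam, n_iter=50):
    """Toy alternating fit of y[i, j] ~ alpha[i] + d[j] + beta[i] * x[j].

    y: (genes, samples) log-expression; x: (samples,) continuous covariate.
    Minimizes squared error + lam * (number of genes with beta != 0)
    by block updates; an illustrative assumption, not the published method.
    """
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    # Assumed robust start: per-sample median of gene-centred data.
    d = np.median(y - y.mean(axis=1, keepdims=True), axis=0)
    d -= d.mean()
    for _ in range(n_iter):
        intercept, beta = _gene_step(y, d, xc, sxx, lam)
        fitted = intercept[:, None] + beta[:, None] * xc[None, :]
        d = (y - fitted).mean(axis=0)   # least-squares update of sample effects
        d -= d.mean()                   # fix the shift shared with the intercepts
    intercept, beta = _gene_step(y, d, xc, sxx, lam)
    alpha = intercept - beta * x.mean()  # intercept on the uncentred covariate
    return alpha, beta, d


def simulate_toy_data(seed=0, n_genes=200, n_samples=30, n_de=20):
    """Hypothetical data: the first n_de genes all increase with x (asymmetric DE)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples)
    alpha = rng.normal(5.0, 1.0, size=n_genes)
    d = rng.normal(0.0, 1.0, size=n_samples)
    beta = np.zeros(n_genes)
    beta[:n_de] = 2.0
    noise = rng.normal(0.0, 0.1, size=(n_genes, n_samples))
    y = alpha[:, None] + d[None, :] + beta[:, None] * x[None, :] + noise
    return y, x, beta


if __name__ == "__main__":
    y, x, beta_true = simulate_toy_data()
    _, beta_hat, _ = fit_joint_l0(y, x, lam=5.0)
    print("genes flagged as DE:", np.flatnonzero(beta_hat).tolist())
```

In this sketch the sample effects d play the role of the normalization parameters, and the threshold `lam` controls how many genes are reported as differentially expressed. A real analysis would need a principled way to choose the penalty and to handle multiple covariates.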