Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data.

Bibliographic Details
Main Authors: Bukyung Baik, Sora Yoon, Dougu Nam
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-01-01
Series: PLoS ONE, vol. 15, no. 4, e0232271 (ISSN 1932-6203)
Online Access:https://doi.org/10.1371/journal.pone.0232271

Abstract
Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data were generated from the same bulk RNA sample and therefore represent only technical variability, which makes the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in data and data simulated under a negative binomial (NB) model. The performance of edgeR, DESeq2, and ROTS differed notably between the two benchmarks. Each method was then tested under the most extensive set of simulation conditions, which demonstrated in particular the large impact of the proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), voom with TMM normalization (voom.tmm), and voom with sample weights (voom.sw) showed good overall performance regardless of the presence of outliers and the proportion of DE genes. The performance of RNA-seq DE gene analysis methods depended substantially on the benchmark used. Based on the simulation results, suitable methods are suggested for various test conditions.
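The record does not include the authors' simulation code or parameter values. As a rough illustration of what "simulation data for a negative binomial model" with a chosen proportion, dispersion, and balance of DE genes can look like, here is a minimal Python sketch. It assumes the common NB parameterization with variance mu + phi * mu^2, a log-normal distribution of baseline gene means, and a single fixed fold change for all DE genes. The function name `simulate_nb_counts` and every default value are hypothetical and are not taken from the paper.

```python
import numpy as np


def simulate_nb_counts(n_genes=10_000, n_per_group=3, prop_de=0.1,
                       prop_up=0.5, fold_change=2.0, dispersion=0.1,
                       seed=0):
    """Simulate a two-group RNA-seq count matrix under a negative binomial model.

    Hypothetical parameters (illustrative only, not the authors' settings):
      prop_de     fraction of genes that are differentially expressed
      prop_up     balance: fraction of DE genes up-regulated in group 2
      dispersion  NB dispersion phi, so that Var = mu + phi * mu**2
    Returns (counts[gene, sample], group label per sample, log2 fold change per gene).
    """
    rng = np.random.default_rng(seed)

    # Baseline mean expression per gene (log-normal shape is an assumption).
    base_mean = rng.lognormal(mean=4.0, sigma=1.5, size=n_genes)

    # Pick DE genes at random, then split them into up- and down-regulated sets.
    n_de = int(round(prop_de * n_genes))
    de_idx = rng.choice(n_genes, size=n_de, replace=False)
    n_up = int(round(prop_up * n_de))
    lfc = np.zeros(n_genes)  # log2 fold change of group 2 relative to group 1
    lfc[de_idx[:n_up]] = np.log2(fold_change)
    lfc[de_idx[n_up:]] = -np.log2(fold_change)

    # Per gene x sample means: group 1 stays at baseline, group 2 is shifted.
    group = np.repeat([0, 1], n_per_group)
    mu = base_mean[:, None] * 2.0 ** (lfc[:, None] * group[None, :])

    # NumPy's NB(n, p) has mean n(1-p)/p; choosing n = 1/phi and p = n/(n+mu)
    # gives mean mu and variance mu + phi * mu**2.
    size = 1.0 / dispersion
    p = size / (size + mu)
    counts = rng.negative_binomial(size, p)
    return counts, group, lfc


if __name__ == "__main__":
    # An unbalanced design: 30% DE genes, 90% of them up-regulated.
    counts, group, lfc = simulate_nb_counts(prop_de=0.3, prop_up=0.9)
    print(counts.shape, int((lfc != 0).sum()), int((lfc > 0).sum()))
```

Moving `prop_up` away from 0.5 gives the kind of unbalanced DE setting that the abstract identifies as influential. Raising `dispersion` increases the between-replicate (biological) variability, which spike-in data from a single bulk RNA sample do not capture.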