Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data.
Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compa...
Main Authors: | Bukyung Baik, Sora Yoon, Dougu Nam |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2020-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0232271 |
id |
doaj-9c7bb2824caf4272a209149bd3fb087a |
---|---|
record_format |
Article |
spelling |
doaj-9c7bb2824caf4272a209149bd3fb087a; 2021-03-03T21:42:47Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2020-01-01; 15; 4; e0232271; 10.1371/journal.pone.0232271; Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data.; Bukyung Baik, Sora Yoon, Dougu Nam; Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in data and simulation data based on the negative binomial (NB) model. In particular, the performance of edgeR, DESeq2, and ROTS differed between the two benchmark tests. Each method was then tested under the most extensive simulation conditions, which demonstrated in particular the large impact of the proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), and voom with TMM normalization (voom.tmm) or with sample weights (voom.sw) showed good overall performance regardless of the presence of outliers and the proportion of DE genes. The performance of RNA-seq DE gene analysis methods depended substantially on the benchmark used. Based on the simulation results, suitable methods were suggested for various test conditions.; https://doi.org/10.1371/journal.pone.0232271 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Bukyung Baik, Sora Yoon, Dougu Nam |
title |
Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2020-01-01 |
description |
Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in data and simulation data based on the negative binomial (NB) model. In particular, the performance of edgeR, DESeq2, and ROTS differed between the two benchmark tests. Each method was then tested under the most extensive simulation conditions, which demonstrated in particular the large impact of the proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), and voom with TMM normalization (voom.tmm) or with sample weights (voom.sw) showed good overall performance regardless of the presence of outliers and the proportion of DE genes. The performance of RNA-seq DE gene analysis methods depended substantially on the benchmark used. Based on the simulation results, suitable methods were suggested for various test conditions. |
url |
https://doi.org/10.1371/journal.pone.0232271 |
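
The description above refers to simulation data under a negative binomial (NB) model in which the proportion, dispersion, and balance of DE genes are varied. As a rough illustration of what those three settings mean, the following minimal Python sketch draws NB counts for two groups. It is not the authors' simulation design: the function name `simulate_nb_counts`, all parameter names and default values, the single shared dispersion, the fixed fold change, and the lognormal baseline means are assumptions made only for this example.

```python
import numpy as np


def simulate_nb_counts(n_genes=1000, n_per_group=3, de_fraction=0.1,
                       up_fraction=0.5, dispersion=0.2, fold_change=2.0,
                       seed=0):
    """Draw a genes-by-samples count matrix for two groups (hypothetical helper).

    de_fraction : proportion of genes that are differentially expressed (DE)
    up_fraction : balance of DE genes (share that are up-regulated in group B)
    dispersion  : NB dispersion phi, so that variance = mu + phi * mu**2
    """
    rng = np.random.default_rng(seed)

    # Assumed baseline expression: lognormal means, purely illustrative.
    base_mean = rng.lognormal(mean=5.0, sigma=1.0, size=n_genes)

    # Choose the DE genes, then split them into up- and down-regulated sets.
    n_de = int(round(de_fraction * n_genes))
    n_up = int(round(up_fraction * n_de))
    de_idx = rng.choice(n_genes, size=n_de, replace=False)
    log2_fc = np.zeros(n_genes)
    log2_fc[de_idx[:n_up]] = np.log2(fold_change)
    log2_fc[de_idx[n_up:]] = -np.log2(fold_change)

    mean_a = base_mean
    mean_b = base_mean * 2.0 ** log2_fc

    # NumPy parameterizes the NB by (n, p); with n = 1/phi and
    # p = n / (n + mu), the mean is mu and the variance is mu + phi * mu**2.
    size = 1.0 / dispersion

    def draw(mu):
        p = size / (size + mu)
        return rng.negative_binomial(size, p[:, None],
                                     size=(n_genes, n_per_group))

    counts = np.hstack([draw(mean_a), draw(mean_b)])
    return counts, log2_fc


if __name__ == "__main__":
    counts, log2_fc = simulate_nb_counts()
    print(counts.shape, int((log2_fc != 0).sum()))
```

Raising `de_fraction`, moving `up_fraction` away from 0.5, or increasing `dispersion` gives progressively harder inputs for a DE method; the returned `log2_fc` vector records which genes are truly DE, which is what a benchmark would score against.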