Short paired-end reads trump long single-end reads for expression analysis
Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the addit...
Main Authors: | Adam H. Freedman, John M. Gaspar, Timothy B. Sackton |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2020-04-01 |
Series: | BMC Bioinformatics |
Subjects: | RNA-seq, Short read sequencing, Differential expression |
Online Access: | http://link.springer.com/article/10.1186/s12859-020-3484-z |
id |
doaj-ec4b6f7fa9644b899767bee2b0221432 |
---|---|
record_format |
Article |
spelling |
doaj-ec4b6f7fa9644b899767bee2b0221432; 2020-11-25T03:09:13Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2020-04-01; 211111; 10.1186/s12859-020-3484-z; Short paired-end reads trump long single-end reads for expression analysis; Adam H. Freedman [0]; John M. Gaspar [1]; Timothy B. Sackton [2]; Informatics Group, Harvard University; Informatics Group, Harvard University; Informatics Group, Harvard University; Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than do 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.; http://link.springer.com/article/10.1186/s12859-020-3484-z; RNA-seq; Short read sequencing; Differential expression |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Adam H. Freedman, John M. Gaspar, Timothy B. Sackton |
title |
Short paired-end reads trump long single-end reads for expression analysis |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2020-04-01 |
description |
Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than do 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level. |
topic |
RNA-seq, Short read sequencing, Differential expression |
url |
http://link.springer.com/article/10.1186/s12859-020-3484-z |