Short paired-end reads trump long single-end reads for expression analysis

Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the addit...


Bibliographic Details
Main Authors: Adam H. Freedman, John M. Gaspar, Timothy B. Sackton
Format: Article
Language: English
Published: BMC 2020-04-01
Series: BMC Bioinformatics
Subjects: RNA-seq; Short read sequencing; Differential expression
Online Access: http://link.springer.com/article/10.1186/s12859-020-3484-z
id doaj-ec4b6f7fa9644b899767bee2b0221432
record_format Article
spelling doaj-ec4b6f7fa9644b899767bee2b0221432 | 2020-11-25T03:09:13Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-04-01 | 21 | 1 | 1 | 11 | 10.1186/s12859-020-3484-z | Short paired-end reads trump long single-end reads for expression analysis | Adam H. Freedman [0]; John M. Gaspar [1]; Timothy B. Sackton [2] | [0] Informatics Group, Harvard University; [1] Informatics Group, Harvard University; [2] Informatics Group, Harvard University | Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, expression estimates from 2 × 40 paired-end reads are unequivocally more highly correlated with those from 2 × 125 reads than estimates from 1 × 75 reads are; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level. | http://link.springer.com/article/10.1186/s12859-020-3484-z | RNA-seq; Short read sequencing; Differential expression
collection DOAJ
language English
format Article
sources DOAJ
author Adam H. Freedman
John M. Gaspar
Timothy B. Sackton
title Short paired-end reads trump long single-end reads for expression analysis
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-04-01
description Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, expression estimates from 2 × 40 paired-end reads are unequivocally more highly correlated with those from 2 × 125 reads than estimates from 1 × 75 reads are; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
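The "total number of sequenced bases" comparison in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each design sequences the same number of fragments (the abstract does not state this), and the names `layouts` and `bases_per_fragment` are hypothetical, not taken from the article.

```python
# Bases sequenced per fragment for each read layout named in the abstract.
# A layout "N x L" means N reads of L bases each per fragment.
# Assumption: every design sequences the same number of fragments.
layouts = {
    "2 x 40": (2, 40),    # short paired-end
    "1 x 75": (1, 75),    # single-end, roughly matched in total bases
    "1 x 125": (1, 125),  # long single-end
    "2 x 125": (2, 125),  # long paired-end, used as the reference
}


def bases_per_fragment(n_reads, read_length):
    """Return the number of bases sequenced for one fragment."""
    return n_reads * read_length


totals = {name: bases_per_fragment(*spec) for name, spec in layouts.items()}

for name, total in totals.items():
    print(f"{name}: {total} bases per fragment")
```

Under that assumption, 2 × 40 and 1 × 75 differ by only five bases per fragment, while 1 × 125 sequences more bases than 2 × 40, which is the contrast the Results sentence draws.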
topic RNA-seq
Short read sequencing
Differential expression
url http://link.springer.com/article/10.1186/s12859-020-3484-z