Short paired-end reads trump long single-end reads for expression analysis

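Abstract

Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases.

Results: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75.

Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.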

Bibliographic Details
Main Authors:	Adam H. Freedman, John M. Gaspar, Timothy B. Sackton
Format:	Article
Language:	English
Published:	BMC 2020-04-01
Series:	BMC Bioinformatics
Subjects:	RNA-seq; Short read sequencing; Differential expression
Online Access:	http://link.springer.com/article/10.1186/s12859-020-3484-z

id	doaj-ec4b6f7fa9644b899767bee2b0221432
record_format	Article
spelling	doaj-ec4b6f7fa9644b899767bee2b0221432 | 2020-11-25T03:09:13Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-04-01 | vol. 21, no. 1, pp. 1-11 | 10.1186/s12859-020-3484-z | Short paired-end reads trump long single-end reads for expression analysis | Adam H. Freedman (Informatics Group, Harvard University); John M. Gaspar (Informatics Group, Harvard University); Timothy B. Sackton (Informatics Group, Harvard University) | http://link.springer.com/article/10.1186/s12859-020-3484-z | RNA-seq; Short read sequencing; Differential expression
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Adam H. Freedman, John M. Gaspar, Timothy B. Sackton
author_facet	Adam H. Freedman, John M. Gaspar, Timothy B. Sackton
author_sort	Adam H. Freedman
title	Short paired-end reads trump long single-end reads for expression analysis
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2020-04-01
description	Abstract. Background: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. Results: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. Conclusion: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
topic	RNA-seq; Short read sequencing; Differential expression
url	http://link.springer.com/article/10.1186/s12859-020-3484-z

Similar Items