PaSS: a sequencing simulator for PacBio sequencing

Abstract. Background: Third-generation sequencing platforms, such as PacBio sequencing, have been developed rapidly in recent years. PacBio sequencing generates much longer reads than the second-generation sequencing (or the next generation sequencing, NGS) technologies and it has unique sequencing er...

Bibliographic Details
Main Authors: Wenmin Zhang, Ben Jia, Chaochun Wei
Format: Article
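Language: English
Published: BMC 2019-06-01
Series: BMC Bioinformatics
Subjects: Third generation sequencing; Next generation sequencing; PacBio sequencing; Sequencing simulator; Sequencing error; Sequence pattern
Online Access: http://link.springer.com/article/10.1186/s12859-019-2901-7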
id doaj-31ea5ae4de5c43f2a29f3fee3c9a5958
record_format Article
spelling doaj-31ea5ae4de5c43f2a29f3fee3c9a5958
2020-11-25T03:15:10Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-06-01
20117
10.1186/s12859-019-2901-7
PaSS: a sequencing simulator for PacBio sequencing
Wenmin Zhang
Ben Jia
Chaochun Wei
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
Abstract. Background: Third-generation sequencing platforms, such as PacBio sequencing, have been developed rapidly in recent years. PacBio sequencing generates much longer reads than the second-generation sequencing (or the next generation sequencing, NGS) technologies and it has unique sequencing error patterns. An effective read simulator is essential to evaluate and promote the development of new bioinformatics tools for PacBio sequencing data analysis. Results: We developed a new PacBio Sequencing Simulator (PaSS). It can learn sequence patterns from PacBio sequencing data currently available. In addition to the distribution of read lengths and error rates, we included a context-specific sequencing error model. Compared to existing PacBio sequencing simulators such as PBSIM, LongISLND and NPBSS, PaSS performed better in many aspects. Assembly tests also suggest that reads simulated by PaSS are the most similar to experimental sequencing data. Conclusion: PaSS is an effective sequence simulator for PacBio sequencing. It will facilitate the evaluation and development of new analysis tools for the third-generation sequencing data.
http://link.springer.com/article/10.1186/s12859-019-2901-7
Third generation sequencing
Next generation sequencing
PacBio sequencing
Sequencing simulator
Sequencing error
Sequence pattern
collection DOAJ
language English
format Article
sources DOAJ
author Wenmin Zhang
Ben Jia
Chaochun Wei
spellingShingle Wenmin Zhang
Ben Jia
Chaochun Wei
PaSS: a sequencing simulator for PacBio sequencing
BMC Bioinformatics
Third generation sequencing
Next generation sequencing
PacBio sequencing
Sequencing simulator
Sequencing error
Sequence pattern
author_facet Wenmin Zhang
Ben Jia
Chaochun Wei
author_sort Wenmin Zhang
title PaSS: a sequencing simulator for PacBio sequencing
title_short PaSS: a sequencing simulator for PacBio sequencing
title_full PaSS: a sequencing simulator for PacBio sequencing
title_fullStr PaSS: a sequencing simulator for PacBio sequencing
title_full_unstemmed PaSS: a sequencing simulator for PacBio sequencing
title_sort pass: a sequencing simulator for pacbio sequencing
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-06-01
description Abstract. Background: Third-generation sequencing platforms, such as PacBio sequencing, have been developed rapidly in recent years. PacBio sequencing generates much longer reads than the second-generation sequencing (or the next generation sequencing, NGS) technologies and it has unique sequencing error patterns. An effective read simulator is essential to evaluate and promote the development of new bioinformatics tools for PacBio sequencing data analysis. Results: We developed a new PacBio Sequencing Simulator (PaSS). It can learn sequence patterns from PacBio sequencing data currently available. In addition to the distribution of read lengths and error rates, we included a context-specific sequencing error model. Compared to existing PacBio sequencing simulators such as PBSIM, LongISLND and NPBSS, PaSS performed better in many aspects. Assembly tests also suggest that reads simulated by PaSS are the most similar to experimental sequencing data. Conclusion: PaSS is an effective sequence simulator for PacBio sequencing. It will facilitate the evaluation and development of new analysis tools for the third-generation sequencing data.
topic Third generation sequencing
Next generation sequencing
PacBio sequencing
Sequencing simulator
Sequencing error
Sequence pattern
url http://link.springer.com/article/10.1186/s12859-019-2901-7
work_keys_str_mv AT wenminzhang passasequencingsimulatorforpacbiosequencing
AT benjia passasequencingsimulatorforpacbiosequencing
AT chaochunwei passasequencingsimulatorforpacbiosequencing
_version_ 1724640236499107840