GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads

Abstract Background Closing gaps in draft genomes is an important post processing step in genome assembly. It leads to more complete genomes, which benefits downstream genome analysis such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools don’t ful...

Bibliographic Details
Main Authors: Chong Chu, Xin Li, Yufeng Wu
Format: Article
Language: English
Published: BMC 2019-06-01
Series: BMC Genomics
Subjects: Closing gaps; De novo assembly; Repeat elements; Sequencing analysis
Online Access: http://link.springer.com/article/10.1186/s12864-019-5703-4
id doaj-d5c1b42f93b541bba031caf3394f1b13
record_format Article
spelling doaj-d5c1b42f93b541bba031caf3394f1b13 2020-11-25T03:21:40Z eng
BMC Genomics 1471-2164 2019-06-01 20 S5 1 10
10.1186/s12864-019-5703-4
Chong Chu, Dept. of Computer Science and Engineering, University of Connecticut
Xin Li, Dept. of Computer Science and Engineering, University of Connecticut
Yufeng Wu, Dept. of Computer Science and Engineering, University of Connecticut
collection DOAJ
sources DOAJ
author Chong Chu
Xin Li
Yufeng Wu
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2019-06-01
description Abstract
Background: Closing gaps in draft genomes is an important post-processing step in genome assembly. It leads to more complete genomes, which benefits downstream analyses such as annotation and genotyping. Several tools have been developed for gap closing. However, these tools do not fully utilize the information contained in the sequence data. For example, although many gaps are known to be caused by genomic repeats, existing tools often ignore many sequence reads that originate from a repeat-related gap.
Results: We compare GAPPadder with GapCloser, GapFiller and Sealer on one bacterial genome, human chromosome 14 and the human whole genome, using paired-end and mate-pair reads with both short and long insert sizes. Empirical results show that GAPPadder can close more gaps than these existing tools. Besides closing gaps on draft genomes assembled only from short sequence reads, GAPPadder can also be used to close gaps on draft genomes assembled with long reads. We show that GAPPadder can close gaps on the bed bug genome and the Asian sea bass genome, which were assembled partially and fully with long reads, respectively. We also show that GAPPadder is efficient in both time and memory usage.
Conclusion: In this paper, we propose a new approach called GAPPadder for gap closing. The main advantage of GAPPadder is that it uses more of the information in sequence data for gap closing. In particular, GAPPadder finds and uses reads that originate from repeat-related gaps. We show that these repeat-associated reads are useful for gap closing, even though they are ignored by all existing tools. Other main features of GAPPadder include utilizing the information in sequence reads with different insert sizes and performing two-stage local assembly of gap sequences. The results show that our method can close more gaps than several existing tools. The software tool, GAPPadder, is available for download at https://github.com/Reedwarbler/GAPPadder.
topic Closing gaps
De novo assembly
Repeat elements
Sequencing analysis
url http://link.springer.com/article/10.1186/s12864-019-5703-4