An improved approach for reconstructing consensus repeats from short sequence reads

Abstract Background Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely either on high quality reference genomes or existing repeat libraries. Thus, it is still challenging to do repeat analysis for species with highly repetitive or compl...


Bibliographic Details
Main Authors:	Chong Chu, Jingwen Pei, Yufeng Wu
Format:	Article
Language:	English
Published:	BMC 2018-08-01
Series:	BMC Genomics
Subjects:	Repeat elements; De novo genome assembly; Sequence analysis
Online Access:	http://link.springer.com/article/10.1186/s12864-018-4920-6

id	doaj-d5863ee682f0410aa13169a544fd0b62
record_format	Article
spelling	doaj-d5863ee682f0410aa13169a544fd0b62 | 2020-11-24T22:14:36Z | eng | BMC | BMC Genomics | 1471-2164 | 2018-08-01 | 19S6917 | 10.1186/s12864-018-4920-6 | An improved approach for reconstructing consensus repeats from short sequence reads | Chong Chu (Department of Biomedical Informatics, Harvard Medical School); Jingwen Pei (Department of Computer Science and Engineering, University of Connecticut); Yufeng Wu (Department of Computer Science and Engineering, University of Connecticut) | http://link.springer.com/article/10.1186/s12864-018-4920-6 | Repeat elements; De novo genome assembly; Sequence analysis
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chong Chu; Jingwen Pei; Yufeng Wu
author_facet	Chong Chu; Jingwen Pei; Yufeng Wu
author_sort	Chong Chu
title	An improved approach for reconstructing consensus repeats from short sequence reads
publisher	BMC
series	BMC Genomics
issn	1471-2164
publishDate	2018-08-01
description	Background: Repeat elements are important components of most eukaryotic genomes. Most existing tools for repeat analysis rely on either high-quality reference genomes or existing repeat libraries. Repeat analysis therefore remains challenging for species with highly repetitive or complex genomes, which often lack good reference genomes or annotated repeat libraries. We recently developed a computational method called REPdenovo that constructs consensus repeat sequences directly from short sequence reads and outperforms an existing tool called RepARK. One major issue with REPdenovo is that it does not perform well for repeats with relatively high divergence rates or low copy numbers. In this paper, we present an improved approach for constructing consensus repeats directly from short reads. Compared with the original REPdenovo, the improved approach uses more repeat-related k-mers and improves repeat assembly quality using a consensus-based k-mer processing method. Results: We compare the performance of the new method with REPdenovo and RepARK on human, Arabidopsis thaliana and Drosophila melanogaster short sequencing data. The new method fully constructs more repeats in Repbase than the original REPdenovo and RepARK, especially repeats with higher divergence rates and lower copy numbers. We also apply the new method to hummingbird data, for which no repeat library is known, and it constructs many repeat elements that can be validated using PacBio long reads. Conclusion: We propose an improved method for reconstructing repeat elements directly from short sequence reads. The results show that the new method can assemble more complete repeats than REPdenovo and RepARK. The new approach has been implemented as part of the REPdenovo software package, which is available for download at https://github.com/Reedwarbler/REPdenovo.
topic	Repeat elements; De novo genome assembly; Sequence analysis
url	http://link.springer.com/article/10.1186/s12864-018-4920-6
