REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this prob...

Bibliographic Details
Main Authors: Chong Chu, Rasmus Nielsen, Yufeng Wu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4792456?pdf=render
id doaj-8e871a15769d4c25943e3ff751313c25
record_format Article
spelling doaj-8e871a15769d4c25943e3ff751313c25 2020-11-25T01:27:29Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2016-01-01 11 3 e0150719 10.1371/journal.pone.0150719 REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads. Chong Chu Rasmus Nielsen Yufeng Wu Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo. http://europepmc.org/articles/PMC4792456?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Chong Chu
Rasmus Nielsen
Yufeng Wu
spellingShingle Chong Chu
Rasmus Nielsen
Yufeng Wu
REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
PLoS ONE
author_facet Chong Chu
Rasmus Nielsen
Yufeng Wu
author_sort Chong Chu
title REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
title_short REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
title_full REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
title_fullStr REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
title_full_unstemmed REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads.
title_sort repdenovo: inferring de novo repeat motifs from short sequence reads.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2016-01-01
description Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
url http://europepmc.org/articles/PMC4792456?pdf=render
work_keys_str_mv AT chongchu repdenovoinferringdenovorepeatmotifsfromshortsequencereads
AT rasmusnielsen repdenovoinferringdenovorepeatmotifsfromshortsequencereads
AT yufengwu repdenovoinferringdenovorepeatmotifsfromshortsequencereads
_version_ 1725105239082663936