Resolving repeat families with long reads

Abstract. Background: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chrom...

Bibliographic Details
Main Author: Philipp Bongartz
Format: Article
Language: English
Published: BMC 2019-05-01
Series: BMC Bioinformatics
Subjects: Genome assembly; Repeat families; Repeat resolution
Online Access: http://link.springer.com/article/10.1186/s12859-019-2807-4
id doaj-13ee9a6752f845dcb0045cc266f0dfd5
record_format Article
spelling doaj-13ee9a6752f845dcb0045cc266f0dfd5 2020-11-25T03:36:45Z eng BMC BMC Bioinformatics 1471-2105 2019-05-01 201111 10.1186/s12859-019-2807-4 Resolving repeat families with long reads Philipp Bongartz 0 Heidelberg Institute for Theoretical Studies Abstract. Background: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. Results: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long-read repeat resolution tool and show the improved accuracy of our methods. Conclusions: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. http://link.springer.com/article/10.1186/s12859-019-2807-4 Genome assembly Repeat families Repeat resolution
collection DOAJ
language English
format Article
sources DOAJ
author Philipp Bongartz
title Resolving repeat families with long reads
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-05-01
description Abstract. Background: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. Results: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long-read repeat resolution tool and show the improved accuracy of our methods. Conclusions: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies.
topic Genome assembly
Repeat families
Repeat resolution
url http://link.springer.com/article/10.1186/s12859-019-2807-4