Dynamic construction of pan-genome subgraphs

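Marcus et al. (Bioinformatics 2014) proposed using a compressed de Bruijn graph as a description of a pan-genome, comprising the genomes of many individuals/strains of the same or closely related species. Subsequent work improved the construction of the compressed de Bruijn graph in terms of run-time and memory consumption. According to the Computational Pan-Genomics Consortium (Briefings in Bioinformatics 2016), a pan-genome data structure should support the following functionality: “All information within a data structure should be easily accessible for human eyes by visualization support on different scales.” However, a pan-genome graph can have thousands to millions of nodes, and such an amount of information is certainly not easily accessible for human eyes. Thus, the ability to construct pan-genome subgraphs on demand would be quite valuable. In this article, we use the space-efficient representation of the compressed de Bruijn graph devised by Beller and Ohlebusch (Algorithms for Molecular Biology 2016) to construct pan-genome subgraphs on the fly. The user can specify a region in one of the genomes and the software tool will build a subgraph that contains the path corresponding to that region and all paths that are in the neighborhood of that path. The size of the neighborhood can be controlled by the user.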


Bibliographic Details
Main Authors: Dede Kadir, Ohlebusch Enno
Format: Article
Language: English
Published: De Gruyter 2020-04-01
Series: Open Computer Science
Subjects: compressed de Bruijn graph, Burrows-Wheeler transform, backward search, pan-genome analysis
Online Access: https://doi.org/10.1515/comp-2020-0018
id doaj-0159213ca2d640b8ada2d0be7b299629
record_format Article
spelling doaj-0159213ca2d640b8ada2d0be7b299629; 2021-09-06T19:19:43Z; eng; De Gruyter; Open Computer Science; 2299-1093; 2020-04-01; 10; 1; 82; 96; 10.1515/comp-2020-0018; comp-2020-0018; Dynamic construction of pan-genome subgraphs; Dede Kadir (0); Ohlebusch Enno (1); (0) Institute of Theoretical Computer Science, Ulm University, D-89069 Ulm, Germany; (1) Institute of Theoretical Computer Science, Ulm University, D-89069 Ulm, Germany; https://doi.org/10.1515/comp-2020-0018; compressed de bruijn graph; burrows-wheeler transform; backward search; pan-genome analysis
collection DOAJ
language English
format Article
sources DOAJ
author Dede Kadir
Ohlebusch Enno
title Dynamic construction of pan-genome subgraphs
publisher De Gruyter
series Open Computer Science
issn 2299-1093
publishDate 2020-04-01
description Marcus et al. (Bioinformatics 2014) proposed using a compressed de Bruijn graph as a description of a pan-genome, comprising the genomes of many individuals/strains of the same or closely related species. Subsequent work improved the construction of the compressed de Bruijn graph in terms of run-time and memory consumption. According to the Computational Pan-Genomics Consortium (Briefings in Bioinformatics 2016), a pan-genome data structure should support the following functionality: “All information within a data structure should be easily accessible for human eyes by visualization support on different scales.” However, a pan-genome graph can have thousands to millions of nodes, and such an amount of information is certainly not easily accessible for human eyes. Thus, the ability to construct pan-genome subgraphs on demand would be quite valuable. In this article, we use the space-efficient representation of the compressed de Bruijn graph devised by Beller and Ohlebusch (Algorithms for Molecular Biology 2016) to construct pan-genome subgraphs on the fly. The user can specify a region in one of the genomes and the software tool will build a subgraph that contains the path corresponding to that region and all paths that are in the neighborhood of that path. The size of the neighborhood can be controlled by the user.
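The idea of a region-driven subgraph can be illustrated with a small sketch. It is not the authors' method: the article works on a space-efficient compressed de Bruijn graph, whereas this example assumes a plain, uncompressed de Bruijn graph held in a Python dictionary, and it assumes that "neighborhood" means all nodes within a fixed number of edges (ignoring direction) of the region's path. The function names build_de_bruijn and neighborhood_subgraph and the parameter radius are hypothetical.

```python
from collections import deque


def build_de_bruijn(genomes, k):
    """Return {k-mer: set of successor k-mers} over all genomes (uncompressed graph)."""
    succ = {}
    for g in genomes:
        for i in range(len(g) - k + 1):
            node = g[i:i + k]
            succ.setdefault(node, set())
            if i + k < len(g):  # a following k-mer exists in this genome
                succ[node].add(g[i + 1:i + k + 1])
    return succ


def neighborhood_subgraph(succ, genome, start, end, k, radius):
    """Subgraph induced by nodes within `radius` edges of the path for genome[start:end].

    Assumption: distance is measured on the undirected version of the graph.
    """
    # Undirected adjacency, so that predecessors count as neighbors too.
    adj = {node: set(nxt) for node, nxt in succ.items()}
    for node, nxt in succ.items():
        for m in nxt:
            adj.setdefault(m, set()).add(node)

    # The k-mers spelled by the selected region form the path.
    path = [genome[i:i + k] for i in range(start, end - k + 1)]

    # Multi-source breadth-first search from every node on the path.
    dist = {node: 0 for node in path}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    keep = set(dist)
    return {node: succ[node] & keep for node in keep}


if __name__ == "__main__":
    genomes = ["ACGTACGA", "ACGTTCGA"]  # toy input, not real data
    graph = build_de_bruijn(genomes, k=3)
    sub = neighborhood_subgraph(graph, genomes[0], 0, 4, k=3, radius=1)
    for node in sorted(sub):
        print(node, "->", sorted(sub[node]))
```

Raising radius grows the subgraph outward from the selected path, which mirrors the user-controlled neighborhood size described in the abstract; a compressed graph would additionally merge unbranched chains of k-mers into single nodes.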
topic compressed de Bruijn graph
Burrows-Wheeler transform
backward search
pan-genome analysis
url https://doi.org/10.1515/comp-2020-0018