Buffering updates enables efficient dynamic de Bruijn graphs

Motivation: The de Bruijn graph has become a ubiquitous graph model for biological data since its introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection...

Bibliographic Details
Main Authors: Jarno Alanko, Bahar Alipanahi, Jonathen Settle, Christina Boucher, Travis Gagie
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: de Bruijn graph; Dynamic data structures; Succinct data structures; Burrows-Wheeler transform
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021002853
spelling doaj-d09103a281304b56a7b353dbcea9853d 2021-07-25T04:42:45Z
Buffering updates enables efficient dynamic de Bruijn graphs. Computational and Structural Biotechnology Journal (Elsevier, ISSN 2001-0370), 2021-01-01, vol. 19, pp. 4067-4078. Language: eng.
Author affiliations:
Jarno Alanko (corresponding author): Department of Computer Science, University of Helsinki, Helsinki, Finland; Faculty of Computer Science, Dalhousie University, Halifax, Canada
Bahar Alipanahi: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
Jonathen Settle: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
Christina Boucher: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
Travis Gagie: Department of Computer Science, University of Helsinki, Helsinki, Finland
collection DOAJ
sources DOAJ
issn 2001-0370
description Motivation: The de Bruijn graph has become a ubiquitous graph model for biological data since its introduction in the late 1990s. It has been used for a variety of purposes including genome assembly (Zerbino and Birney, 2008; Bankevich et al., 2012; Peng et al., 2012), variant detection (Alipanahi et al., 2020b; Iqbal et al., 2012), and storage of assembled genomes (Chikhi et al., 2016). For this reason, over a dozen methods have been developed for building and representing the de Bruijn graph and its variants in a space- and time-efficient manner. Results: With the exception of a few data structures (Muggli et al., 2019; Holley and Melsted, 2020; Crawford et al., 2018), compressed and compact de Bruijn graphs do not allow the graph to be updated efficiently, meaning that data cannot be added or deleted efficiently. The most recent compressed dynamic de Bruijn graph (Alipanahi et al., 2020a) relies on dynamic bit vectors, which are slow in theory and practice. To address this shortcoming, we present a compressed dynamic de Bruijn graph that removes the need for dynamic bit vectors by buffering data that should be added to or removed from the graph. We implement our method, which we refer to as BufBOSS, and compare its performance to Bifrost, DynamicBOSS, and FDBG. Our experiments demonstrate that BufBOSS achieves attractive trade-offs compared to other tools in terms of time, memory, and disk usage, and has the best deletion performance by an order of magnitude.
topic de Bruijn graph
Dynamic data structures
Succinct data structures
Burrows-Wheeler transform