Fast read alignment with incorporation of known genomic variants

Abstract. Background: Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping incorporating existing variants can improve the performance of subsequent analysis. This method is supposed to map sequen...

Bibliographic Details
Main Authors: Hongzhe Guo, Bo Liu, Dengfeng Guan, Yilei Fu, Yadong Wang
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Medical Informatics and Decision Making
Subjects: Seed-and-extension alignment; Landau-Vishkin algorithm; Variation-aware read alignment
Online Access: https://doi.org/10.1186/s12911-019-0960-3
id doaj-a2b46b2f53314a1fbc71cca119c91e50
record_format Article
spelling doaj-a2b46b2f53314a1fbc71cca119c91e50
2020-12-20T12:35:16Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2019-12-01
19
S6
1
13
10.1186/s12911-019-0960-3
Fast read alignment with incorporation of known genomic variants
Hongzhe Guo (Center for Bioinformatics, Harbin Institute of Technology)
Bo Liu (Center for Bioinformatics, Harbin Institute of Technology)
Dengfeng Guan (Center for Bioinformatics, Harbin Institute of Technology)
Yilei Fu (Center for Bioinformatics, Harbin Institute of Technology)
Yadong Wang (Center for Bioinformatics, Harbin Institute of Technology)
Abstract. Background: Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping incorporating existing variants can improve the performance of subsequent analysis. This method is supposed to map sequencing reads efficiently to a graphical index with a reference genome and known variation to increase alignment quality and variant calling accuracy. However, storing and indexing various types of variation require costly RAM space. Methods: Aligning reads to a graph model-based index including the whole set of variants is ultimately an NP-hard problem in theory. Here, we propose a variation-aware read alignment algorithm (VARA), which generates the alignment between read and multiple genomic sequences simultaneously utilizing the schema of the Landau-Vishkin algorithm. VARA dynamically extracts regional variants to construct a pseudo tree-based structure on-the-fly for seed extension without loading the whole genome variation into memory space. Results: We developed the novel high-throughput sequencing read aligner deBGA-VARA by integrating VARA into deBGA. The deBGA-VARA is benchmarked both on simulated reads and the NA12878 sequencing dataset. The experimental results demonstrate that read alignment incorporating genetic variation knowledge can achieve high sensitivity and accuracy. Conclusions: Due to its efficiency, VARA provides a promising solution for further improvement of variant calling while maintaining small memory footprints. The deBGA-VARA is available at: https://github.com/hitbc/deBGA-VARA.
https://doi.org/10.1186/s12911-019-0960-3
Seed-and-extension alignment
Landau-Vishkin algorithm
Variation-aware read alignment
collection DOAJ
language English
format Article
sources DOAJ
author Hongzhe Guo
Bo Liu
Dengfeng Guan
Yilei Fu
Yadong Wang
spellingShingle Hongzhe Guo
Bo Liu
Dengfeng Guan
Yilei Fu
Yadong Wang
Fast read alignment with incorporation of known genomic variants
BMC Medical Informatics and Decision Making
Seed-and-extension alignment
Landau-Vishkin algorithm
Variation-aware read alignment
author_facet Hongzhe Guo
Bo Liu
Dengfeng Guan
Yilei Fu
Yadong Wang
author_sort Hongzhe Guo
title Fast read alignment with incorporation of known genomic variants
title_short Fast read alignment with incorporation of known genomic variants
title_full Fast read alignment with incorporation of known genomic variants
title_fullStr Fast read alignment with incorporation of known genomic variants
title_full_unstemmed Fast read alignment with incorporation of known genomic variants
title_sort fast read alignment with incorporation of known genomic variants
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2019-12-01
description Abstract. Background: Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping incorporating existing variants can improve the performance of subsequent analysis. This method is supposed to map sequencing reads efficiently to a graphical index with a reference genome and known variation to increase alignment quality and variant calling accuracy. However, storing and indexing various types of variation require costly RAM space. Methods: Aligning reads to a graph model-based index including the whole set of variants is ultimately an NP-hard problem in theory. Here, we propose a variation-aware read alignment algorithm (VARA), which generates the alignment between read and multiple genomic sequences simultaneously utilizing the schema of the Landau-Vishkin algorithm. VARA dynamically extracts regional variants to construct a pseudo tree-based structure on-the-fly for seed extension without loading the whole genome variation into memory space. Results: We developed the novel high-throughput sequencing read aligner deBGA-VARA by integrating VARA into deBGA. The deBGA-VARA is benchmarked both on simulated reads and the NA12878 sequencing dataset. The experimental results demonstrate that read alignment incorporating genetic variation knowledge can achieve high sensitivity and accuracy. Conclusions: Due to its efficiency, VARA provides a promising solution for further improvement of variant calling while maintaining small memory footprints. The deBGA-VARA is available at: https://github.com/hitbc/deBGA-VARA.
topic Seed-and-extension alignment
Landau-Vishkin algorithm
Variation-aware read alignment
url https://doi.org/10.1186/s12911-019-0960-3
work_keys_str_mv AT hongzheguo fastreadalignmentwithincorporationofknowngenomicvariants
AT boliu fastreadalignmentwithincorporationofknowngenomicvariants
AT dengfengguan fastreadalignmentwithincorporationofknowngenomicvariants
AT yileifu fastreadalignmentwithincorporationofknowngenomicvariants
AT yadongwang fastreadalignmentwithincorporationofknowngenomicvariants
_version_ 1724376402976833536
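
Illustrative note: the abstract in this record names the Landau-Vishkin algorithm as the schema on which VARA builds. The sketch below is not taken from the article or from the deBGA-VARA source code. It is a minimal, textbook-style illustration of the diagonal-wise "furthest reaching" idea behind Landau-Vishkin, assuming a single read and a single reference segment with no variants. The function name landau_vishkin_distance and its parameters are hypothetical, and matches are extended character by character here, whereas the original algorithm answers these extensions in constant time with a suffix-tree-based index.

```python
def landau_vishkin_distance(read, ref, k):
    """Return the edit distance between read and ref if it is <= k, else None.

    Hypothetical helper for illustration only. For each error count e and
    each diagonal d = (ref index - read index), it records the furthest
    read position reachable with at most e edits.
    """
    m, n = len(read), len(ref)
    if abs(n - m) > k:
        return None  # the length difference alone needs more than k edits

    def extend(i, d):
        # Slide along diagonal d while characters match (naive extension).
        j = i + d
        while i < m and j < n and read[i] == ref[j]:
            i += 1
            j += 1
        return i

    target = n - m  # diagonal on which both sequences end together
    prev = {0: extend(0, 0)}
    if target == 0 and prev[0] == m:
        return 0

    for e in range(1, k + 1):
        cur = {}
        for d in range(-e, e + 1):
            best = -1
            if d in prev:          # mismatch: consume one char of each
                best = max(best, prev[d] + 1)
            if d - 1 in prev:      # consume one extra reference char
                best = max(best, prev[d - 1])
            if d + 1 in prev:      # consume one extra read char
                best = max(best, prev[d + 1] + 1)
            if best < 0:
                continue
            best = min(best, m, n - d)  # stay inside both sequences
            if best < 0 or best + d < 0:
                continue
            cur[d] = extend(best, d)
        if cur.get(target) == m:
            return e
        prev = cur
    return None


if __name__ == "__main__":
    print(landau_vishkin_distance("kitten", "sitting", 3))
```

Only O(k) diagonals are examined per error level, which is why this family of methods suits seed extension with a small error budget. How VARA extends the scheme to several variant-bearing sequences at once is described in the article itself and is not reproduced here.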