Reference flow: reducing reference bias using multiple population genomes

Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can inclu...

Bibliographic Details
Main Authors: Nae-Chyun Chen, Brad Solomon, Taher Mun, Sheila Iyer, Ben Langmead
Format: Article
Language: English
Published: BMC 2021-01-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-020-02229-3
id doaj-6e4b79f80f3144f289b5c65dca87d376
record_format Article
spelling doaj-6e4b79f80f3144f289b5c65dca87d376; 2021-01-10T12:58:51Z; eng; BMC; Genome Biology; 1474-760X; 2021-01-01; 22; 1; 1; 17; 10.1186/s13059-020-02229-3; Reference flow: reducing reference bias using multiple population genomes; Nae-Chyun Chen (0), Brad Solomon (1), Taher Mun (2), Sheila Iyer (3), Ben Langmead (4); affiliations 0-4: Department of Computer Science, Johns Hopkins University; https://doi.org/10.1186/s13059-020-02229-3
collection DOAJ
language English
format Article
sources DOAJ
author Nae-Chyun Chen
Brad Solomon
Taher Mun
Sheila Iyer
Ben Langmead
title Reference flow: reducing reference bias using multiple population genomes
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2021-01-01
description Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
url https://doi.org/10.1186/s13059-020-02229-3