Short Read Alignment Based on Maximal Approximate Match Seeds

Sequence alignment is a critical step in many genomic studies, such as variant calling, quantitative transcriptome analysis (RNA-seq), and metagenomic sequence classification. However, alignment performance is strongly affected by repetitive sequences in the reference genome, which exten...

Bibliographic Details
Main Authors:	Wei Quan, Dengfeng Guan, Guangri Quan, Bo Liu, Yadong Wang
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2020-11-01
Series:	Frontiers in Molecular Biosciences
Subjects:	whole-genome resequencing; next-generation sequencing; repeats; sequence alignment; FM-index
Online Access:	https://www.frontiersin.org/articles/10.3389/fmolb.2020.572934/full

id	doaj-84ab495b487744a2a1e144415115d845
record_format	Article
spelling	doaj-84ab495b487744a2a1e144415115d845; 2020-11-25T04:08:24Z; eng; Frontiers Media S.A.; Frontiers in Molecular Biosciences; 2296-889X; 2020-11-01; 7; 10.3389/fmolb.2020.572934; 572934; Short Read Alignment Based on Maximal Approximate Match Seeds; Wei Quan (0), Dengfeng Guan (1, 2), Guangri Quan (3), Bo Liu (4), Yadong Wang (5); (0, 1, 3, 4, 5) School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; (2) Institute of Zoology, Chinese Academy of Sciences, Beijing, China; https://www.frontiersin.org/articles/10.3389/fmolb.2020.572934/full; whole-genome resequencing; next-generation sequencing; repeats; sequence alignment; FM-index
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Wei Quan; Dengfeng Guan; Guangri Quan; Bo Liu; Yadong Wang
author_sort	Wei Quan
title	Short Read Alignment Based on Maximal Approximate Match Seeds
publisher	Frontiers Media S.A.
series	Frontiers in Molecular Biosciences
issn	2296-889X
publishDate	2020-11-01
description	Sequence alignment is a critical step in many genomic studies, such as variant calling, quantitative transcriptome analysis (RNA-seq), and metagenomic sequence classification. However, alignment performance is strongly affected by repetitive sequences in the reference genome, which are widespread in species from bacteria to mammals. Aligning repetitive sequences can produce a very large number of candidate locations, creating a heavy computational burden. Most alignment tools therefore simply discard highly repetitive seeds, but this may cause the true alignment to be missed. Using maximal exact matches (MEMs) as seeds is an option, but MEM seeds may fail when sequencing errors or genomic variations fall within them. Here, we propose a novel sequence alignment algorithm, named MAM, which can efficiently align short DNA sequences. MAM first builds a modified Burrows-Wheeler transform (BWT) structure of the reference genome to accelerate approximate seed matching. Then, MAM uses maximal approximate match (MAM) seeds to reduce the number of candidate locations. Finally, MAM applies affine-gap-penalty dynamic programming to extend the MAM seeds. Experimental results on simulated and real sequencing datasets show that MAM achieves higher speed than other state-of-the-art alignment tools. The source code is available at https://github.com/weiquan/mam.
topic	whole-genome resequencing; next-generation sequencing; repeats; sequence alignment; FM-index
url	https://www.frontiersin.org/articles/10.3389/fmolb.2020.572934/full
_version_	1724426044432187392
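
The description above mentions a BWT-based index (FM-index) for seed matching. As a rough illustration only, the following Python sketch shows plain exact-match backward search over a naively built FM-index. It is not the authors' modified BWT structure and does not perform approximate matching; the function names `build_fm_index` and `backward_search` and the example sequences are hypothetical and chosen for this sketch.

```python
def build_fm_index(text):
    """Naively build the suffix array, C table, and Occ table for `text`.

    Assumption: "$" does not occur in `text` and sorts before every
    other symbol (true for ASCII DNA letters).
    """
    text += "$"  # sentinel marking the end of the reference
    # Suffix array by direct sorting; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval became empty: no occurrence
            return []
    return sorted(sa[lo:hi])


reference = "ACGTACGTTACG"  # made-up toy reference
index = build_fm_index(reference)
hits = backward_search("ACG", *index)
print(hits)
```

A repetitive seed such as "ACG" maps to several positions in this toy reference, which is the candidate-location growth the description refers to. Longer or more specific seeds narrow the suffix-array interval.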

Similar Items