Spark-based Parallelization of Basic Local Alignment Search Tool

Bibliographic Details
Main Authors: Hui Wang, Leixiao Li, Chengdong Zhou, Hao Lin, Dan Deng
Format: Article
Language: English
Published: Bulgarian Academy of Sciences 2020-03-01
Series: International Journal Bioautomation
Online Access: http://www.biomed.bas.bg/bioautomation/2020/vol_24.1/files/24.1_08.pdf
ISSN: 1314-1902, 1314-2321
Volume/Issue: 24(1)
Pages: 87-98
DOI: 10.7546/ijba.2020.24.1.000767
Author Affiliation: College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China
Description: Sequence alignment is a key step in bioinformatics analysis. The basic local alignment search tool (BLAST) is a popular, highly accurate sequence alignment algorithm. However, BLAST is inefficient at comparing and analyzing massive amounts of gene sequencing data. To solve this problem, this paper designs a distributed parallel BLAST method called SparkBLAST, based on the big data framework Spark. Running on Spark's in-memory computing framework, SparkBLAST identifies the sequence alignment task, divides the sequence dataset, and compares the sequence data. Apache Hadoop YARN was adopted for task scheduling and resource allocation. Finally, SparkBLAST was compared with standalone BLAST through experiments. The results show that SparkBLAST achieved a speedup ratio of 3.95 without sacrificing accuracy; that is, SparkBLAST substantially outperforms standalone BLAST in computational efficiency. The findings provide bioinformatics researchers with a highly efficient tool for sequence alignment.
Subjects: sequence alignment, basic local alignment search tool, Spark, parallelization, speedup
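
Illustrative sketch: the record's abstract says only that SparkBLAST divides the sequence dataset, compares the pieces in parallel, and reaches a speedup ratio of 3.95. The Python below is a minimal sketch of that idea and is not the authors' implementation. The function names (parse_fasta, partition_records, speedup_ratio), the FASTA input format, and the round-robin splitting policy are all assumptions made for illustration. Dispatching each partition to a BLAST worker through Spark and YARN is deliberately not shown, and the speedup formula is the conventional definition (standalone time divided by parallel time), which the record does not spell out.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition_records(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Split records round-robin so every partition holds whole sequences.

    Assumption: round-robin is only one possible policy; the record does
    not say how SparkBLAST divides its dataset.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def speedup_ratio(t_standalone: float, t_parallel: float) -> float:
    """Conventional speedup: standalone run time over parallel run time."""
    if t_parallel <= 0:
        raise ValueError("t_parallel must be positive")
    return t_standalone / t_parallel


# Tiny made-up input, used only to show the shape of the data.
DEMO_FASTA = ">a\nAC\nGT\n>b\nTT\n>c\nGG\n"
demo_parts = partition_records(parse_fasta(DEMO_FASTA), 2)
```

In a real deployment each partition would be handed to a separate alignment worker and the per-partition results merged afterwards; keeping whole records in each partition avoids splitting a sequence across workers.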