Spark-based Parallelization of Basic Local Alignment Search Tool
Sequence alignment is a key step in bioinformatics analysis. The basic local alignment search tool (BLAST) is a popular, highly accurate sequence alignment algorithm. However, BLAST is inefficient at comparing and analyzing massive amounts of gene sequencing data. To solve the problem, this p...
Main Authors: | Hui Wang, Leixiao Li, Chengdong Zhou, Hao Lin, Dan Deng |
---|---|
Format: | Article |
Language: | English |
Published: | Bulgarian Academy of Sciences, 2020-03-01 |
Series: | International Journal Bioautomation |
Subjects: | sequence alignment, basic local alignment search tool, Spark, parallelization, speedup |
Online Access: | http://www.biomed.bas.bg/bioautomation/2020/vol_24.1/files/24.1_08.pdf |
id |
doaj-06906e74bd194b5985b3494ca121cba6 |
---|---|
record_format |
Article |
spelling |
doaj-06906e74bd194b5985b3494ca121cba6; 2020-11-25T03:20:05Z; eng; Bulgarian Academy of Sciences; International Journal Bioautomation; ISSN 1314-1902, 1314-2321; 2020-03-01; vol. 24, no. 1, pp. 87-98; doi:10.7546/ijba.2020.24.1.000767; Spark-based Parallelization of Basic Local Alignment Search Tool; Hui Wang, Leixiao Li, Chengdong Zhou, Hao Lin, Dan Deng; College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080, China; http://www.biomed.bas.bg/bioautomation/2020/vol_24.1/files/24.1_08.pdf; sequence alignment, basic local alignment search tool, spark, parallelization, speedup |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hui Wang, Leixiao Li, Chengdong Zhou, Hao Lin, Dan Deng |
title |
Spark-based Parallelization of Basic Local Alignment Search Tool |
publisher |
Bulgarian Academy of Sciences |
series |
International Journal Bioautomation |
issn |
1314-1902, 1314-2321 |
publishDate |
2020-03-01 |
description |
Sequence alignment is a key step in bioinformatics analysis. The basic local alignment search tool (BLAST) is a popular, highly accurate sequence alignment algorithm. However, BLAST is inefficient at comparing and analyzing massive amounts of gene sequencing data. To solve this problem, the paper designs a distributed parallel BLAST method called SparkBLAST, built on the big data framework Spark. Running on Spark's in-memory computing framework, SparkBLAST identifies the sequence alignment task, divides the sequence dataset, and compares the sequence data. Apache Hadoop YARN is used for task scheduling and resource allocation. Finally, SparkBLAST is compared with standalone BLAST through experiments. The results show that SparkBLAST achieves a speedup ratio of 3.95 without sacrificing accuracy. In other words, SparkBLAST is considerably more efficient than standalone BLAST. The findings provide bioinformatics researchers with a highly efficient tool for sequence alignment. |
topic |
sequence alignment, basic local alignment search tool, Spark, parallelization, speedup |
url |
http://www.biomed.bas.bg/bioautomation/2020/vol_24.1/files/24.1_08.pdf |
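The abstract describes SparkBLAST as dividing the sequence dataset, comparing the parts in parallel, and reporting a speedup ratio. The record does not show the authors' code, so the following is only a minimal plain-Python sketch of those two ideas under stated assumptions: the input is FASTA text, records are dealt round-robin into a fixed number of parts, and speedup is taken as serial time divided by parallel time. The names `parse_fasta`, `split_records`, and `speedup_ratio` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A FASTA record as (header without '>', concatenated sequence).
Record = Tuple[str, str]


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: List[Record], num_parts: int) -> List[List[Record]]:
    """Deal records round-robin into num_parts lists (assumed strategy)."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(num_parts)]
    for index, record in enumerate(records):
        parts[index % num_parts].append(record)
    return parts


def speedup_ratio(serial_seconds: float, parallel_seconds: float) -> float:
    """Speedup = serial running time / parallel running time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return serial_seconds / parallel_seconds
```

In a cluster setting, each part would presumably be handed to a separate worker that runs BLAST against the reference database; how SparkBLAST actually partitions and schedules the work is described in the paper itself, not in this record.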