Gene Sequence Input Formatting and MapReduce Computing
Given the limitations of Hadoop's application programming interface (API) for gene sequence computing, this paper proposes an input formatting method that reads gene sequence files as records of key-value pairs. This method relies on the rewriting of Hadoop source c...
Main Authors: | Xiaolong Feng, Jing Gao |
---|---|
Format: | Article |
Language: | English |
Published: | Bulgarian Academy of Sciences, 2019-06-01 |
Series: | International Journal Bioautomation |
Subjects: | Input formatting; MapReduce; Gene sequence; Sequence alignment |
Online Access: | http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf |
id |
doaj-6303e8d130ac4867827ad241c1095b8a |
---|---|
record_format |
Article |
spelling |
doaj-6303e8d130ac4867827ad241c1095b8a; 2020-11-25T03:41:54Z; eng; Bulgarian Academy of Sciences; International Journal Bioautomation; ISSN 1314-1902, 1314-2321; 2019-06-01; vol. 23, no. 2, pp. 233-246; doi:10.7546/ijba.2019.23.2.000675; Gene Sequence Input Formatting and MapReduce Computing; Xiaolong Feng; Jing Gao; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China; http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf; Input formatting, MapReduce, Gene sequence, Sequence alignment |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiaolong Feng; Jing Gao |
title |
Gene Sequence Input Formatting and MapReduce Computing |
publisher |
Bulgarian Academy of Sciences |
series |
International Journal Bioautomation |
issn |
1314-1902; 1314-2321 |
publishDate |
2019-06-01 |
description |
Given the limitations of Hadoop's application programming interface (API) for gene sequence computing, this paper proposes an input formatting method that reads gene sequence files as records of key-value pairs. The method extends the platform by rewriting Hadoop source code and eliminates the need to preprocess data with other tools. On this basis, a MapReduce computing model is designed for distributed parallel computing of gene sequence alignment tasks. Experiments show that the proposed method can effectively read many kinds of gene sequence files on Hadoop, and that the proposed model achieves distributed parallel computing of gene sequence alignment. The findings provide a reference for bioinformatics computing tasks on the Hadoop platform. |
topic |
Input formatting; MapReduce; Gene sequence; Sequence alignment |
url |
http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf |
_version_ |
1724527543147560960 |
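
The description above outlines two ideas: reading gene sequence files as key-value records, and running alignment as a MapReduce job. The sketch below illustrates both in plain Python rather than with the Hadoop API, and it is not the authors' implementation. It assumes FASTA input (the record does not say which formats the paper supports), and it swaps in a basic Needleman-Wunsch global alignment score because the record does not name the alignment algorithm. All function names (`read_fasta_records`, `map_record`, `reduce_best`, `run_job`) and the scoring parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (header, sequence) key-value pair per FASTA record.

    A record starts at a '>' line and spans every sequence line up to the
    next '>' line, so a multi-line sequence becomes a single value.
    """
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score (assumed scoring scheme), two rows of memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def map_record(header: str, sequence: str,
               query: str) -> Iterator[Tuple[str, Tuple[int, str]]]:
    """Map step: score one record against the query, emit under one key."""
    yield "best", (global_alignment_score(sequence, query), header)


def reduce_best(key: str,
                values: List[Tuple[int, str]]) -> Tuple[str, Tuple[int, str]]:
    """Reduce step: keep the highest-scoring (score, header) pair."""
    return key, max(values)


def run_job(lines: Iterable[str], query: str) -> Dict[str, Tuple[int, str]]:
    """Run read -> map -> group by key -> reduce in a single process."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for header, sequence in read_fasta_records(lines):
        for key, value in map_record(header, sequence, query):
            groups[key].append(value)
    return dict(reduce_best(k, vs) for k, vs in groups.items())
```

Calling `run_job(open("reads.fasta"), "ACGA")` on a hypothetical file would return the best-scoring record for the query. On Hadoop, the record-reading step would instead live in a custom input format so that a record spanning several lines is never cut at a line boundary, which is the limitation of line-oriented input that the description refers to.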