Gene Sequence Input Formatting and MapReduce Computing

Considering the limitations of the application programming interface (API) of Hadoop in gene sequence computing, this paper puts forward an input formatting method that reads the format of gene sequence as key-value pairs in the form of records. This method relies on the rewriting of Hadoop source c...
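
The record-oriented reading described in the abstract can be illustrated with a small sketch. It is not the authors' Hadoop extension: it assumes FASTA as the input format (the abstract only says "many kinds of gene sequence files"), and the function name `fasta_records` is hypothetical. It shows only the idea of treating each sequence entry as one key-value record, with the header as the key and the joined sequence lines as the value.

```python
from typing import Iterable, Iterator, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (header, sequence) pair per FASTA entry.

    Hypothetical helper for illustration only; FASTA input is an assumption.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between entries
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; collect them all.
            chunks.append(line)
    # Emit the final record at end of input.
    if header is not None:
        yield header, "".join(chunks)


sample = [">seq1 example", "ACGT", "TTGA", "", ">seq2", "GGCC"]
records = list(fasta_records(sample))
for key, value in records:
    print(key, value)
```

In a MapReduce setting, each such pair would be handed to a map task as a single record, so a multi-line sequence is never split across records the way a line-based reader would split it.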

Bibliographic Details
Main Authors: Xiaolong Feng, Jing Gao
Format: Article
Language: English
Published: Bulgarian Academy of Sciences 2019-06-01
Series: International Journal Bioautomation
Subjects: Input formatting; MapReduce; Gene sequence; Sequence alignment
Online Access: http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf
id doaj-6303e8d130ac4867827ad241c1095b8a
record_format Article
spelling doaj-6303e8d130ac4867827ad241c1095b8a; 2020-11-25T03:41:54Z; eng; Bulgarian Academy of Sciences; International Journal Bioautomation; 1314-1902; 1314-2321; 2019-06-01; 23; 2; 233; 246; 10.7546/ijba.2019.23.2.000675; Gene Sequence Input Formatting and MapReduce Computing; Xiaolong Feng; Jing Gao; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China; Considering the limitations of the application programming interface (API) of Hadoop in gene sequence computing, this paper puts forward an input formatting method that reads the format of gene sequence as key-value pairs in the form of records. This method relies on the rewriting of Hadoop source code, which is an extension of platform function, and eliminates the need to preprocess data with other tools. On this basis, a MapReduce computing model was designed for distributed parallel computing of gene sequence alignment tasks. Experimental verification shows that the proposed method can read many kinds of gene sequence files effectively on Hadoop, and the proposed model can realize distributed parallel computing of gene sequence alignment. The research findings provide a valuable reference for bioinformatics computing tasks on Hadoop platform.; http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf; Input formatting; MapReduce; Gene sequence; Sequence alignment
collection DOAJ
language English
format Article
sources DOAJ
author Xiaolong Feng
Jing Gao
spellingShingle Xiaolong Feng
Jing Gao
Gene Sequence Input Formatting and MapReduce Computing
International Journal Bioautomation
Input formatting
MapReduce
Gene sequence
Sequence alignment
author_facet Xiaolong Feng
Jing Gao
author_sort Xiaolong Feng
title Gene Sequence Input Formatting and MapReduce Computing
title_short Gene Sequence Input Formatting and MapReduce Computing
title_full Gene Sequence Input Formatting and MapReduce Computing
title_fullStr Gene Sequence Input Formatting and MapReduce Computing
title_full_unstemmed Gene Sequence Input Formatting and MapReduce Computing
title_sort gene sequence input formatting and mapreduce computing
publisher Bulgarian Academy of Sciences
series International Journal Bioautomation
issn 1314-1902
1314-2321
publishDate 2019-06-01
description Considering the limitations of the application programming interface (API) of Hadoop in gene sequence computing, this paper puts forward an input formatting method that reads the format of gene sequence as key-value pairs in the form of records. This method relies on the rewriting of Hadoop source code, which is an extension of platform function, and eliminates the need to preprocess data with other tools. On this basis, a MapReduce computing model was designed for distributed parallel computing of gene sequence alignment tasks. Experimental verification shows that the proposed method can read many kinds of gene sequence files effectively on Hadoop, and the proposed model can realize distributed parallel computing of gene sequence alignment. The research findings provide a valuable reference for bioinformatics computing tasks on Hadoop platform.
topic Input formatting
MapReduce
Gene sequence
Sequence alignment
url http://www.biomed.bas.bg/bioautomation/2019/vol_23.2/files/23.2_10.pdf
work_keys_str_mv AT xiaolongfeng genesequenceinputformattingandmapreducecomputing
AT jinggao genesequenceinputformattingandmapreducecomputing
_version_ 1724527543147560960