Frequent Patterns Mining in DNA Sequence

As a common biological sequence, DNA sequences contain important information. The discovery of frequent patterns in DNA sequences can help to study the evolution, function and variation of genes. The findings are of great significance to genetic and mutation analysis, analysis of disease causes and...

Bibliographic Details
Main Authors: Na Deng, Xu Chen, Desheng Li, Caiquan Xiong
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Big data; biological information; data mining; DNA sequence; frequent pattern; hash table
Online Access: https://ieeexplore.ieee.org/document/8787813/
id doaj-5656fe8c52424a5881e2a2e3daa3d006
record_format Article
spelling doaj-5656fe8c52424a5881e2a2e3daa3d006; 2021-04-05T17:04:11Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 108400-108410; DOI 10.1109/ACCESS.2019.2933044; document 8787813
Frequent Patterns Mining in DNA Sequence
Na Deng (https://orcid.org/0000-0002-4413-2625), School of Computer, Hubei University of Technology, Wuhan, China
Xu Chen, School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan, China
Desheng Li, College of Mathematics, Physics and Information Engineering, Anhui University of Science and Technology, Huainan, China
Caiquan Xiong, School of Computer, Hubei University of Technology, Wuhan, China
collection DOAJ
language English
format Article
sources DOAJ
author Na Deng
Xu Chen
Desheng Li
Caiquan Xiong
title Frequent Patterns Mining in DNA Sequence
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description As a common biological sequence, DNA sequences contain important information. The discovery of frequent patterns in DNA sequences can help to study the evolution, function and variation of genes. The findings are of great significance to genetic and mutation analysis, analysis of disease causes and treatment of diseases. Traditional methods of frequent pattern discovery need to scan DNA sequences multiple times. To overcome this shortcoming, this article proposes a new method to discover frequent patterns from DNA sequences. This method is based on a two-level nested hash table data structure and set operation. All frequent patterns and their positions in DNA sequences can be found by scanning DNA sequences only once. Experimental results show that this method can correctly recognize all the frequent patterns in DNA sequences and their locations. The method can also be applied to discover frequent patterns in RNA, protein or other biological sequences.
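The description names a two-level nested hash table, set operations and a single scan, but gives no further detail. The Python sketch below is one possible reading under stated assumptions, not the authors' published algorithm: the function name frequent_patterns, the parameters min_support and max_len, and the layout length -> pattern -> set of start positions are all hypothetical. The sequence is read once to record the positions of each base; longer patterns are then grown purely by intersecting position sets, without rescanning the sequence.

```python
from collections import defaultdict


def frequent_patterns(seq, min_support, max_len=None):
    """Return {length: {pattern: set of 0-based start positions}}.

    Illustrative sketch only: the nested-dict layout and the parameter
    names are assumptions, not taken from the catalogued article.
    """
    # Single scan of the sequence: remember where each base occurs.
    base_pos = defaultdict(set)
    for i, base in enumerate(seq):
        base_pos[base].add(i)

    table = {}  # outer hash table keyed by pattern length
    current = {b: pos for b, pos in base_pos.items() if len(pos) >= min_support}
    length = 1
    while current and (max_len is None or length <= max_len):
        table[length] = current  # inner hash table: pattern -> positions
        # A pattern starting at p extends with base b if b occurs at
        # p + length, i.e. if p is in {q - length for q in positions of b}.
        shifted = {b: {q - length for q in pos} for b, pos in base_pos.items()}
        longer = {}
        for pattern, starts in current.items():
            for base, candidates in shifted.items():
                extended = starts & candidates  # set intersection
                if len(extended) >= min_support:
                    longer[pattern + base] = extended
        current = longer
        length += 1
    return table


if __name__ == "__main__":
    found = frequent_patterns("ATATAT", min_support=2)
    for k in sorted(found):
        print(k, {p: sorted(s) for p, s in found[k].items()})
```

Growing patterns one base at a time is safe because every prefix of a frequent pattern is itself frequent, so no frequent pattern is skipped. Overlapping occurrences are counted separately in this sketch; whether the article counts them that way is not stated in the record.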
topic Big data
biological information
data mining
DNA sequence
frequent pattern
hash table
url https://ieeexplore.ieee.org/document/8787813/