Efficient Mining of Interesting Patterns in Large Biological Sequences

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology,...

Bibliographic Details
Main Authors: Md. Mamunur Rashid, Md. Rezaul Karim, Byeong-Soo Jeong, Ho-Jin Choi
Format: Article
Language: English
Published: Korea Genome Organization 2012-03-01
Series: Genomics & Informatics
Subjects: DNA sequence, index-based method, information gain, pattern mining
Online Access: http://genominfo.org/upload/pdf/gni-10-44.pdf
id doaj-9c7280b230d544ffb0f5fd9bd35bb3fa
record_format Article
spelling doaj-9c7280b230d544ffb0f5fd9bd35bb3fa
2020-11-24T22:44:49Z
eng
Korea Genome Organization
Genomics & Informatics
1598-866X
2234-0742
2012-03-01
10
1
44
50
10.5808/GI.2012.10.1.44
32
Efficient Mining of Interesting Patterns in Large Biological Sequences
Md. Mamunur Rashid, Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Md. Rezaul Karim, Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Byeong-Soo Jeong, Department of Computer Engineering, College of Electronics and Information, Kyung Hee University, Yongin 446-701, Korea.
Ho-Jin Choi, Department of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea.
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
http://genominfo.org/upload/pdf/gni-10-44.pdf
DNA sequence
index-based method
information gain
pattern mining
collection DOAJ
language English
format Article
sources DOAJ
author Md. Mamunur Rashid
Md. Rezaul Karim
Byeong-Soo Jeong
Ho-Jin Choi
spellingShingle Md. Mamunur Rashid
Md. Rezaul Karim
Byeong-Soo Jeong
Ho-Jin Choi
Efficient Mining of Interesting Patterns in Large Biological Sequences
Genomics & Informatics
DNA sequence
index-based method
information gain
pattern mining
author_facet Md. Mamunur Rashid
Md. Rezaul Karim
Byeong-Soo Jeong
Ho-Jin Choi
author_sort Md. Mamunur Rashid
title Efficient Mining of Interesting Patterns in Large Biological Sequences
title_short Efficient Mining of Interesting Patterns in Large Biological Sequences
title_full Efficient Mining of Interesting Patterns in Large Biological Sequences
title_fullStr Efficient Mining of Interesting Patterns in Large Biological Sequences
title_full_unstemmed Efficient Mining of Interesting Patterns in Large Biological Sequences
title_sort efficient mining of interesting patterns in large biological sequences
publisher Korea Genome Organization
series Genomics & Informatics
issn 1598-866X
2234-0742
publishDate 2012-03-01
description Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences is a major measure of determining whether a pattern is interesting or not. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interesting measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
topic DNA sequence
index-based method
information gain
pattern mining
url http://genominfo.org/upload/pdf/gni-10-44.pdf
work_keys_str_mv AT mdmamunurrashid efficientminingofinterestingpatternsinlargebiologicalsequences
AT mdrezaulkarim efficientminingofinterestingpatternsinlargebiologicalsequences
AT byeongsoojeong efficientminingofinterestingpatternsinlargebiologicalsequences
AT hojinchoi efficientminingofinterestingpatternsinlargebiologicalsequences
_version_ 1725690206274715648