Mining Maximal Sequential Patterns in Protein Databases

Bibliographic Details
Main Authors: Yu Ling, 凌宇
Other Authors: 李瑞庭
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/28917977808357764502
Description
Summary: Master's === National Taiwan University === Graduate Institute of Information Management === 93 === Because of the close relationship between sequential patterns and protein function, systematically mining significant sequential patterns in protein databases has become an important research topic. In this thesis, we propose a suffix-tree-based algorithm to discover patterns in protein databases. We use the occurrence information maintained in the suffix tree to mine closed frequent substrings, generate maximal frequent sequential patterns, and adjust the gaps within the patterns. To keep the resulting pattern set compact, we generate only maximal patterns rather than all patterns. The experimental results show that the proposed algorithm finds not only the patterns recorded in the PROSITE database but also other patterns worthy of further biological study, such as longer patterns and the classifier pattern set. In addition, the proposed algorithm produces better results than Chang and Halgamuge’s method in the experiment.
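
The idea of reporting only maximal frequent patterns can be illustrated with a small sketch. This is not the thesis's suffix-tree algorithm: it enumerates substrings by brute force, ignores gaps, and counts support as the number of sequences that contain a substring. The function names, the support definition, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support):
    """Map each substring to the set of sequence indices containing it,
    keeping only substrings found in at least min_support sequences.
    Brute-force stand-in for the occurrence lists a suffix tree would hold."""
    occurrences = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                occurrences[seq[start:end]].add(idx)
    return {s: ids for s, ids in occurrences.items() if len(ids) >= min_support}


def maximal_frequent_substrings(sequences, min_support):
    """Keep frequent substrings that are not contained in a longer frequent one."""
    frequent = frequent_substrings(sequences, min_support)
    return sorted(
        s for s in frequent
        if not any(s != t and s in t for t in frequent)
    )


# Hypothetical toy sequences, not real protein data.
toy = ["MKVLAAG", "TKVLAAC", "KVLGG"]
print(maximal_frequent_substrings(toy, min_support=2))
print(maximal_frequent_substrings(toy, min_support=3))
```

With a support threshold of 2, every substring of "KVLAA" is frequent, yet only "KVLAA" itself (plus the unrelated "G") is reported, which is the compactness the summary refers to. A suffix tree would obtain the same occurrence information without enumerating every substring explicitly.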