Applying Sequential Pattern Mining Technique to Construct a Pairwise Sequence Similarity Kernel for Support Vector Machine Classifiers

Master's === Yuan Ze University === Department of Industrial Engineering and Management === 103 === Sequence classification problems arise in many real-world applications, such as protein function prediction and text classification. Support vector machines (SVMs) have been applied to sequence classification, since SVMs...

Bibliographic Details
Main Authors: Yu-Yu Yao, 姚佑俞
Other Authors: Chieh-Yuan Tsai
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/75795224962225592217
id ndltd-TW-103YZU05031046
record_format oai_dc
spelling ndltd-TW-103YZU05031046 2016-12-04T04:07:59Z http://ndltd.ncl.edu.tw/handle/75795224962225592217 Applying Sequential Pattern Mining Technique to Construct a Pairwise Sequence Similarity Kernel for Support Vector Machine Classifiers 應用序列樣式探勘技術建構成對序列相似核方法於支援向量機分類器 Yu-Yu Yao 姚佑俞 Master's, Yuan Ze University, Department of Industrial Engineering and Management, 103 Chieh-Yuan Tsai 蔡介元 學位論文 ; thesis 83 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === Yuan Ze University === Department of Industrial Engineering and Management === 103 === Sequence classification problems arise in many real-world applications, such as protein function prediction and text classification. Support vector machines (SVMs) have been applied to sequence classification because they can handle nonlinear data and classify efficiently. However, the most difficult part of using an SVM is designing an appropriate kernel function. This thesis therefore proposes a pairwise sequence similarity kernel that uses sequential patterns, rather than k-mers, as reference sequences and evaluates the similarity between each reference sequence and each data sequence with a map function. To obtain the sequential patterns, three sequential pattern mining methods are used to extract frequent sequential patterns, frequent closed sequential patterns, and frequent maximal sequential patterns from sequence databases. The three pattern types are then compared to determine which achieves the highest accuracy. The map function, an edit distance algorithm, computes the similarity scores in the proposed kernel. A sequence SVM classifier is then built on the proposed kernel and used to predict the class label of a new sequence. An artificial dataset and a real protein sequence dataset are used to test the proposed SVM classification model with the three types of sequential patterns. The experimental results indicate that the proposed SVM classification model using the pairwise sequence similarity kernel is efficient and feasible.
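The kernel construction summarized in the description can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not say how edit distance is turned into a similarity score or how the scores are combined into a kernel value, so the length normalization in `similarity`, the dot-product form of `kernel`, and all function names here are assumptions made for illustration. The reference patterns are taken as given (the mining step is not shown).

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences with unit edit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(seq: Sequence, ref: Sequence) -> float:
    """Assumed map function: 1 - distance / longer length, in [0, 1]."""
    longest = max(len(seq), len(ref))
    if longest == 0:
        return 1.0  # two empty sequences are identical
    return 1.0 - edit_distance(seq, ref) / longest


def feature_map(seq: Sequence, references: List[Sequence]) -> List[float]:
    """Score one sequence against every reference pattern."""
    return [similarity(seq, ref) for ref in references]


def kernel(x: Sequence, y: Sequence, references: List[Sequence]) -> float:
    """Assumed kernel: dot product of the two similarity vectors."""
    fx = feature_map(x, references)
    fy = feature_map(y, references)
    return sum(a * b for a, b in zip(fx, fy))


def gram_matrix(seqs: List[Sequence],
                references: List[Sequence]) -> List[List[float]]:
    """Pairwise kernel values for a list of sequences."""
    feats = [feature_map(s, references) for s in seqs]
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in feats]
            for fi in feats]
```

Because the kernel value is a dot product of explicit feature vectors, the resulting Gram matrix is symmetric and positive semidefinite, so it could in principle be passed to an SVM implementation that accepts a precomputed kernel matrix.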
author2 Chieh-Yuan Tsai
author_facet Chieh-Yuan Tsai
Yu-Yu Yao
姚佑俞
author Yu-Yu Yao
姚佑俞
title Applying Sequential Pattern Mining Technique to Construct a Pairwise Sequence Similarity Kernel for Support Vector Machine Classifiers
url http://ndltd.ncl.edu.tw/handle/75795224962225592217