A Study of Error-Correcting Output Codes for Structure Classification of Protein Sequence



Bibliographic Details
Main Author: Che-Chi Wu (吳哲奇)
Other Authors: Jyh-Jong Tasy
Format: Others
Language: en_US
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/77344414623169086094
Record ID: ndltd-TW-093CCU05392024
Chinese title: 錯誤更正輸出碼在蛋白質序列之結構分類的研究
Advisor: Jyh-Jong Tasy (蔡志忠)
Description: Master's thesis, National Chung Cheng University, Graduate Institute of Computer Science and Information Engineering, ROC academic year 93. Structural classification and prediction of protein sequences is an important research theme in structural bioinformatics. Most discriminative machine-learning methods suffer from the well-known "false positives" problem because of the large number of available folds. In this thesis, we study a new approach based on multi-class classification to reduce the impact of this problem. We use support vector machines (SVMs) as base classifiers and apply Error-Correcting Output Codes (ECOC) to achieve high-level multi-class classification. When the scores from multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. The results show that our method reaches a prediction accuracy of 62.72% on a protein test dataset in which most proteins share less than 25% sequence identity with the proteins used in training.
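The ECOC decoding step and the majority vote described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the 4-class code matrix, the fold labels, the sign-then-Hamming decoding rule, and the names CODE_MATRIX, hamming_decode and majority_vote are hypothetical choices for illustration. The binary SVMs are assumed to be trained elsewhere, so only their real-valued outputs appear here.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical exhaustive ECOC code matrix for 4 classes. Each row is a
# class codeword; each column defines one binary SVM (+1 = positive side,
# -1 = negative side). Any two rows differ in 4 positions, so one wrong
# binary prediction can still be decoded to the correct class.
CODE_MATRIX: Dict[str, List[int]] = {
    "fold_A": [+1, +1, +1, +1, +1, +1, +1],
    "fold_B": [-1, -1, -1, -1, +1, +1, +1],
    "fold_C": [-1, -1, +1, +1, -1, -1, +1],
    "fold_D": [-1, +1, -1, +1, -1, +1, -1],
}


def hamming_decode(svm_scores: Sequence[float]) -> str:
    """Map real-valued binary SVM outputs to the nearest class codeword.

    Each score is reduced to its sign (a score of exactly 0 counts as +1),
    then the class whose codeword differs in the fewest positions wins.
    """
    bits = [1 if s >= 0 else -1 for s in svm_scores]

    def distance(codeword: List[int]) -> int:
        return sum(b != c for b, c in zip(bits, codeword))

    # min() keeps the first class in CODE_MATRIX order when distances tie.
    return min(CODE_MATRIX, key=lambda label: distance(CODE_MATRIX[label]))


def majority_vote(predictions: Sequence[str]) -> str:
    """Combine per-dataset ECOC predictions; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical SVM scores for one protein from three parameter datasets.
    per_dataset_scores = [
        [0.8, 1.2, 0.5, 0.9, 1.1, 0.7, 0.3],      # matches fold_A exactly
        [0.4, -0.2, 0.6, 0.9, 1.0, 0.8, 0.2],     # one flipped bit, still fold_A
        [-0.5, -0.9, 0.7, 0.4, -0.6, -0.3, 0.8],  # matches fold_C exactly
    ]
    predictions = [hamming_decode(s) for s in per_dataset_scores]
    print(predictions)                 # ['fold_A', 'fold_A', 'fold_C']
    print(majority_vote(predictions))  # fold_A
```

Hamming decoding on signs is the simplest ECOC decoder; decoding directly on the raw SVM margins (loss-based decoding) is a common alternative that keeps confidence information the sign step discards.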