Fine-grained Protein Fold Assignment by Support Vector Machines using generalized n-peptide Coding Schemes and jury voting from multiple parameter sets

Bibliographic Details
Main Authors: Chin-Sheng Yu, 游景盛
Other Authors: P. C. Lyu
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/57177749858969618310
Description
Summary: Master's thesis === National Tsing Hua University === Department of Life Sciences === academic year 90 === Fold assignment directly from sequences is valuable in the prediction of protein structures. Unlike secondary structure prediction, where a local coding scheme of sequence information usually suffices, fold identification calls for global protein descriptors as well as local descriptors covering the whole protein sequence. Previous studies have shown that machine learning methods can yield reasonable accuracy in fold assignment directly from sequences by using a variety of global sequence coding schemes. In this thesis, using global protein descriptors based on the n-peptide distribution, we apply the support vector machine (SVM) method to the 27 most populated folds, which contain 386 representative proteins in the Structural Classification of Proteins (SCOP) database. Our approach achieved a prediction accuracy of 69.6% on an independent test set and 55.5% in ten-fold cross-validation, both of which are an order of magnitude higher than the current methods. Our results show that SVM using suitable global sequence coding schemes can significantly improve fold recognition from sequences and should offer a useful tool in structure modeling.
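The n-peptide coding, jury voting, and ten-fold cross-validation mentioned above can be illustrated with a small Python sketch. This is not the thesis's code and rests on stated assumptions: "n-peptide distribution" is taken to mean the normalized composition of contiguous length-n windows over the 20 standard amino acids (20^n features, e.g. 400 for dipeptides), whereas the thesis's "generalized" schemes may differ; "jury voting" is assumed to be a plain majority vote over predictions from SVMs trained with different parameter sets. The helper names `npeptide_composition`, `jury_vote`, and `kfold_indices` are hypothetical, and the SVM itself is left to an existing library (for example scikit-learn's `SVC`).

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(sequence: str, n: int = 2) -> list[float]:
    """Hypothetical helper: normalized frequency of every contiguous n-peptide.

    Assumes plain contiguous windows; the vector has 20**n entries ordered
    lexicographically over AMINO_ACIDS. Windows containing a non-standard
    residue (e.g. 'X') are skipped.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    seq = sequence.upper()
    valid = set(AMINO_ACIDS)
    counts = Counter(
        seq[i:i + n]
        for i in range(len(seq) - n + 1)
        if set(seq[i:i + n]) <= valid
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (20 ** n)
    vocab = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return [counts[p] / total for p in vocab]


def jury_vote(predictions: list[str]) -> str:
    """Hypothetical helper: majority vote over fold labels from several SVMs.

    Assumption: ties go to the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


def kfold_indices(num_items: int, k: int = 10) -> list[list[int]]:
    """Hypothetical helper: split indices into k disjoint cross-validation folds.

    These are CV folds, unrelated to protein folds. Assignment is round-robin
    (index i goes to CV fold i % k); shuffling and stratification are omitted.
    """
    if not 1 <= k <= num_items:
        raise ValueError("k must be between 1 and num_items")
    return [list(range(f, num_items, k)) for f in range(k)]
```

For example, `npeptide_composition("ACDA", n=1)` gives 0.5 for A and 0.25 each for C and D, and `jury_vote(["a.1", "b.1", "a.1"])` returns `"a.1"` (the labels are made-up SCOP-style identifiers). Round-robin splitting keeps the sketch deterministic; a real evaluation would normally shuffle, and possibly stratify by fold class, before splitting.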