Protein Quaternary Structure Classification: Using Bootstrapping for Model Selection

Bibliographic Details
Main Authors: Shao-yu Ho, 何紹瑜
Other Authors: 朱彥煒
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/20846528102198757572
Description
Summary: Master's thesis === National Chung Hsing University (國立中興大學) === Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所) === 100 === A protein quaternary structure complex, also known as a multimer, plays an important role in the cell. For example, the dimer structure of transcription factors is involved in gene regulation, while the trimer structure of virus-infection-associated glycoproteins is associated with the human immunodeficiency virus. Classifying protein quaternary structure complexes would therefore be of great help to proteomics research in the post-genome era. Classification systems for protein quaternary structures have not yet been widely developed, so in this study we designed a two-layer machine learning architecture and developed a classification system, PClass. Protein quaternary structure complexes are divided into five categories: monomer, dimer, trimer, tetramer, and other subunit classes. The first layer combines the bootstrap method with support vector machines to propose a new model selection method: each type of complex is encoded using sequence, entropy, and accessible surface area features, multiple feature models are generated, and the model that performs best under evaluation is selected as the feature model for that type of complex. At this stage, the best performance reaches an MCC of 70%. The second layer integrates the first-layer models and applies six machine learning methods to improve prediction performance, which improves MCC by more than 10%. Finally, we analyzed the performance of our classification system on transcription factors with dimer structure and virus-infection-associated glycoproteins with trimer structure.
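
The first-layer idea described in the summary (generate several candidate feature models, score each with bootstrap resampling, keep the one with the best MCC) can be illustrated with a minimal sketch. This is not the PClass implementation: the record does not say how the bootstrap samples are evaluated, so scoring on out-of-bag samples is an assumption here; a nearest-centroid classifier stands in for the support vector machine so that the example needs only the standard library; MCC is computed for a single binary (one class versus the rest) problem; and all names (`mcc`, `bootstrap_mcc`, `select_model`, the toy feature columns) are hypothetical.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each row to the class whose centroid is closest (squared distance)."""
    def nearest(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return [nearest(row) for row in X]


def bootstrap_mcc(X, y, columns, n_rounds=50, seed=0):
    """Mean out-of-bag MCC of a model restricted to the given feature columns."""
    rng = random.Random(seed)
    n = len(X)
    Xs = [[row[c] for c in columns] for row in X]
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]  # held-out samples
        if not oob or len({y[i] for i in idx}) < 2:
            continue  # skip degenerate resamples
        model = fit_centroids([Xs[i] for i in idx], [y[i] for i in idx])
        preds = predict(model, [Xs[i] for i in oob])
        scores.append(mcc([y[i] for i in oob], preds))
    return sum(scores) / len(scores) if scores else 0.0


def select_model(X, y, candidates, **kwargs):
    """Score every candidate feature model and return the best name plus all scores."""
    scores = {name: bootstrap_mcc(X, y, cols, **kwargs) for name, cols in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy data (assumption): column 0 separates the classes, column 1 does not.
y = [i % 2 for i in range(40)]
X = [[10.0 * y[i] + (i % 5) * 0.1, float((i * 7) % 11)] for i in range(40)]
candidates = {"informative": [0], "noise": [1], "both": [0, 1]}

best, scores = select_model(X, y, candidates)
print(best, scores)
```

In a setting closer to the one the summary describes, the candidate column sets would correspond to the sequence, entropy, and accessible surface area encodings, and the stand-in classifier would be replaced by an SVM trained once per bootstrap sample.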