Prediction of Human Protein-Protein Interactions Using Support Vector Machines

Bibliographic Details
Main Authors: Tao-Wei Huang, 黃韜維
Other Authors: 高成炎
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/60898753640594858720
Description
Summary: Doctoral === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 96 === The recent increase in the use of high-throughput two-hybrid analysis has generated a large amount of data on protein interactions. Specifically, the availability of experimental protein-protein interaction data and other protein features on the Internet enables human protein-protein interactions to be computationally predicted from co-evolution events (interologs). Computational methods must be developed to integrate these heterogeneous biological data and maximize the accuracy of human protein interaction prediction. In the knowledge-based part of this study, we propose a relative conservation score, obtained by identifying maximal quasi-cliques in protein interaction networks, and combine it with other interaction features to formulate a scoring method. The scoring method can be used to identify which of multiple candidate protein pairs are the most likely to interact. The predicted human protein-protein interactions, each associated with a confidence score, are derived from six eukaryotic organisms: rat, mouse, fly, worm, thale cress and baker's yeast. Evaluation of the proposed method using functional keyword and gene ontology annotations indicates that some confidence in the accuracy of the predicted interactions is justified. Comparisons with existing methods also show that the proposed method predicts human protein-protein interactions more accurately than other interolog-based methods. This study considers protein interaction features including interolog, spatial proximity (sub-cellular localization, tissue specificity), temporal synchronicity (the cell-cycle stage), and domain-domain pair combinations. These 6 protein features, together with combinations of the hydrophobicity, charge, and volume properties of amino acids encoded as 3 sets of 16-dimensional features, are used to construct committee models of support vector machines (SVMs).
The final 5-fold cross-validation on 10 test sets of different sizes showed that a test-set accuracy above 90% can be obtained. Moreover, the analytical comparisons suggest that the proposed method has higher accuracy than other SVM-based methods.
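The interolog idea mentioned in the summary can be illustrated with a minimal Python sketch: an interaction observed between two proteins in a model organism is transferred to the human orthologs of those proteins. This is only an illustration under stated assumptions, not the thesis's method. The function names, protein identifiers, and toy data are hypothetical, and the simple count of supporting species is a stand-in for, not a reconstruction of, the relative conservation score described above.

```python
from collections import defaultdict


def transfer_interologs(model_interactions, orthologs):
    """Map interacting pairs from a model organism onto human protein pairs.

    model_interactions: iterable of (protein_a, protein_b) pairs observed
        in the model organism.
    orthologs: dict mapping a model-organism protein ID to a set of human
        ortholog IDs (hypothetical input format).
    Returns a set of frozensets, each holding one predicted human pair.
    """
    predicted = set()
    for a, b in model_interactions:
        # A pair is transferred only if both partners have human orthologs.
        for human_a in orthologs.get(a, ()):
            for human_b in orthologs.get(b, ()):
                predicted.add(frozenset((human_a, human_b)))
    return predicted


def count_supporting_species(per_species_predictions):
    """Count how many species support each predicted human pair.

    This count is an assumed, simplified confidence proxy.
    """
    support = defaultdict(int)
    for pairs in per_species_predictions.values():
        for pair in pairs:
            support[pair] += 1
    return dict(support)


# Hypothetical toy data: IDs are placeholders, not real proteins.
yeast_ppi = [("Y1", "Y2"), ("Y2", "Y3")]
yeast_orthologs = {"Y1": {"H1"}, "Y2": {"H2"}, "Y3": {"H3"}}
fly_ppi = [("F1", "F2")]
fly_orthologs = {"F1": {"H1"}, "F2": {"H2"}}

predictions = {
    "yeast": transfer_interologs(yeast_ppi, yeast_orthologs),
    "fly": transfer_interologs(fly_ppi, fly_orthologs),
}
support = count_supporting_species(predictions)
```

In this toy example the human pair H1/H2 is supported by both species, while H2/H3 is supported by yeast only. Pairs are stored as frozensets so that (H1, H2) and (H2, H1) count as the same undirected interaction.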