Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining

Master's === National Taiwan University === Graduate Institute of Bio-Industrial Mechatronics Engineering === 96 === Abstract Recent advances in fully sequenced genomes have provided a huge amount of accessible sequence information. It raises a great challenge to detect the interface residues participating in protein-protein interactions directly from the primary structure...

Bibliographic Details
Main Authors: Chien-Chieh Lin, 林千捷
Other Authors: 陳倩瑜
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/14615197860443997117
id ndltd-TW-096NTU05415026
record_format oai_dc
spelling ndltd-TW-096NTU054150262015-11-25T04:04:36Z http://ndltd.ncl.edu.tw/handle/14615197860443997117 Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining 利用序列特徵探勘預測蛋白質-蛋白質互動鍵結區之配對 Chien-Chieh Lin 林千捷 碩士 國立臺灣大學 生物產業機電工程學研究所 96 陳倩瑜 2008 學位論文 ; thesis 45 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Bio-Industrial Mechatronics Engineering === 96 === Abstract Recent advances in genome sequencing have made a huge amount of sequence information accessible. Detecting the interface residues that participate in protein-protein interactions directly from the primary structures, the amino acid sequences, remains a great challenge. To address the problem, we propose a two-phase pattern mining method to predict the interacting regions of a pair of proteins that are known to interact physically, based on the co-occurrence of residues found in a set of concatenated protein homologues. Once valid training data are prepared, the interacting regions can potentially be recognized from the patterns that span the two proteins. In this thesis, we apply the proposed approach to 41 protein pairs from three different data sets. The performance of the proposed method is evaluated by calculating the distance between the predicted paired interacting regions from different protein chains in existing structural complexes. In summary, we predicted 128 conserved regions in the first phase of mining, of which 60 found potential partners among the patterns derived in the second phase. Thirty-three of the predicted interacting pairs are within 10 Å in available complexes, resulting in an accuracy of 56% (33/60). If we trust only the mining results from protein pairs with similar evolution rates, our method delivers an accuracy of 72% (24/33). This reveals the potential of our method and suggests that how to incorporate other useful information to refine the current predictions deserves further study.
author2 陳倩瑜
author_facet 陳倩瑜
Chien-Chieh Lin
林千捷
author Chien-Chieh Lin
林千捷
spellingShingle Chien-Chieh Lin
林千捷
Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
author_sort Chien-Chieh Lin
title Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
title_short Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
title_full Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
title_fullStr Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
title_full_unstemmed Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
title_sort prediction of paired binding regions in protein-protein interactions by sequential pattern mining
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/14615197860443997117
work_keys_str_mv AT chienchiehlin predictionofpairedbindingregionsinproteinproteininteractionsbysequentialpatternmining
AT línqiānjié predictionofpairedbindingregionsinproteinproteininteractionsbysequentialpatternmining
AT chienchiehlin lìyòngxùliètèzhēngtànkānyùcèdànbáizhìdànbáizhìhùdòngjiànjiéqūzhīpèiduì
AT línqiānjié lìyòngxùliètèzhēngtànkānyùcèdànbáizhìdànbáizhìhùdòngjiànjiéqūzhīpèiduì
_version_ 1718135312127885312
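
The abstract evaluates predictions by checking whether a predicted pair of interacting regions lies within 10 Å in a known complex, then reports accuracy as hits over predicted pairs. The sketch below illustrates one way such a check could be computed. It is not the thesis's implementation: the abstract does not say how the distance between two regions is defined, so the minimum pairwise distance between coordinates is an assumption here, and the function names (`min_distance`, `is_hit`, `accuracy`) and the toy coordinates are hypothetical.

```python
import math

def min_distance(region_a, region_b):
    """Smallest Euclidean distance between any point in region_a and any in region_b.

    Assumption: each region is a list of (x, y, z) coordinates in angstroms,
    for example one representative atom per residue.
    """
    return min(math.dist(p, q) for p in region_a for q in region_b)

def is_hit(region_a, region_b, cutoff=10.0):
    """True if the two regions come within `cutoff` angstroms of each other."""
    return min_distance(region_a, region_b) <= cutoff

def accuracy(pairs, cutoff=10.0):
    """Fraction of predicted region pairs that count as hits under the cutoff."""
    hits = sum(is_hit(a, b, cutoff) for a, b in pairs)
    return hits / len(pairs)

# Toy coordinates, made up for illustration (not data from the thesis).
region_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
region_b = [(10.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
region_c = [(30.0, 0.0, 0.0)]

predicted_pairs = [(region_a, region_b), (region_a, region_c)]
print(accuracy(predicted_pairs))
```

With real data, the coordinates would come from the structure of the complex, and the choice of representative atom and distance definition would need to follow the thesis itself.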