Identification of hot regions in protein-protein interactions by sequential pattern mining

Abstract. Background: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is strongly desirable to predict prote...

Bibliographic Details
Main Authors:	Lin Chien-Chieh, Laio Min-Hung, Huang Chih-Chang, Liu Baw-Jhiune, Chen Chien-Yu, Hsu Chen-Ming, Wu Tzung-Lin
Format:	Article
Language:	English
Published:	BMC 2007-05-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/8/S5/S8

id	doaj-de3b43f2473242829eb53a0c19e1bc88
record_format	Article
spelling	doaj-de3b43f2473242829eb53a0c19e1bc88 | 2020-11-24T21:13:49Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2007-05-01 | 8 | Suppl 5 | S8 | 10.1186/1471-2105-8-S5-S8 | Identification of hot regions in protein-protein interactions by sequential pattern mining | Lin Chien-Chieh; Laio Min-Hung; Huang Chih-Chang; Liu Baw-Jhiune; Chen Chien-Yu; Hsu Chen-Ming; Wu Tzung-Lin | http://www.biomedcentral.com/1471-2105/8/S5/S8
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Lin Chien-Chieh, Laio Min-Hung, Huang Chih-Chang, Liu Baw-Jhiune, Chen Chien-Yu, Hsu Chen-Ming, Wu Tzung-Lin
author_sort	Lin Chien-Chieh
title	Identification of hot regions in protein-protein interactions by sequential pattern mining
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2007-05-01
description	Abstract. Background: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is strongly desirable to predict protein binding regions from their sequences alone. This paper presents a pattern mining approach to tackle this problem. It is observed that a functional region of a protein structure usually consists of several peptide segments linked by large wildcard regions. Thus, the proposed mining technology considers large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated on the sequences. A derived pattern is called a cluster-like pattern, since the discovered conserved residues are always grouped into several blocks, each of which corresponds to a locally conserved region on the protein sequence. Results: The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated by conducting experiments on the web server MAGIIC-PRO, based on a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks are discovered, 4.25 blocks per protein chain on average. About 92% of the derived blocks are observed to be clustered in space with at least one of the other blocks, and about 66% of the blocks are found to be near the interface of protein-protein interactions. In summary, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach. Conclusion: This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be automatically discovered by sequential pattern mining. The detected regions are highly conserved and are thus considered computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.
url	http://www.biomedcentral.com/1471-2105/8/S5/S8
_version_	1716748040820228096
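
Illustrative note on the abstract: the "cluster-like pattern" it describes (several conserved blocks separated by large, irregular gaps) can be pictured with a minimal Python sketch. This is not the MAGIIC-PRO mining algorithm, which the record does not specify; it only shows how one already-given gapped pattern might be matched against sequences and how its support could be counted. The function names, the lowercase `x` wildcard convention, the gap bounds, and the toy sequences are all assumptions made for illustration.

```python
import re


def pattern_to_regex(blocks, gaps):
    """Build a regex from conserved blocks joined by flexible gaps.

    blocks: list of strings; a lowercase 'x' (assumed convention) marks a
            one-residue wildcard inside a block.
    gaps:   list of (min_len, max_len) tuples, one between each pair of
            consecutive blocks.
    """
    if len(gaps) != len(blocks) - 1:
        raise ValueError("need exactly one gap between consecutive blocks")
    parts = []
    for i, block in enumerate(blocks):
        # Conserved residues match literally; 'x' matches any single residue.
        parts.append("".join("." if c == "x" else re.escape(c) for c in block))
        if i < len(gaps):
            lo, hi = gaps[i]
            # A bounded run of arbitrary residues between two blocks.
            parts.append(".{%d,%d}" % (lo, hi))
    return re.compile("".join(parts))


def support(regex, sequences):
    """Fraction of sequences that contain at least one match."""
    hits = sum(1 for s in sequences if regex.search(s))
    return hits / len(sequences)


# Hypothetical toy sequences, not taken from the benchmark in the abstract.
toy_sequences = [
    "MKCHG" + "A" * 10 + "DWEL",   # gap of 10 between the two blocks
    "TTCAG" + "P" * 13 + "DWEQ",   # gap of 13
    "MKCHGDWEL",                   # blocks adjacent: gap of 0, too short
]

regex = pattern_to_regex(["CxG", "DWE"], [(5, 20)])
print(regex.pattern)
print(support(regex, toy_sequences))
```

In this toy setup the first two sequences contain both blocks within the allowed gap range and the third does not, so the pattern is supported by two of the three sequences. A mining procedure would search over candidate blocks and gap ranges rather than take them as input.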

Similar Items