Summary: | Ph.D. === National Chiao Tung University === Institute of Bioinformatics === 96 === Disulfide bonds play important roles in both stabilizing protein conformations and regulating protein functions. The ability to infer disulfide connectivity directly from protein sequences would be useful in structural modeling and functional analysis. However, predicting disulfide connectivity from protein sequences is a major challenge for computational biologists because of the nonlocal nature of disulfide connectivity: close spatial proximity of the cysteine pair that forms a disulfide bond does not necessarily imply short sequence separation between the two cysteines. Recently, Chen and Hwang developed an approach in which each distinct disulfide pattern is defined as a class, and the problem is treated as a multi-class classification solved with the support vector machine technique. Their method significantly improves the prediction accuracy of disulfide connectivity on a standard benchmark dataset sharing less than 30% sequence identity. However, the method has the drawback that the number of possible disulfide patterns grows rapidly with the number of disulfide bonds, so its performance drops off quickly as the number of disulfide bonds increases. In this work, we represent the disulfide patterns in terms of cysteine pairs. We predict the bonding states of the cysteine pairs using a support vector machine together with feature selection through a genetic algorithm. Since the number of bonding states of a cysteine pair remains constant regardless of the number of disulfide bonds, we avoid the class explosion that occurs as the number of disulfide bonds grows. We then construct the connectivity matrix from the bonding states of the cysteine pairs to predict the complete disulfide pattern. Our approach outperforms other current approaches and may provide a useful tool for the study of disulfide proteins.
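The class explosion and the pair-based alternative described above can be illustrated with a small Python sketch. It counts the possible disulfide patterns for a given number of bonds, (2n-1)!!, and then picks the complete pattern with the highest total pair score. The function names, the example scores, and the exhaustive maximum-score selection are assumptions made for illustration; the abstract does not state how the final pattern is chosen from the connectivity matrix.

```python
def count_patterns(n_bonds):
    """Number of ways to pair 2*n_bonds cysteines into bonds: (2n-1)!!."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


def perfect_matchings(items):
    """Yield every way to split a sorted list of cysteine indices into pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def best_pattern(pair_scores, n_cys):
    """Return the complete pairing with the highest summed pair score.

    pair_scores maps (i, j) with i < j to a hypothetical bonding score,
    e.g. a classifier output for that cysteine pair.
    """
    return max(
        perfect_matchings(list(range(n_cys))),
        key=lambda matching: sum(pair_scores[pair] for pair in matching),
    )


if __name__ == "__main__":
    # Pattern classes grow quickly with the number of bonds.
    for n in range(1, 6):
        print(n, "bonds:", count_patterns(n), "patterns")

    # Hypothetical pair scores for a protein with four cysteines.
    scores = {
        (0, 1): 0.1, (0, 2): 0.8, (0, 3): 0.2,
        (1, 2): 0.3, (1, 3): 0.9, (2, 3): 0.1,
    }
    print(best_pattern(scores, 4))
```

Exhaustive enumeration is only practical for a handful of bonds; for larger proteins a maximum-weight matching algorithm would be the usual substitute.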
Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years owing to the progress of structural genomics projects. Although some structural patterns, such as the Asp-His-Ser catalytic triad, are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect more general structural motifs such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we developed a structural alignment algorithm that combines structural and sequence information to identify local structural motifs. We applied our method to two test cases: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in non-specific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acids and the hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that these structural motifs can be identified effectively and automatically. Our method may provide a useful means of automatic functional annotation by detecting structural motifs associated with particular functions.
Recently, Shih et al. developed a method (Shih et al. Proteins: Structure, Function, and Bioinformatics 2007) to compute the correlation of fluctuations. This method, referred to as the protein fixed-point model, is based on the positional vectors of atoms issuing from the fixed point, which is the point of least fluctuation in a protein. One corollary of this model is that atoms lying on the same shell centered at the fixed point have the same thermal fluctuations. In practice, the model provides a convenient way to compute the average dynamical properties of proteins directly from their geometrical shapes without any mechanical model, so no trajectory integration or sophisticated matrix operations are needed. As a result, it is more efficient than molecular dynamics simulation or normal mode analysis. Although the protein fixed-point model was successfully applied to a number of proteins of various folds in the previous study, it is not clear how widely the model can be applied. In this report, we carried out a comprehensive analysis of the protein fixed-point model on a dataset comprising high-resolution X-ray structures with pairwise sequence identity >=25%. We found that in most cases the protein fixed-point model works well. However, for proteins comprising multiple domains, each domain should be treated separately as an independent dynamical module with its own fixed point; and for a protein complex comprising a number of subunits that functions as a biological unit, the whole complex should be treated as a single dynamical module with one fixed point. Under these considerations, the resultant correlation coefficient between the computed and the X-ray structural B-factors for the dataset is 0.59, and 75% (727/972) of the proteins have a correlation coefficient >=0.5.
Our results show that the fixed-point model is indeed quite general and will be a useful tool for high-throughput analysis of the dynamical properties of proteins.
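The shell property and the comparison with B-factors can be sketched in a few lines of Python. This is a minimal illustration under two stated assumptions that the abstract does not confirm: the fixed point is taken to be the centroid of the atom coordinates, and the fluctuation profile is taken to be the squared distance from that point. The function names, coordinates, and B-factors are hypothetical.

```python
from math import sqrt


def fixed_point_profile(coords):
    """Squared distance of each atom from the centroid.

    Assumption: the centroid stands in for the fixed point, and squared
    distance stands in for the fluctuation. Atoms on the same shell
    around that point get the same value.
    """
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Hypothetical atom coordinates (in angstroms) and B-factors.
    coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -2.0, 0.0)]
    b_factors = [15.0, 30.0, 15.0, 30.0]

    profile = fixed_point_profile(coords)
    print(profile)
    print(pearson(profile, b_factors))
```

For a multi-domain protein, the same function would be called once per domain on that domain's coordinates, so that each domain gets its own fixed point; for a complex acting as one biological unit, it would be called once on all subunits together.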
|