In-silico prediction of disorder content using hybrid sequence representation

Bibliographic Details
Main Authors: Dunker A, Zhou Yaoqi, Xue Bin, Zhang Tuo, Mizianty Marcin J, Uversky Vladimir N, Kurgan Lukasz
Format: Article
Language: English
Published: BMC 2011-06-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/245
Description
Summary: Background: Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies, including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. These investigations currently use disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of the disorder content.

Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than the content extracted from local, window-based, residue-level disorder predictors. We propose a novel predictor, DisCon, that uses a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error (0.05) and a high Pearson correlation (0.68). This is a statistically significant improvement over the content computed from the outputs of ten modern disorder predictors on a test dataset of proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content.

Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues derived from the real-valued disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
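The two ideas in the summary, deriving disorder content from per-residue predictions and using a predicted content value to guide binary annotation, can be illustrated with a minimal Python sketch. This is not DisCon's implementation, and the summary does not state how the content prediction is applied to binary annotation. The function names, the 0.5 cutoff, and the "label the top-ranked residues" rule are all assumptions made for illustration.

```python
from typing import List


def content_from_propensities(propensities: List[float], cutoff: float = 0.5) -> float:
    """Disorder content as the fraction of residues at or above a cutoff.

    The 0.5 cutoff is an assumed default, not a value taken from the paper.
    """
    if not propensities:
        raise ValueError("propensities must not be empty")
    disordered = sum(1 for p in propensities if p >= cutoff)
    return disordered / len(propensities)


def binarize_to_content(propensities: List[float], target_content: float) -> List[int]:
    """Label the highest-propensity residues as disordered (1) so that the
    labeled fraction matches a separately predicted content value.

    Hypothetical rule: the source does not describe its exact procedure.
    """
    if not 0.0 <= target_content <= 1.0:
        raise ValueError("target_content must be in [0, 1]")
    n = len(propensities)
    k = round(target_content * n)  # number of residues to label as disordered
    # Rank residue indices by descending propensity; ties keep sequence order.
    ranked = sorted(range(n), key=lambda i: propensities[i], reverse=True)
    labels = [0] * n
    for i in ranked[:k]:
        labels[i] = 1
    return labels


# Made-up propensities for a 10-residue toy sequence.
scores = [0.9, 0.8, 0.45, 0.4, 0.1, 0.6, 0.2, 0.3, 0.05, 0.55]
fixed_cutoff_content = content_from_propensities(scores)
content_guided_labels = binarize_to_content(scores, target_content=0.2)
```

With a fixed cutoff, the content estimate depends entirely on how the per-residue predictor is calibrated. In the second function, a sequence-level content estimate decides how many residues are labeled, and the propensities decide only which ones.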
ISSN:1471-2105