Summary: | As a novel type of post-translational modification, lysine 2-Hydroxyisobutyrylation (Khib) plays an important role in gene transcription and signal transduction. In order to understand its regulatory mechanism, the essential step is the recognition of Khib sites. Thousands of Khib sites have been experimentally verified across five different species. However, there are only a couple traditional machine-learning algorithms developed to predict Khib sites for limited species, lacking a general prediction algorithm. We constructed a deep-learning algorithm based on convolutional neural network with the one-hot encoding approach, dubbed CNNOH. It performs favorably to the traditional machine-learning models and other deep-learning models across different species, in terms of cross-validation and independent test. The area under the ROC curve (AUC) values for CNNOH ranged from 0.82 to 0.87 for different organisms, which is superior to the currently available Khib predictors. Moreover, we developed the general model based on the integrated data from multiple species and it showed great universality and effectiveness with the AUC values in the range of 0.79–0.87. Accordingly, we constructed the on-line prediction tool dubbed DeepKhib for easily identifying Khib sites, which includes both species-specific and general models. DeepKhib is available at http://www.bioinfogo.org/DeepKhib.
|