DeepKcrot: A Deep-Learning Architecture for General and Species-Specific Lysine Crotonylation Site Prediction

Bibliographic Details
Main Authors: Xilin Wei, Yutong Sha, Yiming Zhao, Ningning He, Lei Li
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/9385145/
id doaj-43a7e29506e641528b5ab2ce82e8b0d8
ISSN: 2169-3536
Volume: 9
Pages: 49504-49513
DOI: 10.1109/ACCESS.2021.3068413
Document number: 9385145
Author affiliations and ORCID iDs:
Xilin Wei (https://orcid.org/0000-0003-1106-4811), School of Data Science and Software Engineering, Qingdao University, Qingdao, China
Yutong Sha, School of Basic Medicine, Qingdao University, Qingdao, China
Yiming Zhao (https://orcid.org/0000-0001-9930-8635), School of Data Science and Software Engineering, Qingdao University, Qingdao, China
Ningning He (https://orcid.org/0000-0001-9453-6911), School of Basic Medicine, Qingdao University, Qingdao, China
Lei Li (https://orcid.org/0000-0002-0956-1205), School of Data Science and Software Engineering, Qingdao University, Qingdao, China
collection DOAJ
description Lysine crotonylation (Kcrot) is a post-translational modification (PTM) originally identified on histone proteins and is involved in diverse biological processes. Several conventional machine-learning (ML) predictors were developed from Kcrot sites on histone proteins. Recently, thousands of Kcrot sites have been experimentally verified on non-histone proteins from multiple species, and a few predictors have accordingly been developed for specific organisms (i.e., humans and papaya). Nevertheless, comparative studies of the crotonylomes of different organisms are lacking. Here, we collected around 20,000 experimentally identified Kcrot sites from four different species as the benchmark data set. We present DeepKcrot, a deep-learning (DL) architecture for predicting Kcrot sites in proteomes across species. DeepKcrot includes species-specific and general classifiers built on a convolutional neural network with word-embedding encoding (CNN_WE). CNN_WE outperforms both traditional ML-based and other DL-based classifiers in ten-fold cross-validation and independent tests, regardless of training-set size. Additionally, each species-specific predictor performs worse on other species than on its own species, although cross-species performance generally increases with the size of the training data set. Moreover, predictors trained on non-histone Kcrot sites fail to predict histone Kcrot sites, suggesting that Kcrot-containing peptides from non-histone and histone proteins have significantly different characteristics and that data integration is required. Overall, DeepKcrot is an efficient prediction tool and is freely available at http://www.bioinfogo.org/deepkcrot.
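The abstract describes the CNN_WE classifier only at a high level. The following is a minimal, untrained sketch of the general idea under stated assumptions: a peptide window centred on the candidate lysine is mapped to integer indices, each residue ("word") is looked up in an embedding table, and a 1-D convolution with global max pooling feeds a sigmoid output. The window half-width (15), embedding size (8), four filters of width 3, the padding character X, and the names extract_window, encode and TinyCNNWE are hypothetical illustrations, not values or code reported for DeepKcrot.

```python
import math
import random

# Hypothetical settings: the record does not report DeepKcrot's window size,
# embedding dimension or filter configuration; these values are illustrative.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"                      # placeholder for positions beyond the protein termini
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB[PAD] = 0
HALF_WINDOW = 15               # residues kept on each side of the central lysine
EMBED_DIM = 8
N_FILTERS = 4
KERNEL = 3


def extract_window(protein: str, pos: int, half: int = HALF_WINDOW) -> str:
    """Return the peptide centred on the lysine at 0-based index `pos`, padded with X."""
    if protein[pos] != "K":
        raise ValueError("centre residue must be lysine (K)")
    left = protein[max(0, pos - half):pos].rjust(half, PAD)
    right = protein[pos + 1:pos + 1 + half].ljust(half, PAD)
    return left + "K" + right


def encode(peptide: str) -> list[int]:
    """Map each residue to an integer index; unknown residues share the padding index."""
    return [VOCAB.get(aa, 0) for aa in peptide]


class TinyCNNWE:
    """Hypothetical untrained embedding + 1-D CNN classifier (forward pass only)."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)

        def init() -> float:
            # Small random initial weight; a real model would learn these by training.
            return rng.uniform(-0.1, 0.1)

        self.embedding = [[init() for _ in range(EMBED_DIM)] for _ in range(len(VOCAB))]
        self.filters = [[[init() for _ in range(EMBED_DIM)] for _ in range(KERNEL)]
                        for _ in range(N_FILTERS)]
        self.out_w = [init() for _ in range(N_FILTERS)]
        self.out_b = 0.0

    def predict(self, indices: list[int]) -> float:
        # 1) Embedding lookup: one dense vector per residue.
        x = [self.embedding[i] for i in indices]
        pooled = []
        for f in self.filters:
            # 2) Valid 1-D convolution along the sequence, followed by ReLU.
            acts = [max(0.0, sum(f[k][d] * x[t + k][d]
                                 for k in range(KERNEL) for d in range(EMBED_DIM)))
                    for t in range(len(x) - KERNEL + 1)]
            # 3) Global max pooling keeps each filter's strongest response.
            pooled.append(max(acts))
        # 4) Dense layer + sigmoid gives a score in (0, 1).
        z = sum(w * p for w, p in zip(self.out_w, pooled)) + self.out_b
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    protein = "MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEG"   # toy sequence
    window = extract_window(protein, 2)               # lysine near the N-terminus
    print(window)
    print(f"{TinyCNNWE(seed=1).predict(encode(window)):.3f}")
```

Because the weights are random, the score carries no biological meaning until the model is trained on labelled Kcrot and non-Kcrot windows; the sketch only shows the data flow from peptide window to integer indices, embedding vectors, convolution, pooling and sigmoid output.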
topic Deep learning
convolutional neural network
lysine crotonylation
non-histone protein
random forest