A deep learning method to more accurately recall known lysine acetylation sites

Abstract. Background: Lysine acetylation of proteins is one of the most important post-translational modifications (PTMs). It plays an important role in essential biological processes and is related to various diseases. To obtain a comprehensive understanding of the regulatory mechanisms of lysine acetylati...


Bibliographic Details
Main Authors: Meiqi Wu, Yingxi Yang, Hui Wang, Yan Xu
Format: Article
Language: English
Published: BMC 2019-01-01
Series: BMC Bioinformatics
Subjects: Lysine acetylation; PTMs; Deep learning
Online Access: http://link.springer.com/article/10.1186/s12859-019-2632-9
id doaj-9dd8b78065654fda831c5370138e9625
record_format Article
spelling doaj-9dd8b78065654fda831c5370138e9625 2020-11-25T01:12:52Z eng BMC BMC Bioinformatics 1471-2105 2019-01-01 20 1 1 11 10.1186/s12859-019-2632-9
A deep learning method to more accurately recall known lysine acetylation sites
Meiqi Wu (0), Yingxi Yang (1), Hui Wang (2), Yan Xu (3)
(0) Department of Information and Computer Science, University of Science and Technology Beijing
(1) Department of Information and Computer Science, University of Science and Technology Beijing
(2) Institute of Computing Technology, Chinese Academy of Sciences
(3) Department of Information and Computer Science, University of Science and Technology Beijing
http://link.springer.com/article/10.1186/s12859-019-2632-9
Lysine acetylation; PTMs; Deep learning
collection DOAJ
language English
format Article
sources DOAJ
author Meiqi Wu
Yingxi Yang
Hui Wang
Yan Xu
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-01-01
description Abstract. Background: Lysine acetylation of proteins is one of the most important post-translational modifications (PTMs). It plays an important role in essential biological processes and is related to various diseases. To obtain a comprehensive understanding of the regulatory mechanisms of lysine acetylation, the key is to identify lysine acetylation sites. Several shallow machine learning algorithms have previously been applied to predict lysine modification sites in proteins. However, shallow machine learning has some disadvantages; for instance, it is not as effective as deep learning for processing big data. Results: In this work, a novel predictor named DeepAcet was developed to predict acetylation sites. Six encoding schemes were adopted to represent the modified residues: one-hot encoding, the BLOSUM62 matrix, the composition of k-spaced amino acid pairs, information gain, physicochemical properties, and a position-specific scoring matrix. A multilayer perceptron (MLP) was used to construct a model that predicts lysine acetylation sites in proteins from these features. All features were also integrated, and a feature selection method was applied to select a set of 2199 features. The best prediction achieved 84.95% accuracy, 83.45% specificity, 86.44% sensitivity, 0.8540 AUC, and 0.6993 MCC in 10-fold cross-validation. On an independent test set, the prediction achieved 84.87% accuracy, 83.46% specificity, 86.28% sensitivity, 0.8407 AUC, and 0.6977 MCC. Conclusion: The predictive performance of DeepAcet is better than that of other existing methods. DeepAcet can be freely downloaded from https://github.com/Sunmile/DeepAcet.
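The abstract reports accuracy, specificity, sensitivity, and MCC. As a reading aid, the following minimal sketch shows how these four measures are conventionally derived from the confusion counts of a binary classifier. It is not taken from the DeepAcet code: the function name `binary_metrics` and the example counts are hypothetical, and AUC is left out because it requires ranked prediction scores rather than counts.

```python
import math


def _safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification measures from confusion counts.

    tp / fn: positive (acetylated) sites predicted correctly / missed
    tn / fp: negative (non-acetylated) sites predicted correctly / wrongly flagged
    """
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = _safe_div(tp, tp + fn)  # fraction of true sites recalled
    specificity = _safe_div(tn, tn + fp)  # fraction of non-sites rejected
    # Matthews correlation coefficient: ranges from -1 to 1, 0 is chance level.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; these are not the paper's data.
example = binary_metrics(tp=86, fp=17, tn=83, fn=14)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the positive and negative classes may differ in size.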
topic Lysine acetylation
PTMs
Deep learning
url http://link.springer.com/article/10.1186/s12859-019-2632-9