Bacterial Immunogenicity Prediction by Machine Learning Methods

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes. They can significantly reduce the experimental work needed to discover novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k-nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting (xgboost)) to a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal 10-fold cross-validation on the training set and by an external test set. All of them showed good predictive ability, but the xgboost model displayed the most prominent ability to identify immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best at recognizing non-immunogens, identifying 92% of them in the test set. The three best-performing ML models (xgboost, RSM-kNN, and RF) were implemented in the new version of the VaxiJen server, and the prediction of bacterial immunogens is now based on majority voting.

Bibliographic Details
Main Authors:	Ivan Dimitrov, Nevena Zaharieva, Irini Doytchinova
Affiliation:	Faculty of Pharmacy, Medical University of Sofia, 1000 Sofia, Bulgaria (all authors)
Format:	Article
Language:	English
Published:	MDPI AG 2020-11-01
Series:	Vaccines
Volume/Issue:	8(4), article 709
ISSN:	2076-393X
DOI:	10.3390/vaccines8040709
Subjects:	protective immunogens; machine learning; immunogenicity prediction
Online Access:	https://www.mdpi.com/2076-393X/8/4/709
Source:	DOAJ
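
The abstract names a combined RSM-kNN model. In the random subspace method, each base learner is trained on a random subset of the input descriptors and the ensemble predicts by voting. The following is a minimal pure-Python sketch of that idea with kNN base learners, not the authors' implementation. The record does not state the descriptors, the distance metric, k, or the number and size of the subspaces, so Euclidean distance, the defaults below, the class name `RandomSubspaceKNN`, and the toy data are all assumptions for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by the majority label of its k nearest training points,
    using Euclidean distance restricted to the given feature indices."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = math.sqrt(sum((row[f] - x[f]) ** 2 for f in features))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


class RandomSubspaceKNN:
    """Hypothetical random subspace ensemble of kNN classifiers.

    n_subspaces, subspace_size and k are illustrative defaults,
    not the settings used in the paper."""

    def __init__(self, n_subspaces=11, subspace_size=2, k=3, seed=0):
        self.n_subspaces = n_subspaces
        self.subspace_size = subspace_size
        self.k = k
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n_features = len(X[0])
        # kNN is lazy: "training" just stores the data.
        self.X, self.y = X, y
        # Each base learner sees a different random subset of the descriptors.
        self.subspaces = [
            self.rng.sample(range(n_features), self.subspace_size)
            for _ in range(self.n_subspaces)
        ]
        return self

    def predict_one(self, x):
        # One kNN vote per subspace, then a majority vote across subspaces.
        votes = Counter(
            knn_predict(self.X, self.y, x, self.k, feats)
            for feats in self.subspaces
        )
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Toy descriptor vectors (1 = immunogen, 0 = non-immunogen); not data from the paper.
X_train = [
    [0.1, 0.2, 0.0, 0.1], [0.2, 0.1, 0.1, 0.0], [0.0, 0.1, 0.2, 0.2],
    [1.0, 0.9, 1.1, 1.0], [0.9, 1.1, 1.0, 0.8], [1.1, 1.0, 0.9, 1.2],
]
y_train = [0, 0, 0, 1, 1, 1]
model = RandomSubspaceKNN().fit(X_train, y_train)
print(model.predict([[0.05, 0.15, 0.1, 0.1], [1.0, 1.0, 1.0, 1.0]]))  # [0, 1]
```

An odd k and an odd number of subspaces avoid tied votes for two classes; a real model would tune these by cross-validation.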

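The two headline test-set figures can be read as per-class recognition rates: the share of known immunogens predicted as immunogens (sensitivity, 84% for xgboost) and the share of non-immunogens predicted as non-immunogens (specificity, 92% for RSM-kNN). A minimal sketch of how such rates are computed from true and predicted labels; the function name and the toy labels are hypothetical and do not reproduce the paper's test set.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of true immunogens recognized
    specificity = TN / (TN + FP): share of true non-immunogens recognized
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Toy labels (1 = immunogen, 0 = non-immunogen); not data from the paper.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.80 specificity=0.60
```

Reporting both rates matters because a model can score well on one class while missing the other, which is why the abstract singles out a different best model for each.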

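VaxiJen's bacterial predictions are described as a majority vote over the xgboost, RSM-kNN, and RF models. A minimal sketch of hard (label-level) majority voting, assuming each model returns a binary label per protein; the function names and the example outputs are hypothetical, and this is not VaxiJen's code.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by a strict majority of models.

    `predictions` maps a model name to its predicted label. With three
    models and two classes a strict majority always exists."""
    counts = Counter(predictions.values())
    label, n = counts.most_common(1)[0]
    if n * 2 <= len(predictions):
        raise ValueError("no strict majority")
    return label


def vote_many(per_model):
    """Apply majority_vote protein by protein.

    `per_model` maps each model name to a list of labels, one per protein,
    in the same protein order for every model."""
    names = list(per_model)
    n_proteins = len(per_model[names[0]])
    return [majority_vote({name: per_model[name][i] for name in names})
            for i in range(n_proteins)]


# Hypothetical outputs for three proteins (1 = immunogen, 0 = non-immunogen).
per_model = {
    "xgboost": [1, 0, 1],
    "RSM-kNN": [0, 0, 1],
    "RF": [1, 1, 0],
}
print(vote_many(per_model))  # [1, 0, 1]
```

Using an odd number of binary classifiers guarantees a decision for every protein, and it lets a model that is strong on immunogens (xgboost) and one that is strong on non-immunogens (RSM-kNN) be arbitrated by the third.
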
Similar Items