Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitid...
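Main Authors: | Ashley I. Heinson, Yawwani Gunawardana, Bastiaan Moesker, Carmen C. Denman Hume, Elena Vataga, Yper Hall, Elena Stylianou, Helen McShane, Ann Williams, Mahesan Niranjan, Christopher H. Woelk |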
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2017-02-01 |
Series: | International Journal of Molecular Sciences |
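Subjects: | reverse vaccinology; machine learning; support vector machine; bacterial protective antigen; bacterial pathogen |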
Online Access: | http://www.mdpi.com/1422-0067/18/2/312 |
id |
doaj-392e525a902443cca08fb5901a171dab |
---|---|
spelling |
Record: doaj-392e525a902443cca08fb5901a171dab (2020-11-25T01:29:28Z); language: eng
Publisher: MDPI AG
Journal: International Journal of Molecular Sciences, ISSN 1422-0067
Published: 2017-02-01, vol. 18, no. 2, article 312
DOI: 10.3390/ijms18020312 (ijms18020312)
Title: Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology
Authors and affiliations:
Ashley I. Heinson: Faculty of Medicine, University of Southampton, Southampton SO17 1BJ, UK
Yawwani Gunawardana: Faculty of Medicine, University of Southampton, Southampton SO17 1BJ, UK
Bastiaan Moesker: Faculty of Medicine, University of Southampton, Southampton SO17 1BJ, UK
Carmen C. Denman Hume: London School of Hygiene and Tropical Medicine (LSHTM), Department of Pathogen Molecular Biology, London WC1E 7HT, UK
Elena Vataga: Solutions, University of Southampton, Southampton SO17 1BJ, UK
Yper Hall: Public Health England, National Infection Service, Porton Down Salisbury, SP4 0JG, UK
Elena Stylianou: The Jenner Institute, University of Oxford, Oxford OX3 7DQ, UK
Helen McShane: The Jenner Institute, University of Oxford, Oxford OX3 7DQ, UK
Ann Williams: Public Health England, National Infection Service, Porton Down Salisbury, SP4 0JG, UK
Mahesan Niranjan: Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK
Christopher H. Woelk: Faculty of Medicine, University of Southampton, Southampton SO17 1BJ, UK
URL: http://www.mdpi.com/1422-0067/18/2/312
Keywords: reverse vaccinology; machine learning; support vector machine; bacterial protective antigen; bacterial pathogen |
collection |
DOAJ |
author |
Ashley I. Heinson, Yawwani Gunawardana, Bastiaan Moesker, Carmen C. Denman Hume, Elena Vataga, Yper Hall, Elena Stylianou, Helen McShane, Ann Williams, Mahesan Niranjan, Christopher H. Woelk |
author_sort |
Ashley I. Heinson |
title |
Enhancing the Biological Relevance of Machine Learning Classifiers for Reverse Vaccinology |
publisher |
MDPI AG |
series |
International Journal of Molecular Sciences |
issn |
1422-0067 |
publishDate |
2017-02-01 |
description |
Reverse vaccinology (RV) is a bioinformatics approach that can predict antigens with protective potential from the protein coding genomes of bacterial pathogens for subunit vaccine design. RV has become firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes significantly to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. Furthermore, the effects of the following on an ML approach to RV were also assessed: nested cross-validation, balancing selection of non-BPAs for subcellular localization, increasing the training data, and incorporating greater numbers of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived when training SVM classifiers on data sets exclusively containing intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future. |
topic |
reverse vaccinology; machine learning; support vector machine; bacterial protective antigen; bacterial pathogen |
url |
http://www.mdpi.com/1422-0067/18/2/312 |