EpitopeVec: Linear epitope prediction using deep protein sequence embeddings

Motivation: B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefor...

Bibliographic Details
Main Authors: Asgari, E. (Author), Bahai, A. (Author), Kloetgen, A. (Author), McHardy, A.C. (Author), Mofrad, M.R.K. (Author)
Format: Article
Language: English
Published: Oxford University Press 2021
Online Access: View Fulltext in Publisher
LEADER 02105nam a2200181Ia 4500
001 10.1093-bioinformatics-btab467
008 220427s2021 CNT 000 0 und d
020 |a 13674803 (ISSN) 
245 1 0 |a EpitopeVec: Linear epitope prediction using deep protein sequence embeddings 
260 0 |b Oxford University Press  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/bioinformatics/btab467 
520 3 |a Motivation: B-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immuno-diagnostic reagents and antibody production, and thus in infectious disease prevention and diagnostics in general. Experimental methods used to determine BCEs are costly and time-consuming. Therefore, it is essential to develop computational methods for the rapid identification of BCEs. Although several computational methods have been developed for this task, generalizability is still a major concern, where cross-testing of the classifiers trained and tested on different datasets has revealed accuracies of 51-53%. Results: We describe a new method called EpitopeVec, which uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE predictions. Extensive benchmarking of EpitopeVec and other state-of-the-art methods for linear BCE prediction on several large and small datasets, as well as cross-testing, demonstrated an improvement in the performance of EpitopeVec over other methods in terms of accuracy and area under the curve. As the predictive performance depended on the species origin of the respective antigens (viral, bacterial and eukaryotic), we also trained our method on a large viral dataset to create a dedicated linear viral BCE predictor with improved cross-testing performance. © The Author(s) 2021. Published by Oxford University Press. 
700 1 |a Asgari, E.  |e author 
700 1 |a Bahai, A.  |e author 
700 1 |a Kloetgen, A.  |e author 
700 1 |a McHardy, A.C.  |e author 
700 1 |a Mofrad, M.R.K.  |e author 
773 |t Bioinformatics