A multitask transfer learning framework for the prediction of virus-human protein–protein interactions

Background: Viral infections cause significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in prev...

Bibliographic Details
Main Authors: Brogden, G. (Author), Dong, T.N. (Author), Gerold, G. (Author), Khosla, M. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher (https://doi.org/10.1186/s12859-021-04484-y)
LEADER 03353nam a2200577Ia 4500
001 10.1186-s12859-021-04484-y
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a A multitask transfer learning framework for the prediction of virus-human protein–protein interactions 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04484-y 
520 3 |a Background: Viral infections cause significant morbidity and mortality worldwide. Understanding the interaction patterns between a particular virus and human proteins plays a crucial role in unveiling the underlying mechanism of viral infection and pathogenesis. This could further help in the prevention and treatment of virus-related diseases. However, predicting protein–protein interactions between a new virus and human cells is extremely challenging because data on virus-human interactions are scarce and most viruses mutate rapidly. Results: We developed a multitask transfer learning approach that exploits information from around 24 million protein sequences and the interaction patterns of the human interactome to counter the problem of small training datasets. Instead of hand-crafted protein features, we use statistically rich protein representations learned by a deep language modeling approach from a massive source of protein sequences. We also employ an additional objective that aims to maximize the probability of observing human protein–protein interactions. This auxiliary objective acts as a regularizer and allows domain knowledge to inform the virus-human protein–protein interaction prediction model. Conclusions: Our approach achieved competitive results on 13 benchmark datasets and in a case study on the SARS-CoV-2 virus receptor. Experimental results show that the proposed model works effectively for both virus-human and bacteria-human protein–protein interaction prediction tasks. We share our code for reproducibility and future research at https://git.l3s.uni-hannover.de/dong/multitask-transfer. © 2021, The Author(s). 
650 0 4 |a algorithm 
650 0 4 |a Algorithms 
650 0 4 |a COVID-19 
650 0 4 |a Diseases 
650 0 4 |a Embeddings 
650 0 4 |a Forecasting 
650 0 4 |a human 
650 0 4 |a Human PPI 
650 0 4 |a Human proteins 
650 0 4 |a Humans 
650 0 4 |a Interaction pattern 
650 0 4 |a Learning systems 
650 0 4 |a machine learning 
650 0 4 |a Machine Learning 
650 0 4 |a Modeling languages 
650 0 4 |a Multitask 
650 0 4 |a Protein embedding 
650 0 4 |a Protein–protein interaction 
650 0 4 |a Protein-protein interactions 
650 0 4 |a Proteins 
650 0 4 |a reproducibility 
650 0 4 |a Reproducibility of Results 
650 0 4 |a SARS-CoV-2 
650 0 4 |a Transfer learning 
650 0 4 |a Viral infections 
650 0 4 |a virus 
650 0 4 |a Viruses 
650 0 4 |a Virus-human PPI 
700 1 |a Brogden, G.  |e author 
700 1 |a Dong, T.N.  |e author 
700 1 |a Gerold, G.  |e author 
700 1 |a Khosla, M.  |e author 
773 |t BMC Bioinformatics
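
The abstract in field 520 describes a main virus-human interaction objective combined with an additional human protein–protein interaction objective that acts as a regularizer. The sketch below illustrates that general idea only; it is not the authors' implementation. The function names, the use of binary cross-entropy for both tasks, and the weighting factor `alpha` are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def multitask_loss(vh_probs, vh_labels, hh_probs, hh_labels, alpha=0.5):
    """Hypothetical combined objective (an assumption, not the paper's exact loss).

    vh_*: predicted probabilities and labels for virus-human protein pairs (main task).
    hh_*: predicted probabilities and labels for human-human protein pairs (auxiliary task).
    alpha: assumed weight controlling how strongly the auxiliary task regularizes.
    """
    main = binary_cross_entropy(vh_probs, vh_labels)
    auxiliary = binary_cross_entropy(hh_probs, hh_labels)
    return main + alpha * auxiliary


if __name__ == "__main__":
    # Toy numbers chosen purely for illustration.
    loss = multitask_loss([0.9, 0.2], [1, 0], [0.7, 0.4], [1, 0], alpha=0.5)
    print(f"combined loss: {loss:.4f}")
```

Setting `alpha=0` in this sketch reduces the objective to the main task alone, which makes the regularizing role of the auxiliary term easy to see.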