Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning

Pathogens causing infections, and particularly when invading the host cells, require the host cell machinery for efficient regeneration and proliferation during infection. For their life cycle, host proteins are needed and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several...

Bibliographic Details
Main Authors: Olufemi Aromolaran, Thomas Beder, Eunice Adedeji, Yvonne Ajamma, Jelili Oyelade, Ezekiel Adebiyi, Rainer Koenig
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Host factors; Bacteria; Infection; Knockout screen; Machine learning; Drosophila
Online Access: http://www.sciencedirect.com/science/article/pii/S200103702100341X
id doaj-864a4b40c5c9449dacb121860a0d2548
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Olufemi Aromolaran
Thomas Beder
Eunice Adedeji
Yvonne Ajamma
Jelili Oyelade
Ezekiel Adebiyi
Rainer Koenig
title Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning
publisher Elsevier
series Computational and Structural Biotechnology Journal
issn 2001-0370
publishDate 2021-01-01
description Pathogens causing infections, particularly those that invade host cells, require the host cell machinery for efficient regeneration and proliferation during infection. Their life cycle depends on host proteins, and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several screening attempts have produced large lists of potential HDF, but with only marginal overlap. To bring consistency to the data from these experimental studies, we developed a machine learning pipeline. As a case study, we used publicly available lists of experimentally derived HDF from twelve different screening studies based on gene perturbation in Drosophila melanogaster cells or in vivo upon bacterial or protozoan infection. A total of 50,334 gene features were generated from diverse categories including functional annotations, topology attributes in protein interaction networks, nucleotide and protein sequence features, homology properties and subcellular localization. Cross-validation revealed an excellent prediction performance. All feature categories contributed to the model. Predicted and experimentally derived HDF showed good consistency in their common cellular processes and functions. The cellular processes and molecular functions of these genes were highly enriched in membrane trafficking, particularly in the trans-Golgi network, cell cycle and the Rab GTPase binding family. Using our machine learning approach, we show that HDF in organisms can be predicted with high accuracy, evidencing that they share the investigated characteristics. We elucidated cellular processes that are utilized by invading pathogens during infection. Finally, we provide a list of 208 novel HDF proposed for future experimental studies.
topic Host factors
Bacteria
Infection
Knockout screen
Machine learning
Drosophila
url http://www.sciencedirect.com/science/article/pii/S200103702100341X
spelling doaj-864a4b40c5c9449dacb121860a0d2548 2021-08-24T04:07:23Z eng Elsevier Computational and Structural Biotechnology Journal 2001-0370 2021-01-01 19 4581 4592
Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning
Olufemi Aromolaran (0): Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
Thomas Beder (1): Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany; Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
Eunice Adedeji (2): Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria; Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
Yvonne Ajamma (3): Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
Jelili Oyelade (4): Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
Ezekiel Adebiyi (5): Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria; Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
Rainer Koenig (6): Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany; Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany; Corresponding author at: Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany.