A Genome-Based Model to Predict the Virulence of Pseudomonas aeruginosa Isolates

Pseudomonas aeruginosa is a clinically important Gram-negative opportunistic pathogen. P. aeruginosa shows a large degree of genomic heterogeneity both through variation in sequences found throughout the species (core genome) and through the presence or absence of sequences in different isolates (ac...


Bibliographic Details
Main Authors: Nathan B. Pincus, Egon A. Ozer, Jonathan P. Allen, Marcus Nguyen, James J. Davis, Deborah R. Winter, Chih-Hsien Chuang, Cheng-Hsun Chiu, Laura Zamorano, Antonio Oliver, Alan R. Hauser
Format: Article
Language: English
Published: American Society for Microbiology 2020-08-01
Series: mBio
Subjects:
Online Access: https://doi.org/10.1128/mBio.01527-20
id doaj-18477388e9ed43e0974f539ffcaf0a4f
record_format Article
spelling doaj-18477388e9ed43e0974f539ffcaf0a4f 2021-07-02T10:26:55Z eng American Society for Microbiology mBio 2150-7511 2020-08-01 11 4 e01527-20 10.1128/mBio.01527-20
collection DOAJ
language English
format Article
sources DOAJ
author Nathan B. Pincus
Egon A. Ozer
Jonathan P. Allen
Marcus Nguyen
James J. Davis
Deborah R. Winter
Chih-Hsien Chuang
Cheng-Hsun Chiu
Laura Zamorano
Antonio Oliver
Alan R. Hauser
title A Genome-Based Model to Predict the Virulence of Pseudomonas aeruginosa Isolates
publisher American Society for Microbiology
series mBio
issn 2150-7511
publishDate 2020-08-01
description Pseudomonas aeruginosa is a clinically important Gram-negative opportunistic pathogen. P. aeruginosa shows a large degree of genomic heterogeneity both through variation in sequences found throughout the species (core genome) and through the presence or absence of sequences in different isolates (accessory genome). P. aeruginosa isolates also differ markedly in their ability to cause disease. In this study, we used machine learning to predict the virulence level of P. aeruginosa isolates in a mouse bacteremia model based on genomic content. We show that both the accessory and core genomes are predictive of virulence. This study provides a machine learning framework to investigate relationships between bacterial genomes and complex phenotypes such as virulence. Variation in the genome of Pseudomonas aeruginosa, an important pathogen, can have dramatic impacts on the bacterium’s ability to cause disease. We therefore asked whether it was possible to predict the virulence of P. aeruginosa isolates based on their genomic content. We applied a machine learning approach to a genetically and phenotypically diverse collection of 115 clinical P. aeruginosa isolates using genomic information and corresponding virulence phenotypes in a mouse model of bacteremia. We defined the accessory genome of these isolates through the presence or absence of accessory genomic elements (AGEs), sequences present in some strains but not others. Machine learning models trained using AGEs were predictive of virulence, with a mean nested cross-validation accuracy of 75% using the random forest algorithm. However, individual AGEs did not have a large influence on the algorithm’s performance, suggesting instead that virulence predictions are derived from a diffuse genomic signature. These results were validated with an independent test set of 25 P. aeruginosa isolates whose virulence was predicted with 72% accuracy. Machine learning models trained using core genome single-nucleotide variants and whole-genome k-mers also predicted virulence. Our findings are a proof of concept for the use of bacterial genomes to predict pathogenicity in P. aeruginosa and highlight the potential of this approach for predicting patient outcomes.
topic pseudomonas aeruginosa
genome analysis
machine learning
modeling
prediction
virulence
url https://doi.org/10.1128/mBio.01527-20