Machine learning with random subspace ensembles identifies antimicrobial resistance determinants from pan-genomes of three pathogens.

Bibliographic Details
Main Authors: Jason C Hyun, Erol S Kavvas, Jonathan M Monk, Bernhard O Palsson
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-03-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1007608
id doaj-468c84af76484b478dab2caa61ca1172
collection DOAJ
issn 1553-734X, 1553-7358
description The evolution of antimicrobial resistance (AMR) poses a persistent threat to global public health. Sequencing efforts have already yielded genome sequences for thousands of resistant microbial isolates and require robust computational tools to systematically elucidate the genetic basis for AMR. Here, we present a generalizable machine learning workflow for identifying genetic features driving AMR based on constructing reference strain-agnostic pan-genomes and training random subspace ensembles (RSEs). This workflow was applied to the resistance profiles of 14 antimicrobials across three urgent threat pathogens encompassing 288 Staphylococcus aureus, 456 Pseudomonas aeruginosa, and 1588 Escherichia coli genomes. We find that feature selection by RSE detects known AMR associations more reliably than common statistical tests and previous ensemble approaches, identifying a total of 45 known AMR-conferring genes and alleles across the three organisms, as well as 25 candidate associations backed by domain-level annotations. Furthermore, we find that results from the RSE approach are consistent with existing understanding of fluoroquinolone (FQ) resistance due to mutations in the main drug targets, gyrA and parC, in all three organisms, and suggest the mutational landscape of those genes with respect to FQ resistance is simple. As larger datasets become available, we expect this approach to more reliably predict AMR determinants for a wider range of microbial pathogens.