Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica.

Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillan...

Bibliographic Details
Main Authors: Nicole E Wheeler, Paul P Gardner, Lars Barquist
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-05-01
Series: PLoS Genetics
Online Access: http://europepmc.org/articles/PMC5940178?pdf=render
id doaj-937587dae0294a458478b00511b00221
record_format Article
spelling doaj-937587dae0294a458478b00511b00221 | 2020-11-25T01:19:26Z | eng | Public Library of Science (PLoS) | PLoS Genetics | 1553-7390 | 1553-7404 | 2018-05-01 | 14(5): e1007333 | 10.1371/journal.pgen.1007333 | Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica. | Nicole E Wheeler | Paul P Gardner | Lars Barquist | http://europepmc.org/articles/PMC5940178?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Nicole E Wheeler
Paul P Gardner
Lars Barquist
title Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica.
publisher Public Library of Science (PLoS)
series PLoS Genetics
issn 1553-7390
1553-7404
publishDate 2018-05-01
description Emerging pathogens are a major threat to public health; however, understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein-coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars that cause gastrointestinal infection and low mortality, associated with a broad host range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, and within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.
url http://europepmc.org/articles/PMC5940178?pdf=render