Predictive Models May Complement or Provide an Alternative to Existing Strategies for Assessing the Enteric Pathogen Contamination Status of Northeastern Streams Used to Provide Water for Produce Production

While the Food Safety Modernization Act established standards for the use of surface water for produce production, water quality is known to vary over space and time. Targeted approaches for identifying hazards in water that account for this variation may improve growers' ability to address pre...

Bibliographic Details
Main Authors: Daniel L. Weller, Tanzy M. T. Love, Alexandra Belias, Martin Wiedmann
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-10-01
Series: Frontiers in Sustainable Food Systems
Subjects: stx
Online Access: https://www.frontiersin.org/article/10.3389/fsufs.2020.561517/full
id doaj-df4508a3ca3749e4aec86662562e5267
record_format Article
spelling doaj-df4508a3ca3749e4aec86662562e5267 2020-11-25T01:19:17Z eng
Frontiers Media S.A. Frontiers in Sustainable Food Systems 2571-581X 2020-10-01 Vol. 4, Article 561517, DOI 10.3389/fsufs.2020.561517
Predictive Models May Complement or Provide an Alternative to Existing Strategies for Assessing the Enteric Pathogen Contamination Status of Northeastern Streams Used to Provide Water for Produce Production
Daniel L. Weller (Department of Food Science, Cornell University, Ithaca, NY, United States; Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States)
Tanzy M. T. Love (Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States)
Alexandra Belias (Department of Food Science, Cornell University, Ithaca, NY, United States)
Martin Wiedmann (Department of Food Science, Cornell University, Ithaca, NY, United States)
https://www.frontiersin.org/article/10.3389/fsufs.2020.561517/full
agricultural water; stx; eaeA; Salmonella; E. coli; machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Daniel L. Weller
Tanzy M. T. Love
Alexandra Belias
Martin Wiedmann
title Predictive Models May Complement or Provide an Alternative to Existing Strategies for Assessing the Enteric Pathogen Contamination Status of Northeastern Streams Used to Provide Water for Produce Production
publisher Frontiers Media S.A.
series Frontiers in Sustainable Food Systems
issn 2571-581X
publishDate 2020-10-01
description While the Food Safety Modernization Act established standards for the use of surface water for produce production, water quality is known to vary over space and time. Targeted approaches for identifying hazards in water that account for this variation may improve growers' ability to address pre-harvest food safety risks. Models that utilize publicly available data (e.g., land-use, real-time weather) may be useful for developing these approaches. The objective of this study was to use pre-existing datasets collected in 2017 (N = 181 samples) and 2018 (N = 191 samples) to train and test models that predict the likelihood of detecting Salmonella and pathogenic E. coli markers (eaeA, stx) in agricultural water. Four types of features were used to train the models: microbial, physicochemical, spatial, and weather. “Full models” were built using all four feature types, while “nested models” were built using between one and three types. Twenty learners were used to develop separate full models for each pathogen. Separately, to assess the information gain associated with using different feature types, six learners were randomly selected and each was used to develop nine nested models. Performance measures for each model were then calculated and compared against baseline models where E. coli concentration was the sole covariate. In the methods, we outline the advantages and disadvantages of each learner. Overall, full models built using ensemble (e.g., Node Harvest) and “black-box” (e.g., SVMs) learners outperformed full models built using more interpretable learners (e.g., tree- and rule-based learners) for both outcomes. However, nested eaeA-stx models built using interpretable learners and microbial data performed almost as well as these full models. While none of the nested Salmonella models performed as well as the full models, nested models built using spatial data consistently outperformed models that excluded spatial data. These findings demonstrate that machine learning approaches can be used to predict when and where pathogens are likely to be present in agricultural water. This study serves as a proof-of-concept that can be built upon once larger datasets become available and provides guidance on the learner-data combinations that should be the foci of future efforts (e.g., tree-based microbial models for pathogenic E. coli).
topic agricultural water
stx
eaeA
Salmonella
E. coli
machine learning
url https://www.frontiersin.org/article/10.3389/fsufs.2020.561517/full
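
The description contrasts "full" models (all four feature types) with "nested" models (one to three types), each compared against a baseline. The sketch below is only an illustration of that bookkeeping in Python, not the authors' pipeline: the column names in `FEATURE_TYPES` are hypothetical, the enumeration lists every possible one-to-three-type subset (the record does not say which nine the study used per learner), and AUC is assumed here as one possible performance measure because the record does not name the measures.

```python
from itertools import combinations

# Hypothetical column names, grouped by the four feature types in the abstract.
FEATURE_TYPES = {
    "microbial": ["ecoli_log10_mpn"],
    "physicochemical": ["turbidity_ntu", "ph"],
    "spatial": ["pct_pasture_upstream"],
    "weather": ["rain_24h_mm", "air_temp_c"],
}


def nested_feature_sets(feature_types, min_types=1, max_types=3):
    """Yield (type_names, columns) for each subset of 1 to 3 feature types."""
    names = sorted(feature_types)
    for k in range(min_types, max_types + 1):
        for combo in combinations(names, k):
            columns = [col for t in combo for col in feature_types[t]]
            yield combo, columns


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5).

    Assumed metric for illustration; labels are 1 (detected) or 0 (not detected).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# All candidate nested models, plus the full model using every feature type.
nested = list(nested_feature_sets(FEATURE_TYPES))
full_columns = [col for t in sorted(FEATURE_TYPES) for col in FEATURE_TYPES[t]]
```

In use, each column list would be passed to a learner of choice, and the resulting `auc` compared with that of a baseline model fitted on the E. coli concentration column alone.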