Evaluating the Performance of Machine Learning Approaches to Predict the Microbial Quality of Surface Waters and to Optimize the Sampling Effort

Exposure to contaminated water during aquatic recreational activities can lead to gastrointestinal diseases. To reduce the exposure risk, the fecal indicator bacterium Escherichia coli is routinely monitored, a process that is time-consuming, labor-intensive, and costly. To assis...


Bibliographic Details
Main Authors: Manel Naloufi, Françoise S. Lucas, Sami Souihi, Pierre Servais, Aurélie Janne, Thiago Wanderley Matos De Abreu
Format: Article
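Language: English
Published: MDPI AG 2021-09-01
Series: Water
Subjects: water quality prediction; machine learning; Escherichia coli concentration; optimized sampling; river monitoring
Online Access: https://www.mdpi.com/2073-4441/13/18/2457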
id doaj-1d9b67bcb1fa4b7ba0fafc38b43ac766
record_format Article
spelling doaj-1d9b67bcb1fa4b7ba0fafc38b43ac766; 2021-09-26T01:38:18Z; eng; MDPI AG; Water; 2073-4441; 2021-09-01; 13; 2457; 2457; 10.3390/w13182457
Evaluating the Performance of Machine Learning Approaches to Predict the Microbial Quality of Surface Waters and to Optimize the Sampling Effort
Manel Naloufi (0); Françoise S. Lucas (1); Sami Souihi (2); Pierre Servais (3); Aurélie Janne (4); Thiago Wanderley Matos De Abreu (5)
Direction de la Propreté et de l’Eau, Service Technique de l’Eau et de l’Assainissement, 27 rue du Commandeur, 75014 Paris, France
Laboratoire Eau, Environnement et Systèmes Urbains (Leesu), Université Paris-Est Créteil, École des Ponts ParisTech, 61 Avenue du Général de Gaulle, CEDEX, Créteil, 94010 Paris, France
Image, Signal and Intelligent Systems (LiSSi) Laboratory, University of Paris-Est Créteil Val de Marne, 122 rue Paul Armangot, Vitry sur Seine, 94400 Paris, France
Ecology of Aquatic Systems, Université Libre de Bruxelles, 50 Av. Franklin Roosevelt, 1050 Brussels, Belgium
Syndicat Marne Vive, Maison de la Nature, 77 quai de la Pie, Saint-Maur-des-Fossés, 94100 Paris, France
Image, Signal and Intelligent Systems (LiSSi) Laboratory, University of Paris-Est Créteil Val de Marne, 122 rue Paul Armangot, Vitry sur Seine, 94400 Paris, France
Exposure to contaminated water during aquatic recreational activities can lead to gastrointestinal diseases. To reduce the exposure risk, the fecal indicator bacterium Escherichia coli is routinely monitored, a process that is time-consuming, labor-intensive, and costly. To assist stakeholders in the daily management of bathing sites, models have been developed to predict the microbiological quality. However, model performance depends heavily on the quality of the input data, which are usually scarce. In our study, we proposed a conceptual framework for selecting the most suitable model and for enriching the training dataset. This framework was successfully applied to the prediction of Escherichia coli concentrations in the Marne River (Paris Area, France). We compared the performance of six machine learning (ML)-based models: K-Nearest Neighbors, Decision Tree, Support Vector Machines, Bagging, Random Forest, and Adaptive Boosting. Based on several statistical metrics, the Random Forest model was the most accurate of the six. However, 53.2 ± 3.5% of the predicted E. coli densities were inaccurately estimated according to the mean absolute percentage error (MAPE). Four parameters (temperature, conductivity, 24 h cumulative rainfall on the day before sampling, and river flow) were identified as key variables to be monitored for optimization of the ML model. The set of values to be optimized will feed an alert system for monitoring the microbiological quality of the water through a combined strategy of in situ manual sampling and the deployment of a network of sensors. Based on these results, we propose a guideline for ML model selection and sampling optimization.
https://www.mdpi.com/2073-4441/13/18/2457
water quality prediction; machine learning; Escherichia coli concentration; optimized sampling; river monitoring
collection DOAJ
language English
format Article
sources DOAJ
author Manel Naloufi
Françoise S. Lucas
Sami Souihi
Pierre Servais
Aurélie Janne
Thiago Wanderley Matos De Abreu
author_sort Manel Naloufi
title Evaluating the Performance of Machine Learning Approaches to Predict the Microbial Quality of Surface Waters and to Optimize the Sampling Effort
title_sort evaluating the performance of machine learning approaches to predict the microbial quality of surface waters and to optimize the sampling effort
publisher MDPI AG
series Water
issn 2073-4441
publishDate 2021-09-01
description Exposure to contaminated water during aquatic recreational activities can lead to gastrointestinal diseases. To reduce the exposure risk, the fecal indicator bacterium Escherichia coli is routinely monitored, a process that is time-consuming, labor-intensive, and costly. To assist stakeholders in the daily management of bathing sites, models have been developed to predict the microbiological quality. However, model performance depends heavily on the quality of the input data, which are usually scarce. In our study, we proposed a conceptual framework for selecting the most suitable model and for enriching the training dataset. This framework was successfully applied to the prediction of Escherichia coli concentrations in the Marne River (Paris Area, France). We compared the performance of six machine learning (ML)-based models: K-Nearest Neighbors, Decision Tree, Support Vector Machines, Bagging, Random Forest, and Adaptive Boosting. Based on several statistical metrics, the Random Forest model was the most accurate of the six. However, 53.2 ± 3.5% of the predicted E. coli densities were inaccurately estimated according to the mean absolute percentage error (MAPE). Four parameters (temperature, conductivity, 24 h cumulative rainfall on the day before sampling, and river flow) were identified as key variables to be monitored for optimization of the ML model. The set of values to be optimized will feed an alert system for monitoring the microbiological quality of the water through a combined strategy of in situ manual sampling and the deployment of a network of sensors. Based on these results, we propose a guideline for ML model selection and sampling optimization.
topic water quality prediction
machine learning
Escherichia coli concentration
optimized sampling
river monitoring
url https://www.mdpi.com/2073-4441/13/18/2457
_version_ 1716868598470803456
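
Note on the metric named in the description: the abstract reports model error as a mean absolute percentage error (MAPE). The sketch below shows the usual textbook definition of MAPE in plain Python. It is only an illustration, not the authors' code: the function name `mape` and the sample concentrations are hypothetical, and the record does not say how the authors handled zero or log-transformed values.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair of values is required")
    if any(o == 0 for o in observed):
        # MAPE is undefined when an observed value is zero.
        raise ValueError("observed values must be non-zero")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical E. coli concentrations, invented for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [150.0, 100.0, 400.0]

score = mape(observed, predicted)
print(f"MAPE = {score:.1f}%")
```

Because each error is divided by the observed value, a miss of 50 units counts far more at a low concentration than at a high one, which is worth keeping in mind when reading a MAPE reported for bacterial counts that span several orders of magnitude.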