Proposition of New Ensemble Data-Intelligence Models for Surface Water Quality Prediction

Accurate prediction of water quality (WQ) parameters is considered a pivotal decision-making tool in sustainable water resources management. In this study, five ensemble machine learning (ML) models, including Quantile Regression Forest (QRF), Random Forest (RF), radial support vector...

Bibliographic Details
Main Authors: Ali Omran Al-Sulttani, Mustafa Al-Mukhtar, Ali B. Roomi, Aitazaz Ahsan Farooque, Khaled Mohamed Khedher, Zaher Mundher Yaseen
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Semi-arid region; river water quality; biochemical oxygen demand; principal component analysis
Online Access: https://ieeexplore.ieee.org/document/9497111/
id doaj-c3411a181de944a5a5b557c8922024e7
record_format Article
spelling doaj-c3411a181de944a5a5b557c8922024e7
2021-08-09T23:00:58Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2021-01-01
Vol. 9, pp. 108527-108541
DOI 10.1109/ACCESS.2021.3100490
9497111
Proposition of New Ensemble Data-Intelligence Models for Surface Water Quality Prediction
Ali Omran Al-Sulttani, https://orcid.org/0000-0001-8734-5287
Mustafa Al-Mukhtar, https://orcid.org/0000-0002-8850-0899
Ali B. Roomi, https://orcid.org/0000-0002-5107-5550
Aitazaz Ahsan Farooque, https://orcid.org/0000-0002-5353-6752
Khaled Mohamed Khedher, https://orcid.org/0000-0002-4167-1690
Zaher Mundher Yaseen, https://orcid.org/0000-0003-3647-7137
Department of Water Resources Engineering, College of Engineering, University of Baghdad, Baghdad, Iraq
Civil Engineering Department, University of Technology, Baghdad, Iraq
Ministry of Education, Directorate of Education Thi-Qar, Thi-Qar, Iraq
Faculty of Sustainable Design Engineering, University of Prince Edward Island, Charlottetown, Canada
Department of Civil Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia
New Era and Development in Civil Engineering Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Iraq
https://ieeexplore.ieee.org/document/9497111/
Semi-arid region
river water quality
biochemical oxygen demand
principal component analysis
collection DOAJ
language English
format Article
sources DOAJ
author Ali Omran Al-Sulttani
Mustafa Al-Mukhtar
Ali B. Roomi
Aitazaz Ahsan Farooque
Khaled Mohamed Khedher
Zaher Mundher Yaseen
title Proposition of New Ensemble Data-Intelligence Models for Surface Water Quality Prediction
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Accurate prediction of water quality (WQ) parameters is considered a pivotal decision-making tool in sustainable water resources management. In this study, five ensemble machine learning (ML) models, namely Quantile Regression Forest (QRF), Random Forest (RF), radial support vector machine (SVM), Stochastic Gradient Boosting (GBM), and Gradient Boosting Machines (GBM_H2O), were developed to predict the monthly biochemical oxygen demand (BOD) of the Euphrates River, Iraq. For this purpose, monthly average data of water temperature (T), turbidity, pH, electrical conductivity (EC), alkalinity (Alk), calcium (Ca), chemical oxygen demand (COD), sulfate (SO₄), total dissolved solids (TDS), total suspended solids (TSS), and BOD, measured over a ten-year period, were used. The performance of these standalone models was compared with that of integrative models developed by coupling the ML models with two feature extraction algorithms, Genetic Algorithm (GA) and Principal Component Analysis (PCA). Model reliability was evaluated using the determination coefficient (R²), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe model efficiency coefficient (NSE), Willmott index (d), and percent bias (PBIAS). Among the developed models, QRF attained the best performance. The integrative PCA-QRF model performed considerably better than the standalone models and those integrated with GA, with R², RMSE, MAE, NSE, d, and PBIAS of 0.94, 0.12, 0.05, 0.93, 0.98, and 0.3, respectively.
topic Semi-arid region
river water quality
biochemical oxygen demand
principal component analysis
url https://ieeexplore.ieee.org/document/9497111/
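
The description above names six evaluation criteria (R², RMSE, MAE, NSE, Willmott index d, and PBIAS) without giving their formulas. The following is a minimal Python sketch of how such criteria are commonly computed for a pair of observed and predicted series. It is not taken from the article: the function name `evaluate` and the sample values are hypothetical, R² is assumed to be the squared Pearson correlation, and PBIAS is assumed to follow the convention 100 * sum(observed - predicted) / sum(observed), whose sign differs between sources.

```python
import math


def evaluate(observed, predicted):
    """Return common goodness-of-fit criteria for two paired series.

    Assumptions (not confirmed by the record): R2 is the squared Pearson
    correlation, and PBIAS is 100 * sum(obs - pred) / sum(obs).
    Constant series or a zero sum of observations lead to division by zero.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    errors = [o - p for o, p in pairs]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mean_o) ** 2 for o in observed)        # spread of observations
    var_p = sum((p - mean_p) ** 2 for p in predicted)     # spread of predictions
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    # Denominator of the Willmott index of agreement ("potential error").
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)

    return {
        "R2": cov ** 2 / (sst * var_p),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "NSE": 1 - sse / sst,
        "d": 1 - sse / potential,
        "PBIAS": 100 * sum(errors) / sum(observed),
    }


if __name__ == "__main__":
    # Hypothetical monthly BOD values (mg/L), for illustration only.
    obs = [2.1, 2.4, 3.0, 2.8, 2.2, 1.9]
    pred = [2.0, 2.5, 2.9, 2.9, 2.3, 1.8]
    for name, value in evaluate(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

A perfect prediction yields R², NSE, and d of 1 and RMSE, MAE, and PBIAS of 0, which offers a quick sanity check before applying the function to real model output.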