Proposition of New Ensemble Data-Intelligence Models for Surface Water Quality Prediction


Bibliographic Details
Main Authors: Ali Omran Al-Sulttani, Mustafa Al-Mukhtar, Ali B. Roomi, Aitazaz Ahsan Farooque, Khaled Mohamed Khedher, Zaher Mundher Yaseen
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9497111/
Description
Summary: Accurate prediction of water quality (WQ) parameters is considered a pivotal decision-making tool in sustainable water resources management. In this study, five ensemble machine learning (ML) models, namely Quantile Regression Forest (QRF), Random Forest (RF), radial support vector machine (SVM), Stochastic Gradient Boosting (GBM), and Gradient Boosting Machines (GBM_H2O), were developed to predict the monthly biochemical oxygen demand (BOD) of the Euphrates River, Iraq. To this end, monthly average values of water temperature (T), turbidity, pH, electrical conductivity (EC), alkalinity (Alk), calcium (Ca), chemical oxygen demand (COD), sulfate (SO₄), total dissolved solids (TDS), total suspended solids (TSS), and BOD measured over a ten-year period were used. The performance of these standalone models was compared with that of integrative models developed by coupling the ML models with two feature extraction algorithms, Genetic Algorithm (GA) and Principal Component Analysis (PCA). Model reliability was evaluated using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe model efficiency coefficient (NSE), Willmott index (d), and percent bias (PBIAS). Among the developed models, QRF attained the best performance. The integrative PCA-QRF model performed considerably better than both the standalone models and those integrated with GA. The R², RMSE, MAE, NSE, d, and PBIAS of PCA-QRF were 0.94, 0.12, 0.05, 0.93, 0.98, and 0.3, respectively.
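The six evaluation criteria named in the summary can be computed from paired observed and predicted values. The following is a minimal sketch, not the authors' code: the function name `evaluate` and the sample values are hypothetical, R2 is assumed to be the squared Pearson correlation, and PBIAS is assumed to follow the "observed minus predicted" sign convention (some tools use the opposite sign).

```python
import math


def evaluate(obs, pred):
    """Return R2, RMSE, MAE, NSE, Willmott d, and PBIAS for paired series.

    Assumptions (not taken from the article): R2 is the squared Pearson
    correlation, and PBIAS = 100 * sum(obs - pred) / sum(obs).
    A constant obs or pred series raises ZeroDivisionError.
    """
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("obs and pred must have equal length of at least 2")

    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n

    errors = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in errors)                    # sum of squared errors
    ss_o = sum((o - mean_o) ** 2 for o in obs)          # spread of observations
    ss_p = sum((p - mean_p) ** 2 for p in pred)         # spread of predictions
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))

    # Willmott's potential error: distances of both series from the observed mean.
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                    for o, p in zip(obs, pred))

    return {
        "R2": cov ** 2 / (ss_o * ss_p),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "NSE": 1 - sse / ss_o,
        "d": 1 - sse / potential,
        "PBIAS": 100 * sum(errors) / sum(obs),
    }


if __name__ == "__main__":
    # Hypothetical BOD values, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    for name, value in evaluate(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

In this illustration the predictions track the observations perfectly in shape but are offset by a constant, so R2 stays at 1 while NSE, d, and PBIAS all penalize the bias. This is one reason to report several criteria together rather than R2 alone.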
ISSN: 2169-3536