Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques

Bibliographic Details
Main Authors: Malek, N.H.A. (Author), Nasir, S.A.M. (Author), Shaadan, N. (Author), Yaacob, W.F.W. (Author)
Format: Article
Language: English
Published: MDPI 2022
Online Access: View Fulltext in Publisher
LEADER 03365nam a2200577Ia 4500
001 10-3390-w14071067
008 220425s2022 CNT 000 0 und d
020 |a 20734441 (ISSN) 
245 1 0 |a Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques 
260 0 |b MDPI  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.3390/w14071067 
520 3 |a Machine Learning (ML) has been used for a long time and has gained wide attention over the last several years. It can handle a large amount of data and allow non-linear structures by using complex mathematical computations. However, traditional ML models do suffer some problems, such as high bias and overfitting. Therefore, this has resulted in the advancement and improvement of ML techniques, such as the bagging and boosting approach, to address these problems. This study explores a series of ML models to predict the water quality classification (WQC) in the Kelantan River using data from 2005 to 2020. The proposed methodology employed 13 physical and chemical parameters of water quality and 7 ML models that are Decision Tree, Artificial Neural Networks, K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Random Forest and Gradient Boosting. Based on the analysis, the ensemble model of Gradient Boosting with a learning rate of 0.1 exhibited the best prediction performance compared to the other algorithms. It had the highest accuracy (94.90%), sensitivity (80.00%) and f-measure (86.49%), with the lowest classification error. Total Suspended Solid (TSS) was the most significant variable for the Gradient Boosting (GB) model to predict WQC, followed by Ammoniacal Nitrogen (NH3N), Biochemical Oxygen Demand (BOD) and Chemical Oxygen Demand (COD). Based on the accurate water quality prediction, the results could help to improve the National Environmental Policy regarding water resources by continuously improving water quality. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. 
650 0 4 |a Adaptive boosting 
650 0 4 |a algorithm 
650 0 4 |a Biochemical oxygen demand 
650 0 4 |a decision tree 
650 0 4 |a Decision trees 
650 0 4 |a Environmental protection 
650 0 4 |a Forecasting 
650 0 4 |a gradient boosting 
650 0 4 |a Gradient boosting 
650 0 4 |a Kelantan Basin 
650 0 4 |a Linear regression 
650 0 4 |a machine learning 
650 0 4 |a Machine learning models 
650 0 4 |a Machine learning techniques 
650 0 4 |a Malaysia 
650 0 4 |a methodology 
650 0 4 |a Nearest neighbor search 
650 0 4 |a Neural networks 
650 0 4 |a Quality classification 
650 0 4 |a random forest 
650 0 4 |a Random forests 
650 0 4 |a river basin 
650 0 4 |a River basins 
650 0 4 |a spatiotemporal analysis 
650 0 4 |a supervised machine learning 
650 0 4 |a Supervised machine learning 
650 0 4 |a Support vector machines 
650 0 4 |a water quality 
650 0 4 |a Water quality 
650 0 4 |a water quality class 
650 0 4 |a Water quality class 
650 0 4 |a water quality index 
650 0 4 |a Water quality indexes 
650 0 4 |a Water resources 
700 1 |a Malek, N.H.A.  |e author 
700 1 |a Nasir, S.A.M.  |e author 
700 1 |a Shaadan, N.  |e author 
700 1 |a Yaacob, W.F.W.  |e author 
773 |t Water (Switzerland)