Forecasting groundwater levels using machine learning methods: The case of California’s Central Valley

Groundwater, the second largest stock of freshwater on the planet, is an important water source used for municipal water supply, irrigation, or industrial needs. For instance, California’s arid Central Valley relies on groundwater resources to produce a quarter of the United States’ food demand as f...


Bibliographic Details
Published in: Journal of Hydrology X
Main Authors: Gabriela May-Lagunes, Valerie Chau, Eric Ellestad, Leyla Greengard, Paolo D'Odorico, Puya Vahabi, Alberto Todeschini, Manuela Girotto
Format: Article
Language: English
Published: Elsevier 2023-12-01
Subjects: Groundwater, Weather, Wells, California, Xgboost, Supervised learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2589915523000147
_version_ 1850111377443651584
author Gabriela May-Lagunes
Valerie Chau
Eric Ellestad
Leyla Greengard
Paolo D'Odorico
Puya Vahabi
Alberto Todeschini
Manuela Girotto
author_sort Gabriela May-Lagunes
collection DOAJ
container_title Journal of Hydrology X
description Groundwater, the second largest stock of freshwater on the planet, is an important water source used for municipal water supply, irrigation, or industrial needs. For instance, California’s arid Central Valley relies on groundwater resources to produce a quarter of the United States’ food demand, as farmers rely on this precious resource when surface water is scarce. Despite its importance, the nexus between groundwater dynamics and climate drivers remains difficult to quantify, model, and predict because of the lack of a comprehensive observation network. In this study, machine learning techniques were used to predict groundwater levels with a 3-month forecasting horizon for the Sacramento River Basin. For this, publicly available meteorological and hydrological datasets and in-situ well-level measurements were used. Time series, ensemble-based, and deep-learning models, including transformers, were all tested. An ensemble-based XGBoost model produced the best mean standard deviation percent error (MSPE) of 32.23% and a root mean squared error (RMSE) of 1.05 meters (m) when using a 3-month forecasting horizon and when tested using a monthly rolling window over the years 2017–2020. The model proved to be better at predicting into wet months than the dry summer months and was found to be better at extracting seasonality than explaining well-level residuals. Well-specific features, as opposed to exogenous meteorological features specific to the hydrological unit of the well, ranked as the most important features to the model. Though other forecasting horizons were tested, a 3-month look-ahead window resulted in the best balance of precision and accuracy: smaller forecasting horizons resulted in smaller RMSE but larger MSPE scores, and vice versa for larger forecasting horizons.
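The monthly rolling-window test with a 3-month horizon described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a plain least-squares model on lagged values stands in for XGBoost, the well levels are synthetic, and every name here (`make_lagged`, `rolling_rmse`, `n_lags`, `first_test`) is hypothetical. Only RMSE is computed, because the abstract does not give a formula for MSPE.

```python
import numpy as np

def make_lagged(series, n_lags, horizon):
    """Pair each window of n_lags consecutive values with the value
    observed `horizon` steps after the window's last entry."""
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon])
    return np.array(X), np.array(y)

def rolling_rmse(series, n_lags=6, horizon=3, first_test=36):
    """Rolling-origin evaluation: at each month, refit on the data
    available so far, forecast `horizon` months ahead, and collect
    the error against the value that was later observed."""
    errors = []
    for origin in range(first_test, len(series) - horizon):
        history = series[: origin + 1]          # nothing after `origin` is used for fitting
        X, y = make_lagged(history, n_lags, horizon)
        A = np.hstack([X, np.ones((len(X), 1))])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # stand-in model (assumption)
        x_now = np.append(history[-n_lags:], 1.0)
        prediction = x_now @ coef
        errors.append(prediction - series[origin + horizon])
    errors = np.array(errors)
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic monthly "well levels" in meters: a seasonal cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(96)
levels = 10.0 + 2.0 * np.sin(2 * np.pi * months / 12) + 0.1 * rng.standard_normal(96)

rmse = rolling_rmse(levels)
print(f"3-month-ahead rolling RMSE: {rmse:.3f} m")
```

Refitting at every origin keeps each forecast free of information from the future. Changing `horizon` in this sketch is one way to explore the trade-off between forecasting horizons that the abstract mentions.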
format Article
id doaj-art-68cc3ff43fb14e9a94fbd4ab6ea41a4e
institution Directory of Open Access Journals
issn 2589-9155
language English
publishDate 2023-12-01
publisher Elsevier
record_format Article
spelling doaj-art-68cc3ff43fb14e9a94fbd4ab6ea41a4e
2025-08-19T23:59:33Z
eng
Elsevier
Journal of Hydrology X
2589-9155
2023-12-01
Volume 21, Article 100161
DOI 10.1016/j.hydroa.2023.100161
Forecasting groundwater levels using machine learning methods: The case of California’s Central Valley
Gabriela May-Lagunes (0): University of California, School of Information, Berkeley, CA 94720, USA; University of California, Department of Environmental Science, Policy and Management, Berkeley, CA 94720, USA; Corresponding authors at: University of California, School of Information, Berkeley, CA 94720, USA (Gabriela May-Lagunes).
Valerie Chau (1): University of California, School of Information, Berkeley, CA 94720, USA
Eric Ellestad (2): University of California, School of Information, Berkeley, CA 94720, USA
Leyla Greengard (3): University of California, School of Information, Berkeley, CA 94720, USA; Corresponding authors at: University of California, School of Information, Berkeley, CA 94720, USA (Gabriela May-Lagunes).
Paolo D'Odorico (4): University of California, Department of Environmental Science, Policy and Management, Berkeley, CA 94720, USA
Puya Vahabi (5): University of California, School of Information, Berkeley, CA 94720, USA
Alberto Todeschini (6): University of California, School of Information, Berkeley, CA 94720, USA
Manuela Girotto (7): University of California, Department of Environmental Science, Policy and Management, Berkeley, CA 94720, USA
title Forecasting groundwater levels using machine learning methods: The case of California’s Central Valley
topic Groundwater
Weather
Wells
California
Xgboost
Supervised learning
url http://www.sciencedirect.com/science/article/pii/S2589915523000147