Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke

Introduction: Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic e...


Bibliographic Details
Main Authors: Lianfa Li, Mariam Girguis, Frederick Lurmann, Nathan Pavlovic, Crystal McClure, Meredith Franklin, Jun Wu, Luke D. Oman, Carrie Breton, Frank Gilliland, Rima Habre
Format: Article
Language: English
Published: Elsevier 2020-12-01
Series: Environment International
Subjects: PM2.5; Machine learning; Air pollution exposure; Wildfires; Remote sensing; California
Online Access: http://www.sciencedirect.com/science/article/pii/S0160412020320985
id doaj-10b913da1bb64b33ac9ad378ab75942c
record_format Article
collection DOAJ
author Lianfa Li
Mariam Girguis
Frederick Lurmann
Nathan Pavlovic
Crystal McClure
Meredith Franklin
Jun Wu
Luke D. Oman
Carrie Breton
Frank Gilliland
Rima Habre
title Ensemble-based deep learning for estimating PM2.5 over California with multisource big data including wildfire smoke
publisher Elsevier
series Environment International
issn 0160-4120
publishDate 2020-12-01
description Introduction: Estimating PM2.5 concentrations and their prediction uncertainties at a high spatiotemporal resolution is important for air pollution health effect studies. This is particularly challenging for California, which has high variability in natural (e.g., wildfires, dust) and anthropogenic emissions, meteorology, topography (e.g., desert surfaces, mountains, snow cover) and land use.
Methods: Using ensemble-based deep learning with big data fused from multiple sources, we developed a PM2.5 prediction model with uncertainty estimates at a high spatial (1 km × 1 km) and temporal (weekly) resolution for a 10-year time span (2008–2017). We leveraged autoencoder-based full residual deep networks to model complex nonlinear interrelationships among PM2.5 emission, transport and dispersion factors and other influential features. These included remote sensing data (MAIAC aerosol optical depth (AOD), normalized difference vegetation index, impervious surface), MERRA-2 GMI Replay Simulation (M2GMI) output, wildfire smoke plume dispersion, meteorology, land cover, traffic, elevation, and spatiotemporal trends (geo-coordinates, temporal basis functions, time index). MAIAC AOD, one of the primary predictors of interest, has substantial missing data in California related to bright surfaces, cloud cover and other known interferences; missing observations were imputed and adjusted for relative humidity and vertical distribution. Wildfire smoke contribution to PM2.5 was also calculated through HYSPLIT dispersion modeling of smoke emissions derived from MODIS fire radiative power using the Fire Energetics and Emissions Research version 1.0 model.
Results: The ensemble deep learning model achieved an overall mean training RMSE of 1.54 μg/m³ (R²: 0.94) and test RMSE of 2.29 μg/m³ (R²: 0.87). The top predictors included M2GMI carbon monoxide mixing ratio in the bottom layer, temporal basis functions, spatial location, air temperature, MAIAC AOD, and PM2.5 sea salt mass concentration. In an independent test using three long-term AQS sites and one short-term non-AQS site, our model achieved a high correlation (>0.8) and a low RMSE (<3 μg/m³). Statewide predictions indicated that our model can capture the spatial distribution and temporal peaks in wildfire-related PM2.5. The coefficient of variation indicated the highest uncertainty over deciduous and mixed forests and open water land covers.
Conclusion: Our method can be generalized to other regions, including those having a mix of major urban areas, deserts, intensive smoke events, snow cover and complex terrain, where PM2.5 has previously been challenging to predict. Prediction uncertainty estimates can also inform further model development and measurement error evaluations in exposure and health studies.
topic PM2.5
Machine learning
Air pollution exposure
Wildfires
Remote sensing
California
url http://www.sciencedirect.com/science/article/pii/S0160412020320985
spelling doaj-10b913da1bb64b33ac9ad378ab75942c; 2020-11-25T03:52:07Z; eng; Elsevier; Environment International; ISSN 0160-4120; 2020-12-01; volume 145; article 106143
affiliations Lianfa Li: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA; State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China (corresponding author)
Mariam Girguis: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Frederick Lurmann: Sonoma Technology, Inc., Petaluma, CA, USA
Nathan Pavlovic: Sonoma Technology, Inc., Petaluma, CA, USA
Crystal McClure: Sonoma Technology, Inc., Petaluma, CA, USA
Meredith Franklin: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Jun Wu: Program in Public Health, Susan and Henry Samueli College of Health Sciences, University of California, Irvine, CA, USA
Luke D. Oman: Goddard Space Flight Center, National Aeronautics and Space Administration, Greenbelt, MD, USA
Carrie Breton: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Frank Gilliland: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
Rima Habre: Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA (corresponding author)
Corresponding authors at: Division of Environmental Health, USC Keck School of Medicine, 2001 N. Soto Street, Suite 102, Los Angeles, CA 90089, USA.
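The abstract reports accuracy as RMSE and R² and expresses prediction uncertainty as a coefficient of variation. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation: the function names are hypothetical, the coefficient of variation is assumed to be the standard deviation across ensemble members divided by their mean, and R² is assumed to be the coefficient of determination (the record does not say which definition the paper uses).

```python
import math
import statistics


def ensemble_summary(member_predictions):
    """Summarize one grid cell and week from the predictions of several
    ensemble members (hypothetical input: a list of PM2.5 values in ug/m3).

    Returns (mean, standard deviation, coefficient of variation).
    """
    values = list(member_predictions)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation across members
    cv = sd / mean if mean != 0 else float("nan")
    return mean, sd, cv


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    obs = list(observed)
    obs_mean = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

For example, `ensemble_summary([10.0, 12.0, 14.0])` gives a mean of 12.0 and a standard deviation of 2.0, so the coefficient of variation is about 0.17. A larger value marks a cell where the ensemble members disagree more relative to the predicted concentration.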