An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution

Various approaches have been proposed to model PM2.5 in the recent decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as major predictor variables. Our study used an ensemble model that integrated multi...

Bibliographic Details
Main Authors: Qian Di, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M. Benjamin Sabath, Christine Choirat, Petros Koutrakis, Alexei Lyapustin, Yujie Wang, Loretta J. Mickley, Joel Schwartz
Format: Article
Language: English
Published: Elsevier 2019-09-01
Series: Environment International
Online Access: http://www.sciencedirect.com/science/article/pii/S0160412019300650
id doaj-b0907a6f08bc44f3ac0719554df4e069
collection DOAJ
sources DOAJ
issn 0160-4120
description Various approaches have been proposed to model PM2.5 over the past decade, with satellite-derived aerosol optical depth, land-use variables, chemical transport model predictions, and several meteorological variables as major predictor variables. Our study used an ensemble model that integrated multiple machine learning algorithms and predictor variables to estimate daily PM2.5 at a resolution of 1 km × 1 km across the contiguous United States. We used a generalized additive model that accounted for geographic differences to combine PM2.5 estimates from a neural network, a random forest, and gradient boosting. The three machine learning algorithms were based on multiple predictor variables, including satellite data, meteorological variables, land-use variables, elevation, chemical transport model predictions, several reanalysis datasets, and others. The model training results from 2000 to 2015 indicated good model performance, with a 10-fold cross-validated R² of 0.86 for daily PM2.5 predictions. For annual PM2.5 estimates, the cross-validated R² was 0.89. Our model demonstrated good performance up to 60 μg/m³. Using the trained PM2.5 model and predictor variables, we predicted daily PM2.5 from 2000 to 2015 at every 1 km × 1 km grid cell in the contiguous United States. We also used localized land-use variables within 1 km × 1 km grids to downscale PM2.5 predictions to 100 m × 100 m grid cells. To characterize uncertainty, we used meteorological variables, land-use variables, and elevation to model the monthly standard deviation of the difference between daily monitored and predicted PM2.5 for every 1 km × 1 km grid cell. This PM2.5 prediction dataset, including the downscaled and uncertainty predictions, allows epidemiologists to accurately estimate the adverse health effects of PM2.5. Compared with the individual base learners, an ensemble model would achieve better overall estimation. It is worth exploring other ensemble model formats to synthesize estimates from different models or from different groups to improve overall performance.
Keywords: Fine particulate matter (PM2.5), Ensemble model, Neural network, Gradient boosting, Random forest
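The uncertainty measure named in the description, the monthly standard deviation of the difference between daily monitored and predicted PM2.5, can be illustrated with a minimal Python sketch. It only computes that quantity for grid cells that contain a monitor; the article goes further and models it from meteorological variables, land-use variables, and elevation, which is not attempted here. The function name, the record layout, the toy values, and the choice of the sample (n − 1) standard deviation are all assumptions made for illustration and are not taken from the article.

```python
from collections import defaultdict
from statistics import stdev


def monthly_residual_sd(records):
    """Sample SD of (monitored - predicted) per grid cell and month.

    `records` is an iterable of (cell_id, "YYYY-MM-DD", monitored, predicted)
    tuples. This layout is hypothetical, not the article's data format.
    """
    residuals = defaultdict(list)
    for cell_id, date, monitored, predicted in records:
        month = date[:7]  # "YYYY-MM"
        residuals[(cell_id, month)].append(monitored - predicted)
    # A sample SD needs at least two residuals; sparser cell-months are skipped.
    return {key: stdev(vals) for key, vals in residuals.items() if len(vals) >= 2}


# Made-up daily values in μg/m³, for illustration only.
records = [
    ("cell_A", "2010-01-01", 12.0, 10.0),
    ("cell_A", "2010-01-02", 8.0, 9.0),
    ("cell_A", "2010-01-03", 15.0, 13.0),
    ("cell_A", "2010-02-01", 7.0, 7.5),
    ("cell_B", "2010-01-01", 20.0, 18.0),
    ("cell_B", "2010-01-02", 22.0, 24.0),
]
sd = monthly_residual_sd(records)
for key in sorted(sd):
    print(key, round(sd[key], 3))
```

In this sketch a cell-month with a single monitored day (cell_A in February) produces no estimate, which is one reason a model based on covariates is needed to cover every grid cell.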
Volume: 130
Author Affiliations:
Qian Di: Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Research Center for Public Health, Tsinghua University, Beijing, China. Corresponding author at: Research Center for Public Health, Tsinghua University, Beijing, China.
Heresh Amini: Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Liuhua Shi: Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Itai Kloog: Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva, Israel
Rachel Silvern: Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, United States
James Kelly: U.S. Environmental Protection Agency, Office of Air Quality Planning & Standards, Research Triangle Park, NC, United States
M. Benjamin Sabath: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Christine Choirat: Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Petros Koutrakis: Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Alexei Lyapustin: NASA Goddard Space Flight Center, Greenbelt, MD, United States
Yujie Wang: University of Maryland, Baltimore County, Baltimore, MD, United States
Loretta J. Mickley: John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, United States
Joel Schwartz: Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States