PM<sub>2.5</sub> Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM<sub>2.5</sub>) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metaboli...

Bibliographic Details
Main Authors: Mehdi Zamani Joharestani, Chunxiang Cao, Xiliang Ni, Barjeece Bashir, Somayeh Talebiesfandarani
Format: Article
Language: English
Published: MDPI AG 2019-07-01
Series: Atmosphere
Subjects: PM<sub>2.5</sub>, prediction, XGBoost, random forest, deep learning, feature importance
Online Access: https://www.mdpi.com/2073-4433/10/7/373
id doaj-4cbb58fae9a7449a898c87b9ceedb64a
record_format Article
spelling doaj-4cbb58fae9a7449a898c87b9ceedb64a; 2020-11-24T21:27:36Z; eng; MDPI AG; Atmosphere; 2073-4433; 2019-07-01; 10; 7; 373; 10.3390/atmos10070373; atmos10070373
Mehdi Zamani Joharestani (0), Chunxiang Cao (1), Xiliang Ni (2), Barjeece Bashir (3), Somayeh Talebiesfandarani (4)
Affiliation (0-4): State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China
collection DOAJ
language English
format Article
sources DOAJ
author Mehdi Zamani Joharestani
Chunxiang Cao
Xiliang Ni
Barjeece Bashir
Somayeh Talebiesfandarani
title PM<sub>2.5</sub> Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data
publisher MDPI AG
series Atmosphere
issn 2073-4433
publishDate 2019-07-01
description In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM<sub>2.5</sub>) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM<sub>2.5</sub> concentrations can help governments warn people at high risk, thus mitigating the complications. Although attempts have been made to predict PM<sub>2.5</sub> concentrations, the factors influencing PM<sub>2.5</sub> prediction have not been investigated. In this work, we study feature importance for PM<sub>2.5</sub> prediction in Tehran’s urban area, implementing three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features, including satellite and meteorological data, ground-measured PM<sub>2.5</sub>, and geographical data, in the modeling. The best model performance, R<sup>2</sup> = 0.81 (R = 0.9), MAE = 9.93 µg/m<sup>3</sup>, and RMSE = 13.58 µg/m<sup>3</sup>, was obtained using the XGBoost approach with unimportant features eliminated. However, all three ML methods performed similarly: R<sup>2</sup> varied from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. Unlike the PM<sub>2.5</sub> lag data, satellite-derived AODs did not improve model performance.
topic PM<sub>2.5</sub>
prediction
XGBoost
random forest
deep learning
feature importance
url https://www.mdpi.com/2073-4433/10/7/373