PM<sub>2.5</sub> Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data
In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM<sub>2.5</sub>) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metaboli...
Main Authors: | Mehdi Zamani Joharestani, Chunxiang Cao, Xiliang Ni, Barjeece Bashir, Somayeh Talebiesfandarani |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-07-01 |
Series: | Atmosphere |
Subjects: | PM<sub>2.5</sub>; prediction; XGBoost; random forest; deep learning; feature importance |
Online Access: | https://www.mdpi.com/2073-4433/10/7/373 |
id |
doaj-4cbb58fae9a7449a898c87b9ceedb64a |
---|---|
record_format |
Article |
spelling |
doaj-4cbb58fae9a7449a898c87b9ceedb64a
2020-11-24T21:27:36Z
eng
MDPI AG
Atmosphere
2073-4433
2019-07-01
Vol. 10, No. 7, Art. 373
10.3390/atmos10070373
atmos10070373
PM<sub>2.5</sub> Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data
Mehdi Zamani Joharestani; Chunxiang Cao; Xiliang Ni; Barjeece Bashir; Somayeh Talebiesfandarani
State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China (all five authors)
In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM<sub>2.5</sub>) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM<sub>2.5</sub> concentrations can help governments warn people at high risk, thus mitigating the complications. Although attempts have been made to predict PM<sub>2.5</sub> concentrations, the factors influencing PM<sub>2.5</sub> prediction have not been investigated. In this work, we study feature importance for PM<sub>2.5</sub> prediction in Tehran’s urban area, implementing three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features, including satellite and meteorological data, ground-measured PM<sub>2.5</sub>, and geographical data, in the modeling. The best model performance obtained was R<sup>2</sup> = 0.81 (R = 0.9), MAE = 9.93 µg/m<sup>3</sup>, and RMSE = 13.58 µg/m<sup>3</sup>, using the XGBoost approach with unimportant features eliminated. However, all three ML methods performed similarly: R<sup>2</sup> varied from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. In contrast to the PM<sub>2.5</sub> lag data, satellite-derived AODs did not improve model performance.
https://www.mdpi.com/2073-4433/10/7/373
PM<sub>2.5</sub>; prediction; XGBoost; random forest; deep learning; feature importance |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mehdi Zamani Joharestani Chunxiang Cao Xiliang Ni Barjeece Bashir Somayeh Talebiesfandarani |
author_sort |
Mehdi Zamani Joharestani |
title |
PM<sub>2.5</sub> Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data |
publisher |
MDPI AG |
series |
Atmosphere |
issn |
2073-4433 |
publishDate |
2019-07-01 |
description |
In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM<sub>2.5</sub>) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM<sub>2.5</sub> concentrations can help governments warn people at high risk, thus mitigating the complications. Although attempts have been made to predict PM<sub>2.5</sub> concentrations, the factors influencing PM<sub>2.5</sub> prediction have not been investigated. In this work, we study feature importance for PM<sub>2.5</sub> prediction in Tehran’s urban area, implementing three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features, including satellite and meteorological data, ground-measured PM<sub>2.5</sub>, and geographical data, in the modeling. The best model performance obtained was R<sup>2</sup> = 0.81 (R = 0.9), MAE = 9.93 µg/m<sup>3</sup>, and RMSE = 13.58 µg/m<sup>3</sup>, using the XGBoost approach with unimportant features eliminated. However, all three ML methods performed similarly: R<sup>2</sup> varied from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included, and from 0.77 to 0.81 when it was excluded. In contrast to the PM<sub>2.5</sub> lag data, satellite-derived AODs did not improve model performance. |
topic |
PM<sub>2.5</sub>; prediction; XGBoost; random forest; deep learning; feature importance |
url |
https://www.mdpi.com/2073-4433/10/7/373 |
_version_ |
1725974569352691712 |
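
The description reports model quality as R<sup>2</sup>, MAE, and RMSE, and pairs R<sup>2</sup> = 0.81 with R = 0.9, which is consistent with R being the square root of R<sup>2</sup>. This record does not state how the article computed these metrics, so the following is only a minimal sketch under the common definitions (R<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>, MAE as the mean absolute error, RMSE as the root of the mean squared error). The function name `regression_metrics` and the sample values are hypothetical and are not taken from the article.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, mae, rmse) for paired observed and predicted values.

    Assumes the common definitions; the article's exact formulas are
    not given in this catalog record.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


# Hypothetical PM2.5 concentrations in µg/m³ (illustrative only, not article data).
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 43.0, 47.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE on the same data. The reported pair (MAE = 9.93 µg/m<sup>3</sup>, RMSE = 13.58 µg/m<sup>3</sup>) follows that ordering.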