Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform

Master's === I-Shou University === Department of Information Management === 106 === Air pollution has grown progressively worse, and air quality is now a prominent public concern. Particulate matter 2.5 (PM2.5; aerodynamic diameter ≤ 2.5 μm), one component of ambient urban air pollution, has been gradually empha...

Bibliographic Details
Main Authors: Zhe-Zhang Zhang, 張哲章
Other Authors: Jenn-Long Liu
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/a2bjgn
id ndltd-TW-106ISU05396115
record_format oai_dc
spelling ndltd-TW-106ISU05396115 2019-11-28T05:22:20Z http://ndltd.ncl.edu.tw/handle/a2bjgn Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform 應用Python網路爬蟲技術於政府開放資料平台PM2.5即時動態資料分析 Zhe-Zhang Zhang 張哲章 Master's, I-Shou University, Department of Information Management, 106 Jenn-Long Liu 劉振隆 2018 thesis 136 zh-TW
collection NDLTD
sources NDLTD
description Master's === I-Shou University === Department of Information Management === 106 === Air pollution has grown progressively worse, and air quality is now a prominent public concern. Particulate matter 2.5 (PM2.5; aerodynamic diameter ≤ 2.5 μm), one component of ambient urban air pollution, is increasingly recognized as a hazard to human health. The government is working to reduce excessive PM2.5 concentrations, and the public wants to know the current air conditions in each region. This research used a web crawler written in Python to obtain real-time PM2.5 data from the government open data portal and stored the data in a MongoDB database. Python was also used to back up the data as CSV files, both to prevent data loss and to give future researchers a choice of data formats. By connecting R to the MongoDB database, we could immediately present a dynamic analysis of the collected data as box plots, pie charts, histograms, line graphs, scatter plots, and maps. The charts help people grasp the key points of the data quickly and clearly; the map is the most useful to the public because it shows the current PM2.5 concentration in every region of Taiwan at a glance. When the analysis is complete, the charts are automatically converted to images and stored in a folder corresponding to the time of the analysis. We then automated the system to crawl, store, analyze, and visualize the data every hour. Over a long period of accumulation this yields a large data set that supports further statistics and analysis over larger time units. To fill in data that had not been collected earlier, we additionally imported the full-year 2017 data provided by the Environmental Protection Administration and used Power BI to analyze the distribution of PM2.5 across the whole of 2017.
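The parse-and-back-up step described in the abstract might look roughly like the following Python sketch. It is an illustration only, not the thesis author's code: the field names (SiteName, County, PM25, PublishTime), the function names, and the folder naming scheme are all assumptions, since this record does not give the portal's schema or URL. Fetching the feed and inserting into MongoDB are left out so the sketch runs offline.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

# Hypothetical field names; the real open data feed may use different keys.
FIELDS = ["SiteName", "County", "PM25", "PublishTime"]


def parse_records(json_text):
    """Turn a JSON array of station readings into cleaned dictionaries."""
    cleaned = []
    for row in json.loads(json_text):
        try:
            pm25 = float(row.get("PM25", ""))
        except (TypeError, ValueError):
            pm25 = None  # blank or non-numeric readings are kept as missing
        cleaned.append({
            "SiteName": row.get("SiteName"),
            "County": row.get("County"),
            "PM25": pm25,
            "PublishTime": row.get("PublishTime"),
        })
    return cleaned


def backup_to_csv(records, base_dir, when):
    """Write one crawl to <base_dir>/<YYYY-MM-DD_HH>/pm25.csv and return the path."""
    # One folder per hour, mirroring the "corresponding time folder" idea
    # in the abstract (the exact naming here is an assumption).
    folder = Path(base_dir) / when.strftime("%Y-%m-%d_%H")
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "pm25.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    return path
```

In an hourly job, the JSON text would come from an HTTP request to the portal, and the same cleaned records could also be written to MongoDB before or after the CSV backup; a scheduler such as cron could then trigger the script once per hour.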