Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform
Master's === I-Shou University === Department of Information Management === 106 === Air pollution has grown progressively worse, and air quality is now a widely discussed issue. Particulate matter 2.5 (aerodynamic diameter ≤2.5 μm; PM2.5), one component of ambient urban air pollution, has been gradually empha...
Main Authors: | Zhe-Zhang Zhang, 張哲章 |
---|---|
Other Authors: | Jenn-Long Liu |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/a2bjgn |
id |
ndltd-TW-106ISU05396115 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106ISU05396115; 2019-11-28T05:22:20Z; http://ndltd.ncl.edu.tw/handle/a2bjgn; Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform; 應用Python網路爬蟲技術於政府開放資料平台PM2.5即時動態資料分析; Zhe-Zhang Zhang 張哲章; Master's; I-Shou University; Department of Information Management; 106; Jenn-Long Liu 劉振隆; 2018; thesis; 136; zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === I-Shou University === Department of Information Management === 106 === Air pollution has grown progressively worse, and air quality is now a widely discussed issue. Particulate matter 2.5 (aerodynamic diameter ≤2.5 μm; PM2.5), one component of ambient urban air pollution, is increasingly recognized as a hazard to human health. The government is currently trying to reduce excessive PM2.5 concentrations, and the public wants to know the current air conditions in their region. This research used a web crawler written in Python to obtain real-time PM2.5 data from the government open data portal and stored the data in a MongoDB database. Python was also used to back up the data as CSV files, both to prevent data loss and to give future researchers a choice of data formats. By connecting R to the MongoDB database, we could immediately present a dynamic analysis of the collected data, including box plots, pie charts, histograms, line graphs, scatter plots, and maps. The charts help people grasp the key points of the data quickly and clearly; the map is especially useful because it lets the public see the current PM2.5 concentration in every region of Taiwan at a glance. When the analysis is complete, the charts are automatically converted into images and stored in a folder corresponding to the time of collection. We then automated the system to crawl, store, analyze, and visualize the data every hour. Over a long period this accumulates a large data set that supports further statistics and analysis over larger time units. To fill in data that had not been collected earlier, we additionally imported the full-year 2017 data provided by the Environmental Protection Administration and used Power BI to analyze the distribution of PM2.5 over the whole of 2017.
|
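The crawl, store, and back-up steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's code: the JSON field names (`SiteName`, `County`, `PM2.5`), the folder naming scheme, the database and collection names, and the endpoint URL passed to `fetch_json` are all assumptions made for the example.

```python
import csv
import json
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen


def fetch_json(url):
    """Download raw JSON text from an open data endpoint (URL supplied by the caller)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_records(json_text):
    """Keep only rows with a numeric PM2.5 reading. Field names are assumed."""
    rows = []
    for item in json.loads(json_text):
        try:
            value = float(item["PM2.5"])
        except (KeyError, TypeError, ValueError):
            continue  # skip missing or non-numeric readings such as "" or "ND"
        rows.append({
            "site": item.get("SiteName", ""),
            "county": item.get("County", ""),
            "pm25": value,
        })
    return rows


def backup_csv(rows, root, when):
    """Write rows to <root>/<YYYYMMDD_HH>/pm25.csv and return the file path."""
    folder = Path(root) / when.strftime("%Y%m%d_%H")  # hypothetical per-hour folder
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "pm25.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["site", "county", "pm25"])
        writer.writeheader()
        writer.writerows(rows)
    return path


def save_to_mongo(rows, uri="mongodb://localhost:27017", db="airdata", coll="pm25"):
    """Insert rows into MongoDB. Database and collection names are hypothetical."""
    from pymongo import MongoClient  # imported lazily so the CSV path works without pymongo
    if rows:  # insert_many rejects an empty list
        MongoClient(uri)[db][coll].insert_many([dict(r) for r in rows])
```

One hourly run would call `fetch_json`, then `parse_records`, then both `save_to_mongo` and `backup_csv(rows, "backup", datetime.now())`. Keeping the CSV back-up independent of the database call means a failed MongoDB connection does not lose that hour's readings.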
author2 |
Jenn-Long Liu |
author_facet |
Jenn-Long Liu; Zhe-Zhang Zhang 張哲章 |
author |
Zhe-Zhang Zhang 張哲章 |
spellingShingle |
Zhe-Zhang Zhang 張哲章 Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform |
author_sort |
Zhe-Zhang Zhang |
title |
Python Web Crawler Technology Applied to Dynamic Data Analysis of PM2.5 on the Government Open Data Platform |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/a2bjgn |
work_keys_str_mv |
AT zhezhangzhang pythonwebcrawlertechnologyappliedtodynamicdataanalysisofpm25onthegovernmentopendataplatform AT zhāngzhézhāng pythonwebcrawlertechnologyappliedtodynamicdataanalysisofpm25onthegovernmentopendataplatform AT zhezhangzhang yīngyòngpythonwǎnglùpáchóngjìshùyúzhèngfǔkāifàngzīliàopíngtáipm25jíshídòngtàizīliàofēnxī AT zhāngzhézhāng yīngyòngpythonwǎnglùpáchóngjìshùyúzhèngfǔkāifàngzīliàopíngtáipm25jíshídòngtàizīliàofēnxī |
_version_ |
1719297716722335744 |