Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World
Master's === 國立臺北護理健康大學 === 資訊管理研究所 === 107 === Unemployment bears directly on people's livelihoods, so it is a concern for governments. Without a source of income, unemployed workers can gradually lose touch with society, and unemployment can further cause mental stress and illness....
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/2kn93w |
id |
ndltd-TW-107NTCN0396010 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NTCN0396010 2019-10-24T05:20:15Z http://ndltd.ncl.edu.tw/handle/2kn93w Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World 運用公開資料以決策樹演算法探討全球各國失業率之相關因素 SU, YU-SHENG 蘇育陞 Master's 國立臺北護理健康大學 資訊管理研究所 107 JIANG, WEY-WEN 江蔚文 2019 學位論文 ; thesis 114 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === 國立臺北護理健康大學 === 資訊管理研究所 === 107 === Unemployment bears directly on people's livelihoods, so it is a concern for governments. Without a source of income, unemployed workers can gradually lose touch with society, and unemployment can further cause mental stress and illness. Unemployment is also an economic indicator that governments of all countries watch closely. Lowering a country's unemployment rate helps to increase its gross domestic product. This study therefore analyzes the factors that may affect the unemployment rate and explores the characteristics of countries with low unemployment.
Guided by the UN's 17 Sustainable Development Goals, this study downloads datasets from a global open-data platform to identify variables that may be related to unemployment. It uses 1 dependent variable (unemployment rate) and 13 independent variables (fertility rate, population aged 65 and over, average education years, female labor participation rate, gender inequality index, gross domestic product, trade rate, inflation rate, electricity supply rate, internet use rate, rural population, population density, and human development index). After data pre-processing, variable data for a total of 112 countries were obtained. Finally, the collated data are analyzed with correlation coefficients, linear regression, and decision trees, and the results are discussed.
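The correlation step named above can be illustrated with a minimal sketch. It computes the Pearson correlation coefficient between each independent variable and the unemployment rate using only the Python standard library. The abstract does not say which correlation coefficient or software the thesis used, so Pearson's r is an assumption here, and the column names and the four toy rows are hypothetical, made-up values rather than the thesis's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def correlations_with_target(rows, target):
    """Correlate every non-target column with the target column."""
    ys = [row[target] for row in rows]
    return {
        name: pearson([row[name] for row in rows], ys)
        for name in rows[0]
        if name != target
    }


# Hypothetical toy rows (invented values, NOT the thesis's 112-country data).
toy_countries = [
    {"unemployment_rate": 4.0, "fertility_rate": 1.6, "education_years": 12.5},
    {"unemployment_rate": 7.5, "fertility_rate": 2.4, "education_years": 9.0},
    {"unemployment_rate": 5.0, "fertility_rate": 4.1, "education_years": 6.0},
    {"unemployment_rate": 11.0, "fertility_rate": 1.9, "education_years": 10.5},
]

result = correlations_with_target(toy_countries, "unemployment_rate")
for name, r in sorted(result.items()):
    print(f"{name}: r = {r:+.2f}")
```

Running the same function over all 13 independent variables would give the kind of screening table the abstract refers to. Correlations among the independent variables themselves can be obtained by calling `pearson` on each pair of columns.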
The correlation analysis shows high correlations among many of the independent variables, but no independent variable is highly correlated with the unemployment rate, which indicates that a simple linear model alone cannot capture the level of unemployment. It is therefore inferred that poor and wealthy countries may reach low unemployment through different factors. Accordingly, before linear regression modeling, the countries are divided into two groups at a GDP per capita of 5,000 US dollars. For countries with GDP per capita above $5,000, the multiple linear regression results are not significant, indicating that linear regression cannot model unemployment in wealthy countries; for countries with GDP per capita below $5,000, the results are significant, and three independent variables provide 54% explanatory power. In the decision tree analysis, the 112 countries are again divided into the same two groups. For the higher-GDP countries, the decision tree shows that countries with lower unemployment rates tend to have more average years of education, a lower proportion of rural population, and a smaller aging population. For the lower-GDP countries, by contrast, the decision tree shows that countries with lower unemployment rates tend to have a lower human development index and a higher female labor participation rate. These results support the inference that different factors contribute to low unemployment in poor and wealthy countries. The models obtained by all three methods contain errors and can only show a general tendency.
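The grouping at a GDP per capita of 5,000 US dollars and the per-group decision tree analysis described above can be sketched as follows. The abstract does not name the decision tree algorithm, so this sketch assumes a CART-style regression tree and shows only its root-node step: choosing the single feature and cut point that minimize the summed squared error of the two resulting groups. Sending a value of exactly 5,000 to the higher group is also an assumption, and the field names and toy rows are hypothetical, invented for illustration.

```python
GDP_THRESHOLD_USD = 5000  # grouping point stated in the abstract


def split_by_gdp(rows, threshold=GDP_THRESHOLD_USD):
    """Partition rows into (lower, higher) groups by GDP per capita.

    Assumption: a value exactly equal to the threshold goes to 'higher'.
    """
    lower = [r for r in rows if r["gdp_per_capita"] < threshold]
    higher = [r for r in rows if r["gdp_per_capita"] >= threshold]
    return lower, higher


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, features, target):
    """Return (feature, cut, error) for the single best split, or None.

    Candidate cuts are midpoints between consecutive distinct values.
    None is returned when no feature has two distinct values.
    """
    best = None
    for feature in features:
        distinct = sorted({r[feature] for r in rows})
        for low, high in zip(distinct, distinct[1:]):
            cut = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= cut]
            right = [r[target] for r in rows if r[feature] > cut]
            error = sse(left) + sse(right)
            if best is None or error < best[2]:
                best = (feature, cut, error)
    return best


# Hypothetical toy rows (invented values, NOT the thesis's data).
toy_rows = [
    {"gdp_per_capita": 42000, "education_years": 13.0, "female_labor_rate": 55.0, "unemployment_rate": 4.0},
    {"gdp_per_capita": 38000, "education_years": 12.5, "female_labor_rate": 60.0, "unemployment_rate": 5.0},
    {"gdp_per_capita": 9000, "education_years": 9.0, "female_labor_rate": 58.0, "unemployment_rate": 11.0},
    {"gdp_per_capita": 7000, "education_years": 8.5, "female_labor_rate": 52.0, "unemployment_rate": 12.0},
    {"gdp_per_capita": 900, "education_years": 5.0, "female_labor_rate": 75.0, "unemployment_rate": 3.0},
    {"gdp_per_capita": 1500, "education_years": 6.5, "female_labor_rate": 70.0, "unemployment_rate": 4.0},
    {"gdp_per_capita": 3000, "education_years": 6.0, "female_labor_rate": 40.0, "unemployment_rate": 10.0},
    {"gdp_per_capita": 4200, "education_years": 7.0, "female_labor_rate": 35.0, "unemployment_rate": 12.0},
]

FEATURES = ["education_years", "female_labor_rate"]
lower_group, higher_group = split_by_gdp(toy_rows)

for label, group in (("higher GDP", higher_group), ("lower GDP", lower_group)):
    feature, cut, error = best_split(group, FEATURES, "unemployment_rate")
    print(f"{label}: split on {feature} at {cut} (squared error {error:.2f})")
```

A full tree would apply `best_split` recursively to each side until a stopping rule is met. Fitting the two GDP groups separately, as here, is what lets a different variable come out on top in each group.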
The actual unemployment rate is also closely tied to national policies, to how well education matches the needs of industry, and even to culture and religion. These aspects are difficult to capture in quantitative data. Therefore, in addition to strengthening the factors identified by the statistical analysis above, countries should adopt policies suited to their own circumstances to reduce unemployment.
|
author2 |
JIANG, WEY-WEN |
author_facet |
JIANG, WEY-WEN SU, YU-SHENG 蘇育陞 |
author |
SU, YU-SHENG 蘇育陞 |
spellingShingle |
SU, YU-SHENG 蘇育陞 Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World |
author_sort |
SU, YU-SHENG |
title |
Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/2kn93w |