Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World

Master's === National Taipei University of Nursing and Health Sciences === Graduate Institute of Information Management === 107 === Unemployment is closely tied to national livelihoods, so it is a concern for governments. Without a source of income, unemployed workers gradually lose touch with society, and unemployment can further cause them mental stress and illness....


Bibliographic Details
Main Authors: SU, YU-SHENG, 蘇育陞
Other Authors: JIANG, WEY-WEN
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/2kn93w
id ndltd-TW-107NTCN0396010
record_format oai_dc
spelling ndltd-TW-107NTCN03960102019-10-24T05:20:15Z http://ndltd.ncl.edu.tw/handle/2kn93w Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World 運用公開資料以決策樹演算法探討全球各國失業率之相關因素 SU, YU-SHENG 蘇育陞 碩士 國立臺北護理健康大學 資訊管理研究所 107 JIANG, WEY-WEN 江蔚文 2019 學位論文 ; thesis 114 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taipei University of Nursing and Health Sciences === Graduate Institute of Information Management === 107 === Unemployment is closely tied to national livelihoods, so it is a concern for governments. Without a source of income, unemployed workers gradually lose touch with society, and unemployment can further cause them mental stress and illness. Unemployment is also an economic indicator that every government values: lowering a country's unemployment rate helps to increase its gross domestic product. This study therefore analyzes the factors that may affect the unemployment rate and explores the characteristics of countries with low unemployment. Based on the UN's 17 Sustainable Development Goals, the study downloads datasets from a global open data platform to identify variables that may be related to unemployment. It uses 1 dependent variable (unemployment rate) and 13 independent variables (fertility rate, population aged 65 and over, average education years, female labor participation rate, gender inequality index, gross domestic product, trade rate, inflation rate, electricity supply rate, internet use rate, rural population, population density, and human development index). After data pre-processing, variable data for a total of 112 countries were retained. The collated data were then analyzed with correlation coefficients, linear regression, and decision trees, and the results are discussed. The correlation analysis shows high correlations among many of the independent variables, but no independent variable is highly correlated with the unemployment rate. This suggests that a simple linear model alone cannot reflect the level of unemployment. It is therefore inferred that poor and wealthy countries may reach low unemployment through different factors.
Accordingly, before linear regression modeling, the countries were divided into two groups at a GDP per capita of 5,000 US dollars. For countries with GDP per capita above $5,000, the multiple linear regression results are not significant, indicating that linear regression cannot be used to model unemployment in rich countries. For countries with GDP per capita below $5,000, the multiple linear regression results are significant, and 3 independent variables together contribute 54% explanatory power. In the decision tree analysis, the 112 countries were again divided into the same two groups, below and above 5,000 dollars. For the higher-GDP countries, the decision tree shows that countries with lower unemployment rates are characterized by longer average education years, a lower proportion of rural population, and a smaller aging population. For the lower-GDP countries, in contrast, the decision tree shows that countries with lower unemployment rates are characterized by a lower human development index and a higher female labor participation rate. This confirms that different factors do contribute to low unemployment in poor and wealthy countries. The models obtained by all three methods contain errors and can only show a general tendency. The actual unemployment rate is closely tied to national policies, to how well education matches the needs of industry, and even to culture and religion, and these factors are difficult to capture in quantitative data. Therefore, in addition to strengthening the factors identified by the statistical analysis above, countries should also adopt policies suited to their own circumstances to reduce unemployment.
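The abstract does not say which software or which tree algorithm was used, so the following is only a minimal sketch of the workflow it describes: split countries at a GDP per capita of 5,000 US dollars, check a correlation within a group, and find the single best split of a regression tree on one variable. The field names (`gdp_per_capita`, `education_years`, `unemployment`), the helper functions, and all figures in `toy_countries` are invented for illustration and are not taken from the thesis.

```python
from statistics import mean

GDP_THRESHOLD_USD = 5000  # per-capita cut-off stated in the abstract


def split_by_gdp(countries, threshold=GDP_THRESHOLD_USD):
    """Divide countries into (below threshold, at or above threshold)."""
    low = [c for c in countries if c["gdp_per_capita"] < threshold]
    high = [c for c in countries if c["gdp_per_capita"] >= threshold]
    return low, high


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, feature, target="unemployment"):
    """Return (cut, error) for the threshold on `feature` that minimizes
    the total squared error of `target` in the two resulting groups.
    This is one node of a regression tree, not a full tree."""
    values = sorted({r[feature] for r in rows})
    best = (None, sse([r[target] for r in rows]))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2  # midpoint between neighbouring values
        left = [r[target] for r in rows if r[feature] <= cut]
        right = [r[target] for r in rows if r[feature] > cut]
        err = sse(left) + sse(right)
        if err < best[1]:
            best = (cut, err)
    return best


# Invented toy figures, NOT data from the thesis.
toy_countries = [
    {"name": "A", "gdp_per_capita": 1200, "education_years": 5.0, "unemployment": 4.0},
    {"name": "B", "gdp_per_capita": 3000, "education_years": 7.0, "unemployment": 9.0},
    {"name": "C", "gdp_per_capita": 4800, "education_years": 8.0, "unemployment": 6.0},
    {"name": "D", "gdp_per_capita": 9000, "education_years": 9.0, "unemployment": 12.0},
    {"name": "E", "gdp_per_capita": 22000, "education_years": 11.0, "unemployment": 11.0},
    {"name": "F", "gdp_per_capita": 41000, "education_years": 13.0, "unemployment": 5.0},
    {"name": "G", "gdp_per_capita": 56000, "education_years": 13.5, "unemployment": 4.0},
]

low_gdp, high_gdp = split_by_gdp(toy_countries)
r_high = pearson(
    [c["education_years"] for c in high_gdp],
    [c["unemployment"] for c in high_gdp],
)
cut, err = best_split(high_gdp, "education_years")
print(f"high-GDP group: r = {r_high:.2f}, best education cut = {cut}")
```

A full decision tree would apply `best_split` recursively over all 13 independent variables within each GDP group; the single split shown here only illustrates how a tree can separate low-unemployment from high-unemployment countries where a linear fit is weak.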
author2 JIANG, WEY-WEN
author_facet JIANG, WEY-WEN
SU, YU-SHENG
蘇育陞
author SU, YU-SHENG
蘇育陞
title Using Open Data and Decision Tree Algorithm to Explore Factors Associated with Unemployment Rate around the World
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/2kn93w