Hierarchical Classification and Regression with Feature Selection

Bibliographic Details
Main Authors: Chi-Wei Yeh, 葉奇瑋
Other Authors: Shih-Wen Ke
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/zsq7g5
Description
Summary: Master's thesis === National Central University === Department of Information Management === 107 === The vision of the big data era is to find suitable methods for organizing numerous, messy data and extracting valuable information from them. As the quantity and complexity of data increase, data scientists no longer focus only on the strengths and weaknesses of model training; instead, they concentrate on using different computational methods or processing architectures to find clues in the data, and ultimately seek ways to improve numerical prediction accuracy from these findings. For numerical prediction on datasets, common model construction methods in regression training include linear regression, neural networks and support vector regression. To pursue better numerical prediction results, tuning the parameters of these models is necessary. In addition, we use feature selection to remove less relevant or redundant features, and clustering to organize the data into different groups. In this study, a hierarchical structure is used as an experimental prototype and extended to handle datasets with large numbers of features and instances. To obtain clear comparisons, our study combines different clustering, feature selection and regression algorithms to train models and measure numerical prediction errors on datasets from different domains with varying numbers of instances and features. By analyzing and comparing the experimental results of multiple algorithms, our study finds that hierarchical classification and regression with feature selection improves root mean square error and mean absolute error over using regression alone or hierarchical classification and regression without feature selection. In addition, our study finds that the hierarchical structure using clustering (K-means and C-means), feature selection (Mutual Information and Information Gain) and regression (Multi-layer Perceptron) has the best average performance across different datasets.
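The pipeline the abstract describes — cluster the data, select features within each cluster, then train a per-cluster regressor and score with RMSE and MAE — can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the thesis's actual implementation; the dataset, cluster count, number of selected features and MLP settings are all illustrative assumptions.

```python
# Hedged sketch of hierarchical regression with feature selection:
# 1) K-means splits the training data into groups,
# 2) mutual-information feature selection is applied per group,
# 3) a multi-layer perceptron regressor is trained per group,
# 4) test samples are routed to their cluster's selector + model.
# All parameters and the synthetic dataset are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic data stands in for the benchmark datasets used in the study.
X, y = make_regression(n_samples=600, n_features=20, n_informative=8,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: hierarchical split of the training data via K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

models = {}
for c in range(kmeans.n_clusters):
    mask = kmeans.labels_ == c
    # Step 2: per-cluster feature selection by mutual information.
    selector = SelectKBest(mutual_info_regression, k=10)
    X_sel = selector.fit_transform(X_train[mask], y_train[mask])
    # Step 3: per-cluster regression with a multi-layer perceptron.
    mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
    mlp.fit(X_sel, y_train[mask])
    models[c] = (selector, mlp)

# Step 4: route each test sample to its cluster's pipeline.
test_clusters = kmeans.predict(X_test)
y_pred = np.empty_like(y_test, dtype=float)
for c, (selector, mlp) in models.items():
    mask = test_clusters == c
    if mask.any():
        y_pred[mask] = mlp.predict(selector.transform(X_test[mask]))

# Evaluation metrics reported in the abstract: RMSE and MAE.
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}")
```

Swapping `KMeans` for fuzzy C-means or `mutual_info_regression` for an information-gain scorer would give the other algorithm combinations the study compares.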