Transformation of Ordinal Variables with Applications in Decision Trees

Bibliographic Details
Main Authors: Yu-Pang Chen, 陳宇邦
Other Authors: Ching-Hsiang Chen
Format: Others
Language: zh-TW
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/14433948388229019567
Description
Summary: Master's thesis, Tamkang University, Master's Program, Department of Statistics, academic year 99 (ROC calendar).

In empirical data mining analysis, ordinal-scale variables must be handled frequently. Moreover, many ordinal variables are generated by researchers from continuous variables for convenience, by grouping observed values into intervals, and some of the information contained in the original continuous variable is lost in the process. On the other hand, when ordinal variables are analyzed with numeric coding, they are commonly treated as continuous variables, regardless of the difference in the amount of information they carry.

We propose a method that transforms ordinal variables into quasi-continuous variables by means of surrogate variables, the concept of coordinates, and Euclidean distances. The method is expected to lose less information than the traditional practice, which uses only the ordinal information. We then apply the transformation to three decision tree algorithms: CART, C4.5, and QUEST. On several real-world data sets, our study shows that the transformed quasi-continuous variables can effectively improve the classification accuracy of these decision trees.
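The abstract does not spell out the transformation itself, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, the information loss that motivates the work: a binary split on a numeric variable can only place its threshold between consecutive distinct observed values, so grouping a continuous variable into a few intervals sharply reduces the split points a CART-style tree can consider. The data, the cut points, and the helper names `bin_to_ordinal` and `candidate_thresholds` are hypothetical and chosen for illustration only.

```python
import bisect
import random


def bin_to_ordinal(values, cut_points):
    """Hypothetical helper: map each continuous value to an ordinal code
    0..len(cut_points) by counting the sorted cut points at or below it
    (so each interval is closed on the left, e.g. [40, 60))."""
    return [bisect.bisect_right(cut_points, v) for v in values]


def candidate_thresholds(values):
    """Hypothetical helper: midpoints between consecutive distinct values,
    i.e. the thresholds a binary split on a numeric variable can choose from."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


random.seed(0)
# Hypothetical continuous measurement (values rounded to one decimal place).
income = [round(random.uniform(20, 120), 1) for _ in range(200)]
# Hypothetical interval boundaries chosen "for convenience".
cuts = [40, 60, 80, 100]
levels = bin_to_ordinal(income, cuts)

# The original variable offers many more candidate split points...
print(len(candidate_thresholds(income)))
# ...than its ordinal version, which has at most len(cuts) = 4.
print(len(candidate_thresholds(levels)))
```

Any quasi-continuous transformation that spreads observations within each ordinal level would restore some of these candidate split points; how the thesis does this with surrogate variables and Euclidean distances is not described in the abstract.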