Transformation techniques in data mining

Bibliographic Details
Main Author: Burgess, Martin
Published: University of East Anglia 2004
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.410093
Description
Summary: Transforming data is essential within data mining as a precursor to many applications such as rule induction and Multivariate Adaptive Regression Splines. The problems arising from the use of categorical-valued data in rule induction are reduced confidence (accuracy), support and coverage. We introduce a technique called arcsin transformation, in which categorical-valued data is replaced with numeric values. This technique has been used on a number of databases and has been shown to be highly effective. Multivariate Adaptive Regression Splines (MARS) is a regression tool which attempts to approximate complex relationships by a series of linear regressions on different intervals of the explanatory variable ranges. As with regression methods in general, we need to know which assumptions are made and how violating them may disrupt performance. The two key assumptions of most regression models, including MARS, are additivity of effects and homoscedasticity. If either of these assumptions is not satisfied in terms of the original observations, y_i, a non-linear transformation may improve matters. We use the Box-Cox transformation, which, applied to the continuous dependent variable (with non-negative responses) in a linear regression setting, may induce the regression assumptions given previously. These assumptions are examined in detail using a variety of tests. The results show that, of the seven databases examined, an improvement has been made on six, where the models produced were