Robust Test for Batch-to-batch Variable Selection

Bibliographic Details
Main Authors: Chia-Lung Lin, 林嘉龍
Other Authors: Argon Chen
Format: Others
Language: en_US
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/92845445734717940761
Description
Summary: Master's === National Taiwan University === Graduate Institute of Industrial Engineering === 92 === When variable selection is used in regression, its reliability depends strongly on the number of candidate variables relative to the sample size. In practice, however, often only limited data can be collected for analysis while the number of possible independent variables is large. In the forward selection procedure, problems arise when the sample size n is much smaller than the number of variables p. Under the conventional F-test selection criterion, noise variables are often mistakenly selected when the sample size is relatively small or the number of candidate variables is relatively large. The number of variables that can be selected is also limited by the sample size. This study proposes a new test statistic, named MaxF, with a known null distribution. The test statistic improves the reliability of the forward selection procedure and can be calculated numerically. Based on the new criterion, an extended selection procedure is developed to overcome the sample-size limitation and to continue selecting significant variables into different batches. After batch-to-batch selection, dependency analysis methodologies are proposed to identify the inter-relationships among the batches of selected variables. The proposed test statistic is examined on simulated data under various scenarios with different sample sizes and numbers of candidate variables, and the dependency analysis methodologies are applied to more complex simulation cases. The approach is also demonstrated and tested on semiconductor yield data and gene expression cases.
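
The record does not give the definition of MaxF or its null distribution, so the following is only a rough sketch of the general idea, not the thesis's method. It assumes the statistic is the largest partial F value among the remaining candidates at each forward step, and it swaps in a Monte Carlo approximation of the null distribution (pure Gaussian noise responses, with the design matrix held fixed) in place of the numerically calculated distribution the summary mentions. The names max_partial_f and forward_select_maxf, and the parameters alpha and n_sim, are hypothetical.

```python
import numpy as np


def max_partial_f(X, y, selected):
    """Return (index, F) of the unselected column with the largest partial F."""
    n, p = X.shape
    # Current model: intercept plus the columns selected so far.
    base = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
    beta, *_ = np.linalg.lstsq(base, y, rcond=None)
    rss0 = np.sum((y - base @ beta) ** 2)
    df_resid = n - base.shape[1] - 1  # residual df after adding one variable
    best_j, best_f = -1, -np.inf
    for j in range(p):
        if j in selected:
            continue
        full = np.column_stack([base, X[:, j]])
        b, *_ = np.linalg.lstsq(full, y, rcond=None)
        rss1 = np.sum((y - full @ b) ** 2)
        f = (rss0 - rss1) / (rss1 / df_resid)
        if f > best_f:
            best_j, best_f = j, f
    return best_j, best_f


def forward_select_maxf(X, y, alpha=0.05, n_sim=200, seed=0):
    """Forward selection that stops when the largest partial F is not
    significant against a simulated null distribution of the maximum.

    Assumption: the null is approximated by Monte Carlo with Gaussian
    noise responses; this is an illustrative stand-in, not the thesis's
    numerical calculation.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    selected = []
    # Keep at least one residual degree of freedom in the enlarged model.
    while len(selected) < min(p, n - 2):
        j, f_obs = max_partial_f(X, y, selected)
        null = [
            max_partial_f(X, rng.standard_normal(n), selected)[1]
            for _ in range(n_sim)
        ]
        if f_obs <= np.quantile(null, 1 - alpha):
            break
        selected.append(j)
    return selected
```

For example, with 30 observations and 40 candidate columns, calling forward_select_maxf(X, y) returns the list of selected column indices in order of entry. Comparing the observed maximum against the distribution of the maximum, instead of against a single-variable F threshold, is what keeps the false selection rate from growing with the number of candidates.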