Weighted Mean Squared Deviation Feature Screening for Binary Features

Bibliographic Details
Main Authors: Gaizhen Wang, Guoyu Guan
Format: Article
Language: English
Published: MDPI AG 2020-03-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/22/3/335
Description
Summary: In this study, we propose a novel model-free feature screening method, called weighted mean squared deviation (WMSD), for ultrahigh-dimensional binary features in binary classification. Compared with the chi-square statistic and mutual information, WMSD gives binary features with probabilities near 0.5 a better chance of being selected. In addition, the asymptotic properties of the proposed method are investigated theoretically under the assumption $\log p = o(n)$. In practice, the number of features to select is determined by a Pearson correlation coefficient method based on the property of the power-law distribution. Lastly, an empirical study of Chinese text classification shows that the proposed method performs well when the number of selected features is relatively small.
ISSN: 1099-4300