Robust Sparse Logistic Regression With the L_q (0 < q < 1) Regularization for Feature Selection Using Gene Expression Data

Bibliographic Details
Main Authors: Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8528834/
Description
Summary: Microarray technology is a popular technique that has been extensively applied in cancer diagnosis. Many studies have used high-dimensional microarray data to identify informative features for classifying cancer types, yet the numerous irrelevant features in microarray data may introduce noise and decrease classification accuracy. Regularization techniques are common methods for feature selection: they can remove irrelevant features and help avoid overfitting. In recent years, various regularization methods have been proposed. In theory, an L_q (0 < q < 1) penalty with a smaller value of q yields sparser solutions. In addition, the loss function in most regression models is based on least-squares minimization. However, the least-squares method is sensitive to noise and has poor robustness, especially when the errors follow a heavy-tailed distribution. It is well known that least absolute deviation (LAD) regression is the most common method for robust regression and can cope with large noise. Microarray data generally contain a high level of noise, which hinders the development of microarray technology. To address these problems, we propose a robust logistic regression with L_q (0 < q < 1) regularization, a feasible and effective approach for feature selection in microarray classification. Because L_q (0 < q < 1) regularization leads to a non-convex optimization problem that is difficult to solve, in this paper we use a genetic algorithm based on a global search strategy to obtain an optimal solution.
ISSN: 2169-3536
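The summary combines two ingredients: an L_q (0 < q < 1) penalty on the coefficients and a genetic algorithm, used because the penalized problem is non-convex. The Python sketch below shows one way these pieces fit together, under loudly stated assumptions. It uses the ordinary logistic negative log-likelihood, because the summary does not specify the paper's robust loss. The penalty is lam * sum_j |beta_j|^q on the non-intercept coefficients. The optimizer is a generic real-coded GA (binary tournament selection, uniform crossover, Gaussian mutation, a mutation that zeroes a coefficient, and elitism). All function names, GA settings, and the toy data are hypothetical illustrations, not the authors' algorithm or results.

```python
import math
import random


def lq_penalty(beta, q, lam):
    """L_q penalty lam * sum_j |beta_j|**q, assumed here with 0 < q < 1."""
    return lam * sum(abs(b) ** q for b in beta)


def logistic_loss(X, y, beta0, beta):
    """Mean negative log-likelihood of logistic regression, labels y in {0, 1}.

    Assumption: the paper's robust loss is not given in the summary, so the
    ordinary logistic loss stands in for it here.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta0 + sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(z)) computed stably, minus y * z
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z
    return total / len(y)


def make_objective(X, y, q, lam):
    """Penalized objective over params = [intercept, beta_1, ..., beta_p]."""
    def objective(params):
        return logistic_loss(X, y, params[0], params[1:]) + lq_penalty(params[1:], q, lam)
    return objective


def genetic_minimize(objective, dim, pop_size=30, generations=80,
                     mutation_rate=0.2, mutation_sd=0.3, zero_prob=0.05, seed=0):
    """Generic real-coded GA (hypothetical settings) that minimizes objective."""
    rng = random.Random(seed)
    # Seed the population with the all-zero (null) model plus random vectors.
    pop = [[0.0] * dim] + [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                           for _ in range(pop_size - 1)]
    scores = [objective(ind) for ind in pop]

    def tournament():
        # Binary tournament: the better of two random individuals wins.
        i, j = rng.sample(range(pop_size), 2)
        return pop[i] if scores[i] <= scores[j] else pop[j]

    for _ in range(generations):
        order = sorted(range(pop_size), key=lambda k: scores[k])
        # Elitism: the two best individuals survive unchanged, so the best
        # objective value never gets worse from one generation to the next.
        new_pop = [pop[order[0]][:], pop[order[1]][:]]
        while len(new_pop) < pop_size:
            mother, father = tournament(), tournament()
            child = []
            for gm, gf in zip(mother, father):
                gene = gm if rng.random() < 0.5 else gf   # uniform crossover
                if rng.random() < mutation_rate:
                    gene += rng.gauss(0.0, mutation_sd)   # Gaussian mutation
                if rng.random() < zero_prob:
                    gene = 0.0                            # set a coefficient to exactly zero
                child.append(gene)
            new_pop.append(child)
        pop = new_pop
        scores = [objective(ind) for ind in pop]

    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]


def toy_data(n=50, p=6, seed=1):
    """Hypothetical toy data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] - row[1] + rng.gauss(0.0, 0.5) > 0 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    objective = make_objective(X, y, q=0.5, lam=0.05)
    params, value = genetic_minimize(objective, dim=len(X[0]) + 1)
    selected = [j for j, b in enumerate(params[1:]) if abs(b) > 1e-3]
    print("penalized objective:", round(value, 4))
    print("features with nonzero coefficients:", selected)
```

With q = 0.5, `lq_penalty([1.0, 0.0], 0.5, 1.0)` is 1.0 while `lq_penalty([0.5, 0.5], 0.5, 1.0)` is about 1.414. For the same L1 norm, the penalty therefore favors concentrating weight on fewer coefficients, which is the sparsity effect the summary attributes to small q. Because |beta_j|^q is non-convex and not differentiable at beta_j = 0, gradient-based methods are awkward here. A derivative-free global search such as a GA avoids that issue, although it gives no guarantee of reaching the global optimum.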