Robust Sparse Logistic Regression With the $L_q$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data

Bibliographic Details
Main Authors: Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Robust logistic regression; feature selection; $L_q$ regularization; genetic algorithm
Online Access:https://ieeexplore.ieee.org/document/8528834/
id doaj-8fdf519d126a4653b2c744b60bcae28b
record_format Article
doi 10.1109/ACCESS.2018.2880198
volume 6
pages 68586-68595
affiliation Ziyi Yang (https://orcid.org/0000-0002-3550-9283): Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau
Yong Liang (https://orcid.org/0000-0002-2858-2810): Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau
Hui Zhang (https://orcid.org/0000-0003-1715-4332): Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau
Hua Chai: Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau
Bowen Zhang (https://orcid.org/0000-0002-3581-9476): Faculty of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Cheng Peng (https://orcid.org/0000-0002-4605-7534): Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau
collection DOAJ
language English
format Article
sources DOAJ
author Ziyi Yang
Yong Liang
Hui Zhang
Hua Chai
Bowen Zhang
Cheng Peng
title Robust Sparse Logistic Regression With the $L_q$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Microarray technology is a popular technique that has been extensively applied in cancer diagnosis. Many studies have used high-dimensional microarray data to identify informative features for classifying cancer types, yet the many irrelevant features in microarray data can introduce noise and decrease classification accuracy. Regularization is a common approach to feature selection: it can remove irrelevant features and help avoid overfitting. Many regularization methods have been proposed in recent years. In theory, an $L_q$ ($0 < q < 1$) penalty with a smaller value of q yields sparser solutions. In addition, the loss function in most regression models is based on least-squares minimization. However, the least-squares method is sensitive to noise and lacks robustness, especially when the errors follow a heavy-tailed distribution. Least absolute deviation (LAD) regression is the most common robust regression method and can cope with large noise. Microarray data generally contain a high level of noise, which hinders the development of microarray technology. To address these problems, we propose a robust logistic regression with $L_q$ ($0 < q < 1$) regularization, a feasible and effective approach to feature selection in microarray classification. Because $L_q$ ($0 < q < 1$) regularization leads to a non-convex optimization problem that is difficult to solve, we use a genetic algorithm with a global search strategy to search for an optimal solution.
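The abstract does not spell out the exact robust loss or the genetic algorithm settings, so the following is only a minimal Python sketch under stated assumptions. It pairs an absolute-deviation loss on the predicted probabilities (an assumed stand-in for the paper's robust logistic loss, in the spirit of LAD) with the non-convex $L_q$ penalty, and minimizes the result with a small real-coded genetic algorithm. The function names (`objective`, `genetic_minimize`), the operators (tournament selection, blend crossover, Gaussian and zeroing mutation, elitism), the synthetic data, and all parameter values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Logistic function; the input is clipped so np.exp cannot overflow."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def lq_penalty(beta, q):
    """Non-convex L_q penalty: sum_j |beta_j|^q with 0 < q < 1."""
    return np.sum(np.abs(beta) ** q)


def objective(beta, X, y, lam, q):
    """ASSUMED robust objective (not necessarily the paper's loss):
    mean absolute deviation between 0/1 labels and predicted probabilities,
    plus lam * L_q penalty. beta[0] is an unpenalized intercept."""
    p = sigmoid(beta[0] + X @ beta[1:])
    return np.mean(np.abs(y - p)) + lam * lq_penalty(beta[1:], q)


def genetic_minimize(f, dim, pop_size=60, generations=300, mut_sd=0.3, seed=0):
    """Hypothetical real-coded GA: tournament selection, blend crossover,
    Gaussian and zeroing mutation, and elitism. Returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, dim))
    pop[0] = 0.0  # seed the all-zero (empty) model into the population
    fit = np.array([f(ind) for ind in pop])

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(generations):
        children = [pop[np.argmin(fit)].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            w = rng.random(dim)  # per-gene blend crossover weights
            child = w * a + (1.0 - w) * b
            # Gaussian mutation on about one gene per child on average.
            mask = rng.random(dim) < 1.0 / dim
            child[mask] += rng.normal(0.0, mut_sd, size=int(mask.sum()))
            # Zeroing mutation (assumed operator) so exact sparsity is reachable;
            # index 0 is the intercept and is never zeroed.
            zero = rng.random(dim) < 0.5 / dim
            zero[0] = False
            child[zero] = 0.0
            children.append(child)
        pop = np.array(children)
        fit = np.array([f(ind) for ind in pop])

    best = int(np.argmin(fit))
    return pop[best], fit[best]


if __name__ == "__main__":
    # Synthetic, illustrative data: 10 features, only features 0 and 1 matter.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = (rng.random(200) < sigmoid(3.0 * X[:, 0] - 3.0 * X[:, 1])).astype(float)

    def f(b):
        return objective(b, X, y, lam=0.05, q=0.5)

    beta, val = genetic_minimize(f, dim=X.shape[1] + 1)
    print("objective:", val)
    # Coefficients above a hypothetical threshold count as selected features.
    print("selected features:", np.flatnonzero(np.abs(beta[1:]) > 1e-2))
```

Seeding the population with the all-zero (empty) model and carrying the best individual into every generation guarantees that the returned objective never exceeds that of the empty model. The zeroing mutation lets the search land on exactly sparse coefficient vectors, where the $L_q$ term rewards dropping features; a plain Gaussian-mutation GA would rarely hit exact zeros.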
topic Robust logistic regression
feature selection
$L_q$ regularization
genetic algorithm
url https://ieeexplore.ieee.org/document/8528834/