Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data
Microarray technology is a popular technique that has been extensively applied in cancer diagnosis. Many studies have used high-dimensional microarray data to identify informative features to classify the types of cancer, yet numerous irrelevant features that exist in microarray data may introduce t...
Main Authors: | Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | Robust logistic regression, feature selection, $L_{q}$ regularization, genetic algorithm |
Online Access: | https://ieeexplore.ieee.org/document/8528834/ |
id |
doaj-8fdf519d126a4653b2c744b60bcae28b |
---|---|
record_format |
Article |
spelling |
doaj-8fdf519d126a4653b2c744b60bcae28b; 2021-03-29T21:37:17Z; eng; IEEE; IEEE Access; 2169-3536; 2018-01-01; vol. 6; pp. 68586-68595; 10.1109/ACCESS.2018.2880198; 8528834; Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data; Authors: Ziyi Yang (https://orcid.org/0000-0002-3550-9283), Yong Liang (https://orcid.org/0000-0002-2858-2810), Hui Zhang (https://orcid.org/0000-0003-1715-4332), Hua Chai, Bowen Zhang (https://orcid.org/0000-0002-3581-9476), Cheng Peng (https://orcid.org/0000-0002-4605-7534); Affiliations, in the order listed: Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau; Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau; Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau; Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau; Faculty of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Faculty of Information Technology, Macau University of Science and Technology, Taipa, Macau; https://ieeexplore.ieee.org/document/8528834/; Robust logistic regression; feature selection; $L_{q}$ regularization; genetic algorithm |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng |
spellingShingle |
Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng; Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data; IEEE Access; Robust logistic regression; feature selection; $L_{q}$ regularization; genetic algorithm |
author_facet |
Ziyi Yang, Yong Liang, Hui Zhang, Hua Chai, Bowen Zhang, Cheng Peng |
author_sort |
Ziyi Yang |
title |
Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data |
title_short |
Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data |
title_full |
Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data |
title_fullStr |
Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data |
title_full_unstemmed |
Robust Sparse Logistic Regression With the $L_{q}$ ($0 < q < 1$) Regularization for Feature Selection Using Gene Expression Data |
title_sort |
robust sparse logistic regression with the $l_{q}$ ($0 < q < 1$) regularization for feature selection using gene expression data |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2018-01-01 |
description |
Microarray technology is a popular technique that has been extensively applied in cancer diagnosis. Many studies have used high-dimensional microarray data to identify informative features for classifying cancer types, yet the numerous irrelevant features in microarray data may introduce noise and decrease classification accuracy. Regularization techniques are common methods for feature selection; they can reduce irrelevant features and help avoid overfitting. In recent years, different regularization methods have been proposed. Theoretically, an $L_{q}$ ($0 < q < 1$) type penalty function with a lower value of $q$ yields sparser solutions. In addition, the loss function in most regression models is based on least-squares minimization. However, the least-squares method is sensitive to noise and has poor robustness, especially when the error has a heavy-tailed distribution. Least absolute deviation regression is well known as the most common method for robust regression and can cope with large noise. In general, microarray data contain a high level of noise, which deters the development of microarray technology. To solve these problems, we propose a robust logistic regression based on the $L_{q}$ ($0 < q < 1$) regularization approach, which is a feasible and effective approach for feature selection in microarray classification. The $L_{q}$ ($0 < q < 1$) regularization leads to a non-convex optimization problem that is difficult to solve. In this paper, we utilize a genetic algorithm based on a global search strategy to obtain an optimal solution. |
topic |
Robust logistic regression; feature selection; $L_{q}$ regularization; genetic algorithm |
url |
https://ieeexplore.ieee.org/document/8528834/ |
work_keys_str_mv |
AT ziyiyang robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata AT yongliang robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata AT huizhang robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata AT huachai robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata AT bowenzhang robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata AT chengpeng robustsparselogisticregressionwiththeinlineformulatexmathnotationlatexlqtexmathinlineformulainlineformulatexmathnotationlatex0lttextqlt1texmathinlineformularegularizationforfeatureselectionusinggeneexpressiondata |
_version_ |
1724192532898775040 |
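
The description field above outlines three ingredients: a robust loss, an $L_{q}$ ($0 < q < 1$) penalty, and a genetic algorithm for the resulting non-convex problem. The following is a minimal Python sketch of how these pieces could fit together. It is not the authors' implementation: the record does not give the exact objective, so the absolute-deviation loss on predicted probabilities, the function names (`lq_penalty`, `objective`, `genetic_search`), and every hyperparameter (`lam`, `q`, `pop_size`, mutation rates) are assumptions chosen for illustration.

```python
import numpy as np

def lq_penalty(beta, q):
    """Sum of |beta_j|**q; for 0 < q < 1 this is non-convex and favors sparsity."""
    return float(np.sum(np.abs(beta) ** q))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(beta, X, y, lam, q):
    # ASSUMPTION: mean absolute deviation between labels and predicted
    # probabilities stands in for the paper's robust loss, which the
    # catalog record does not specify.
    residual = np.abs(y - sigmoid(X @ beta))
    return float(residual.mean() + lam * lq_penalty(beta, q))

def genetic_search(X, y, lam=0.05, q=0.5, pop_size=40, generations=60, seed=0):
    """Toy genetic algorithm (hypothetical settings) minimizing `objective`."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    shape = (pop_size, n_features)
    # Initial population: random coefficients with about half set to zero.
    pop = rng.normal(size=shape) * (rng.random(shape) < 0.5)
    for _ in range(generations):
        scores = np.array([objective(b, X, y, lam, q) for b in pop])
        # Selection: keep the better half unchanged (elitism).
        elite = pop[np.argsort(scores)[: pop_size // 2]]
        n_children = pop_size - len(elite)
        # Uniform crossover between two randomly chosen elite parents.
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        children = np.where(rng.random(pa.shape) < 0.5, pa, pb)
        # Mutation: Gaussian noise on some genes, then random zeroing of others.
        mutate = rng.random(children.shape) < 0.1
        children = children + mutate * rng.normal(scale=0.3, size=children.shape)
        children[rng.random(children.shape) < 0.05] = 0.0
        pop = np.vstack([elite, children])
    scores = np.array([objective(b, X, y, lam, q) for b in pop])
    return pop[np.argmin(scores)]
```

Calling `genetic_search(X, y)` with a samples-by-genes matrix `X` and 0/1 labels `y` returns one coefficient vector; the genes with nonzero coefficients would be the selected features. Because the better half of each generation is carried over unchanged, the best objective value never gets worse from one generation to the next, although a search of this kind offers no guarantee of reaching the global optimum.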