Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction

The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features su...


Bibliographic Details
Main Authors:	Xinyu Tian, Xuefeng Wang, Jun Chen
Format:	Article
Language:	English
Published:	SAGE Publishing 2014-01-01
Series:	Cancer Informatics
Online Access:	https://doi.org/10.4137/CIN.S17686

id	doaj-8c178b24766e4ea0ab098c00424b0c20
record_format	Article
spelling	Xinyu Tian (Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA); Xuefeng Wang (Department of Preventive Medicine, Stony Brook University, Stony Brook, NY, USA); Jun Chen (Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA). Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction. Cancer Informatics (SAGE Publishing), ISSN 1176-9351, 2014-01-01, 13s6. https://doi.org/10.4137/CIN.S17686
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Xinyu Tian Xuefeng Wang Jun Chen
title	Network-Constrained Group Lasso for High-Dimensional Multinomial Classification with Application to Cancer Subtype Prediction
publisher	SAGE Publishing
series	Cancer Informatics
issn	1176-9351
publishDate	2014-01-01
description	The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expressions are usually related by an underlying biological network. Efficient use of the network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that is capable of addressing both the high dimensionality of predictors and the underlying network information. Group lasso was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.
url	https://doi.org/10.4137/CIN.S17686
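
The description names three ingredients: a multinomial logit loss, a group lasso penalty for sparsity, and a network constraint for smoothness, optimized by a proximal gradient algorithm. This record does not give the objective or the update rules, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one group per predictor (the row of coefficients across all classes), a quadratic Laplacian penalty lam2/2 * tr(B' L B) with an unnormalized graph Laplacian L, no intercept, and a fixed step size. The names fit_network_group_lasso, lam1, and lam2 are hypothetical.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with a max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def group_soft_threshold(B, t):
    """Proximal operator of t * sum_j ||B[j, :]||_2 (one group per predictor row)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * B

def fit_network_group_lasso(X, y, L, lam1, lam2, n_classes, n_iter=500):
    """Proximal gradient descent for an assumed objective:
    mean multinomial negative log-likelihood
    + lam2/2 * tr(B' L B)          (smooth network penalty, assumption)
    + lam1 * sum_j ||B[j, :]||_2   (group lasso over predictor rows, assumption)
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]              # one-hot labels, shape (n, K)
    B = np.zeros((p, n_classes))          # coefficients, shape (p, K)
    # Conservative upper bound on the Lipschitz constant of the smooth gradient.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        P = softmax(X @ B)                          # class probabilities
        grad = X.T @ (P - Y) / n + lam2 * (L @ B)   # gradient of the smooth part
        B = group_soft_threshold(B - step * grad, step * lam1)
    return B

# Toy usage: random data and a chain graph linking neighbouring predictors.
rng = np.random.default_rng(0)
n, p, K = 60, 8, 3
X = rng.normal(size=(n, p))
y = rng.integers(0, K, size=n)
A = np.zeros((p, p))
for j in range(p - 1):
    A[j, j + 1] = A[j + 1, j] = 1.0
L = np.diag(A.sum(axis=1)) - A            # unnormalized Laplacian D - A
B_hat = fit_network_group_lasso(X, y, L, lam1=0.05, lam2=0.1, n_classes=K)
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)
print("selected predictors:", selected)
```

Splitting the objective this way keeps the only non-smooth term, the group lasso penalty, inside a closed-form proximal step. A predictor whose whole row is shrunk to zero is dropped for every class at once. The published method may differ in its choice of Laplacian, grouping, step-size rule, or handling of intercepts.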


Similar Items