New Developments in Sparse PLS Regression

Methods based on partial least squares (PLS) regression, which has recently gained much attention in the analysis of high-dimensional genomic datasets, have been developed since the early 2000s for performing variable selection. Most of these techniques rely on tuning parameters that are often deter...


Bibliographic Details
Main Authors: Jérémy Magnanensi, Myriam Maumy-Bertrand, Nicolas Meyer, Frédéric Bertrand
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-07-01
Series: Frontiers in Applied Mathematics and Statistics
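Subjects: variable selection; partial least squares; sparse partial least squares; generalized partial least squares; bootstrap; stability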
Online Access: https://www.frontiersin.org/articles/10.3389/fams.2021.693126/full
id doaj-0a8ab28962a346b5bc74e0bb4189d93c
record_format Article
spelling doaj-0a8ab28962a346b5bc74e0bb4189d93c
2021-07-16T11:27:55Z
eng
Frontiers Media S.A.
Frontiers in Applied Mathematics and Statistics
ISSN 2297-4687
2021-07-01
Volume 7
DOI 10.3389/fams.2021.693126
Article 693126
New Developments in Sparse PLS Regression
Jérémy Magnanensi: IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg and CNRS, Strasbourg, France; Laboratoire de Biostatistique et Informatique Médicale, Faculté de Médecine, EA3430, Université de Strasbourg and CNRS, Strasbourg, France
Myriam Maumy-Bertrand: IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg and CNRS, Strasbourg, France; LIST3N, Université de Technologie de Troyes, Troyes, France
Nicolas Meyer: Laboratoire de Biostatistique et Informatique Médicale, Faculté de Médecine, EA3430, Université de Strasbourg and CNRS, Strasbourg, France
Frédéric Bertrand: IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg and CNRS, Strasbourg, France; LIST3N, Université de Technologie de Troyes, Troyes, France
collection DOAJ
language English
format Article
sources DOAJ
author Jérémy Magnanensi
Myriam Maumy-Bertrand
Nicolas Meyer
Frédéric Bertrand
author_sort Jérémy Magnanensi
title New Developments in Sparse PLS Regression
publisher Frontiers Media S.A.
series Frontiers in Applied Mathematics and Statistics
issn 2297-4687
publishDate 2021-07-01
description Methods based on partial least squares (PLS) regression, which has recently gained much attention in the analysis of high-dimensional genomic datasets, have been developed since the early 2000s for performing variable selection. Most of these techniques rely on tuning parameters that are often determined by cross-validation (CV) based methods, which raises essential stability issues. To overcome this, we have developed a new dynamic bootstrap-based method for significant predictor selection, suitable for both PLS regression and its incorporation into generalized linear models (GPLS). It relies on establishing bootstrap confidence intervals, which allows testing of the significance of predictors at preset type I risk α, and avoids CV. We have also developed adapted versions of sparse PLS (SPLS) and sparse GPLS regression (SGPLS), using a recently introduced non-parametric bootstrap-based technique to determine the numbers of components. We compare their variable selection reliability and stability concerning tuning parameters determination and their predictive ability, using simulated data for PLS and real microarray gene expression data for PLS-logistic classification. We observe that our new dynamic bootstrap-based method has the property of best separating random noise in y from the relevant information with respect to other methods, leading to better accuracy and predictive abilities, especially for non-negligible noise levels.
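The description mentions testing predictor significance through bootstrap confidence intervals at a preset type I risk α. As a rough illustration only, the sketch below applies percentile bootstrap intervals to the coefficients of a one-component PLS1 fit and keeps the predictors whose interval excludes zero. It is not the authors' dynamic method: the single fixed component, the percentile intervals, and the names `pls1_one_component` and `bootstrap_select` are all assumptions made for this example.

```python
import random


def pls1_one_component(X, y):
    """Coefficients of a one-component PLS1 fit (univariate y, centered data)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X'y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        return [0.0] * p
    w = [v / norm for v in w]
    # Scores t = Xw, then ordinary regression of y on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    c = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return [c * wj for wj in w]


def bootstrap_select(X, y, alpha=0.05, n_boot=500, seed=0):
    """Indices of predictors whose percentile bootstrap CI excludes zero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = [[] for _ in range(p)]
    for _ in range(n_boot):
        # Resample observations (rows) with replacement and refit.
        idx = [rng.randrange(n) for _ in range(n)]
        coefs = pls1_one_component([X[i] for i in idx], [y[i] for i in idx])
        for j in range(p):
            draws[j].append(coefs[j])
    # Positions of the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_k = max(0, int((alpha / 2) * n_boot))
    hi_k = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    selected = []
    for j in range(p):
        s = sorted(draws[j])
        if s[lo_k] > 0 or s[hi_k] < 0:
            selected.append(j)
    return selected
```

Calling `bootstrap_select(X, y, alpha=0.05)` on a list-of-rows `X` and a list `y` returns the retained column indices. No cross-validation is involved, which is the point the description makes, but the number of components is simply fixed at one here rather than chosen by a bootstrap-based criterion.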
topic variable selection
partial least squares
sparse partial least squares
generalized partial least squares
bootstrap
stability
url https://www.frontiersin.org/articles/10.3389/fams.2021.693126/full