Sparse sliced inverse regression for high dimensional data analysis

Bibliographic Details
Main Authors: Hilafu, H. (Author), Safo, S.E. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2022
LEADER 02839nam a2200481Ia 4500
001 10.1186-s12859-022-04700-3
008 220706s2022 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Sparse sliced inverse regression for high dimensional data analysis 
260 0 |b BioMed Central Ltd  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-022-04700-3 
520 3 |a Background: Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for analysis of such high-dimensional data. The sliced inverse regression (SIR) method, which can be formulated as a generalized eigenvalue decomposition problem, offers a model-free estimation approach for the indices in the semi-parametric multi-index model. Sparse estimates of the eigenvectors that form the basis matrix used to construct the indices are desirable because they enable variable selection, which in turn improves interpretability and model parsimony. Results: To this end, we propose a group-Dantzig selector type formulation that induces row-sparsity in the sliced inverse regression dimension reduction vectors. Extensive simulation studies are carried out to assess the performance of the proposed method and compare it with other state-of-the-art methods in the literature. Conclusion: The proposed method is shown to yield competitive estimation, prediction, and variable selection performance. Three real data applications, including a metabolomics depression study, are presented to demonstrate the method’s effectiveness in practice. © 2022, The Author(s). 
650 0 4 |a article 
650 0 4 |a Clustering algorithms 
650 0 4 |a data analysis 
650 0 4 |a decomposition 
650 0 4 |a Dimension reduction 
650 0 4 |a dimensionality reduction 
650 0 4 |a discriminant analysis 
650 0 4 |a Discriminant analysis 
650 0 4 |a Eigenvalues and eigenfunctions 
650 0 4 |a Generalized eigenvalue decomposition 
650 0 4 |a Generalized eigenvalues 
650 0 4 |a High dimensional data 
650 0 4 |a High-dimensional data 
650 0 4 |a Inverse problems 
650 0 4 |a Linear discriminant analysis 
650 0 4 |a metabolomics 
650 0 4 |a Multi-index 
650 0 4 |a prediction 
650 0 4 |a Regression analysis 
650 0 4 |a Semiparametric 
650 0 4 |a Semiparametric model 
650 0 4 |a Semi-parametric modeling 
650 0 4 |a simulation 
650 0 4 |a Sliced inverse regression 
650 0 4 |a Sliced inverse regressions 
650 0 4 |a Variable selection 
700 1 0 |a Hilafu, H.  |e author 
700 1 0 |a Safo, S.E.  |e author 
773 |t BMC Bioinformatics
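The abstract describes SIR as a generalized eigenvalue decomposition problem. The following is a minimal NumPy sketch of classical, non-sparse SIR, offered only to illustrate that formulation under stated assumptions: it assumes n > p so that the sample covariance Σ is invertible, cuts the sorted response into roughly equal-count slices, forms M as the proportion-weighted covariance of the slice means, and returns the leading solutions of M v = λ Σ v. The function name `sir_directions` and the simulated data are hypothetical. This is not the authors' group-Dantzig selector formulation, which targets high-dimensional settings and induces row-sparsity in the basis matrix; the row norms computed at the end only illustrate what "row-sparsity" refers to.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Hypothetical helper: classical (non-sparse) sliced inverse regression.

    Returns a p x n_directions basis B whose columns solve the generalized
    eigenproblem M v = lambda * Sigma v, ordered by decreasing eigenvalue,
    plus those eigenvalues. Assumes n > p so Sigma is invertible; this is
    NOT the sparse group-Dantzig method described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)              # center the predictors
    sigma = Xc.T @ Xc / n                # sample covariance of X

    # Slice the response: sort by y and cut into roughly equal-sized groups.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        p_h = len(idx) / n               # proportion of observations in slice
        m_h = Xc[idx].mean(axis=0)       # slice mean of centered X
        M += p_h * np.outer(m_h, m_h)    # weighted covariance of slice means

    # Reduce M v = lambda Sigma v to a symmetric problem using Sigma^{-1/2}.
    s, U = np.linalg.eigh(sigma)
    sigma_inv_half = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    evals, W = np.linalg.eigh(sigma_inv_half @ M @ sigma_inv_half)
    top = np.argsort(evals)[::-1][:n_directions]  # largest eigenvalues first
    B = sigma_inv_half @ W[:, top]                # map back: v = Sigma^{-1/2} w
    return B, evals[top]


# Usage on simulated single-index data (hypothetical example, not from the paper).
rng = np.random.default_rng(0)
n, p = 2000, 6
X = rng.standard_normal((n, p))
beta = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 0.0])  # only the first two variables matter
y = X @ beta + 0.1 * rng.standard_normal(n)
B, evals = sir_directions(X, y, n_slices=10, n_directions=1)

# Row j of B holds variable j's loadings on all directions; a row-sparse
# estimator sets entire rows for irrelevant variables exactly to zero.
row_norms = np.linalg.norm(B, axis=1)
```

Plain SIR does not zero out rows, so the row norms for the four irrelevant predictors here are small but not exactly zero. Producing exact zeros, and doing so when p exceeds n and Σ is singular, is what a row-sparse formulation such as the one proposed in the paper is meant to provide.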