|
|
|
|
LEADER |
03334nam a2200445Ia 4500 |
001 |
10.1186-s12859-021-04053-3 |
008 |
220427s2021 CNT 000 0 und d |
020 |
|
|
|a 14712105 (ISSN)
|
245 |
1 |
0 |
|a Improved two-stage model averaging for high-dimensional linear regression, with application to Riboflavin data analysis
|
260 |
|
0 |
|b BioMed Central Ltd
|c 2021
|
856 |
|
|
|z View Fulltext in Publisher
|u https://doi.org/10.1186/s12859-021-04053-3
|
520 |
3 |
|
|a Background: Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging attempts to achieve stable and improved prediction. In this paper, we develop a two-stage model averaging procedure to enhance accuracy and stability in prediction for high-dimensional linear regression. First, we employ a high-dimensional variable selection method such as LASSO to screen out redundant predictors and construct a class of candidate models; then we apply jackknife cross-validation to optimize the model weights for averaging. Results: In simulation studies, the proposed technique outperforms commonly used alternative methods in the high-dimensional regression setting in terms of minimizing the mean squared prediction error. We apply the proposed method to a riboflavin data set; the results show that the method is quite efficient in forecasting the riboflavin production rate when there are thousands of genes and only tens of subjects. Conclusions: Compared with a recent high-dimensional model averaging procedure (Ando and Li in J Am Stat Assoc 109:254–65, 2014), the proposed approach enjoys three appealing features and thus has better predictive performance: (1) More suitable methods are applied for model construction and weighting. (2) Computational flexibility is retained, since each candidate model and its corresponding weight are determined in the low-dimensional setting and quadratic programming is utilized in the cross-validation. (3) Model selection and averaging are combined in the procedure, so it makes full use of the strengths of both techniques. As a consequence, the proposed method can achieve stable and accurate predictions in high-dimensional linear models and can greatly help practical researchers analyze genetic data in medical research. © 2021, The Author(s).
|
650 |
0 |
4 |
|a Accurate prediction
|
650 |
0 |
4 |
|a Clustering algorithms
|
650 |
0 |
4 |
|a computer simulation
|
650 |
0 |
4 |
|a Computer Simulation
|
650 |
0 |
4 |
|a Corresponding weights
|
650 |
0 |
4 |
|a Cross-validation
|
650 |
0 |
4 |
|a data analysis
|
650 |
0 |
4 |
|a Data Analysis
|
650 |
0 |
4 |
|a Forecasting
|
650 |
0 |
4 |
|a High dimensional data
|
650 |
0 |
4 |
|a High-dimensional models
|
650 |
0 |
4 |
|a High-dimensional regression
|
650 |
0 |
4 |
|a High-dimensional regressions
|
650 |
0 |
4 |
|a Jackknife
|
650 |
0 |
4 |
|a Linear Models
|
650 |
0 |
4 |
|a Model averaging
|
650 |
0 |
4 |
|a Models, Statistical
|
650 |
0 |
4 |
|a Predictive performance
|
650 |
0 |
4 |
|a Quadratic programming
|
650 |
0 |
4 |
|a Regression analysis
|
650 |
0 |
4 |
|a riboflavin
|
650 |
0 |
4 |
|a Riboflavin
|
650 |
0 |
4 |
|a Squared prediction errors
|
650 |
0 |
4 |
|a statistical model
|
650 |
0 |
4 |
|a Variable selection
|
650 |
0 |
4 |
|a Variable selection methods
|
700 |
1 |
|
|a Pan, J.
|e author
|
773 |
|
|
|t BMC Bioinformatics
|