Discriminant Analysis and Support Vector Regression in High Dimensions: Sharp Performance Analysis and Optimal Designs

Machine learning is emerging as a powerful tool for data science and is being applied in almost all fields. In many applications, the number of features is comparable to the number of samples, and both grow large. This setting is usually called the high-dimensional regime, and it raises new challenges for the application of machine learning. In this work, we conduct a high-dimensional performance analysis of some popular classification and regression techniques.

In the first part, discriminant analysis classifiers are considered. A major challenge in using these classifiers in practice is that they depend on the inverses of covariance matrices that must be estimated from training data. Several estimators of these inverses can be used, the most common being based on regularization. In this thesis, we propose new estimators that are shown to yield better performance. The main principle of our approach is to design an optimized inverse covariance matrix estimator under the assumption that the covariance matrix is a low-rank perturbation of a scaled identity matrix. We show that the proposed classifiers are not only easier to implement but also outperform the classical regularization-based discriminant analysis classifiers.

In the second part, we carry out a high-dimensional statistical analysis of linear support vector regression. Under plausible assumptions on the statistical distribution of the data, we characterize the feasibility condition for hard support vector regression and, when it is feasible, derive an asymptotic approximation of its risk. Similarly, we study the test risk of soft support vector regression as a function of its parameters. The analysis is then extended to kernel support vector regression under a generalized linear model assumption. Based on our analysis, we illustrate that adding more samples may harm the test performance of these regression algorithms, while it is always beneficial when the parameters are optimally selected. Our results pave the way to understanding the effect of the underlying hyperparameters and provide insights into how to choose the kernel function optimally.
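To make the covariance model in the first part concrete, the following is a minimal sketch, not the thesis's actual estimator: a two-class discriminant analysis rule whose inverse covariance matrix is built under the assumption that the true covariance is a scaled identity plus a rank-r perturbation. The sample covariance is eigendecomposed, the bulk eigenvalues are averaged into a noise floor, and the inverse follows in closed form. The rank r and the helper names (`structured_precision`, `lda_predict`) are illustrative assumptions.

```python
import numpy as np

def structured_precision(X_centered, r):
    """Inverse covariance under Sigma = sigma^2 I + rank-r perturbation."""
    S = np.cov(X_centered, rowvar=False)        # sample covariance, p x p
    evals, evecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    top_vals = evals[-r:]                       # spike eigenvalues
    top_vecs = evecs[:, -r:]                    # spike eigenvectors
    sigma2 = evals[:-r].mean()                  # noise level from the bulk
    # Closed-form inverse: I / sigma^2 minus a rank-r correction on the spikes
    correction = 1.0 / sigma2 - 1.0 / top_vals
    return np.eye(S.shape[0]) / sigma2 - (top_vecs * correction) @ top_vecs.T

def lda_predict(X_train, y_train, X_test, r=2):
    """Two-class linear discriminant using the structured precision matrix."""
    mu0 = X_train[y_train == 0].mean(axis=0)
    mu1 = X_train[y_train == 1].mean(axis=0)
    # Pool within-class fluctuations before estimating the covariance
    centered = X_train - np.where(y_train[:, None] == 0, mu0, mu1)
    P = structured_precision(centered, r)
    w = P @ (mu1 - mu0)                         # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)                  # midpoint threshold, equal priors
    return (X_test @ w + b > 0).astype(int)
```

This is the classical linear discriminant rule with the usual plug-in precision replaced by the structured one; in practice the rank r would itself have to be chosen from the data.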

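For the second part, the sketch below is an assumed toy experiment, not the thesis's analysis: synthetic Gaussian data, a planted linear model `w_true`, and illustrative parameter grids are all assumptions. It contrasts soft support vector regression at fixed hyperparameters with the same model whose regularization C and tube width epsilon are selected by cross-validation, echoing the abstract's observation that additional samples are only guaranteed to help when the parameters are optimally chosen.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
p = 200                                         # feature count comparable to sample count
w_true = rng.normal(size=p) / np.sqrt(p)        # planted linear model (assumed)

def sample(n):
    """Draw n samples from a noisy linear model with Gaussian features."""
    X = rng.normal(size=(n, p))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_test, y_test = sample(2000)
for n in (150, 300, 600):                       # growing training set
    X, y = sample(n)
    # Soft SVR at fixed, untuned hyperparameters
    fixed = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
    # The same model with C and epsilon chosen by 5-fold cross-validation
    grid = GridSearchCV(SVR(kernel="linear"),
                        {"C": [0.01, 0.1, 1, 10],
                         "epsilon": [0.01, 0.1, 0.5, 1.0]},
                        cv=5).fit(X, y)
    print(n,
          mean_squared_error(y_test, fixed.predict(X_test)),
          mean_squared_error(y_test, grid.predict(X_test)))
```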

Bibliographic Details
Main Author: Sifaou, Houssem
Other Authors: Alouini, Mohamed-Slim; Shihada, Basem; Zhang, Xiangliang; Kammoun, Abla; McKay, Matthew
Division: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
Format: Dissertation
Language: en
Published: 2021-04
Subjects: High dimensions; Discriminant analysis; Support vector regression; Asymptotic analysis
Online Access: Sifaou, H. (2021). Discriminant Analysis and Support Vector Regression in High Dimensions: Sharp Performance Analysis and Optimal Designs. KAUST Research Repository. https://doi.org/10.25781/KAUST-7TA37
http://hdl.handle.net/10754/668982
DOI: 10.25781/KAUST-7TA37
Archived: 2021-04-27
Access Note: At the time of archiving, the student author of this dissertation opted to temporarily restrict access to it. The full text of this dissertation will become available to the public after the expiration of the embargo on 2022-04-25.