Three Aspects of Biostatistical Learning Theory

In the present dissertation we consider three classical problems in biostatistics and statistical learning: classification, variable selection, and statistical inference. Chapter 2 is dedicated to multi-class classification. We characterize a class of loss functions, which we deem relaxed Fisher consistent, whose local minimizers recover not only the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions and includes non-convex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners, which works with any loss function in our class. We show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. In addition, we provide numerical studies and a real-data example which illustrate that the algorithm performs well in practice. In Chapter 3, we provide insights into the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms: a thresholding-based algorithm known as diagonal thresholding and an L1-penalization algorithm based on semidefinite programming, and show that they achieve optimal (up to a constant) sample size in terms of support recovery in the case of standard Gaussian predictors. In addition, we look into the performance of the linear regression LASSO in single index models with correlated Gaussian designs. We show that under certain restrictions on the covariance and signal, the linear regression LASSO can also enjoy optimal sample size in terms of support recovery. Our analysis extends existing results on the LASSO's variable selection capabilities for linear models. Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations. Such a framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME, and LDP, none of which had previously been equipped with inferential procedures.
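The Chapter 3 claim about the linear regression LASSO recovering the support of a single index model can be illustrated with a small numerical sketch. Everything below is a toy assumption for illustration only, not the dissertation's actual setup or algorithms: the link f(u) = u + 0.2·sin(u), the dimensions (n = 400, p = 50, s = 3), the penalty level, and the hand-rolled coordinate-descent LASSO are all hypothetical choices, and the dissertation's diagonal thresholding and semidefinite programming procedures for sliced inverse regression are not shown here.

```python
import numpy as np

# Toy single index model y = f(X @ beta) + noise with standard Gaussian
# predictors and a monotone link f(u) = u + 0.2*sin(u). The true coefficient
# vector beta is s-sparse; we check whether the plain linear regression LASSO
# recovers its support despite the nonlinear link.
rng = np.random.default_rng(0)
n, p, s = 400, 50, 3
beta = np.zeros(p)
beta[:s] = 1.0                       # true support: the first s coordinates
X = rng.standard_normal((n, p))
u = X @ beta
y = u + 0.2 * np.sin(u) + 0.1 * rng.standard_normal(n)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2)||y - X b||^2 + n*lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ b                    # running residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]      # remove coordinate j's contribution
            rho = X[:, j] @ r        # partial correlation with residual
            b[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

b_hat = lasso_cd(X, y, lam=0.1)
support = set(np.flatnonzero(b_hat != 0.0))
print(sorted(support))
```

With this strong signal and independent Gaussian design, the estimated support coincides with the true first three coordinates; the soft-thresholding step sets the remaining coefficients exactly to zero.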


Bibliographic Details
Main Author: Neykov, Matey
Other Authors: Cai, Tianxi
Format: Others
Language: en
Published: Harvard University 2015
Subjects:
Online Access:http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467395
id ndltd-harvard.edu-oai-dash.harvard.edu-1-17467395
record_format oai_dc
spelling ndltd-harvard.edu-oai-dash.harvard.edu-1-174673952017-07-27T15:51:49ZThree Aspects of Biostatistical Learning TheoryNeykov, MateyStatisticsIn the present dissertation we consider three classical problems in biostatistics and statistical learning: classification, variable selection, and statistical inference. Chapter 2 is dedicated to multi-class classification. We characterize a class of loss functions, which we deem relaxed Fisher consistent, whose local minimizers recover not only the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions and includes non-convex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners, which works with any loss function in our class. We show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. In addition, we provide numerical studies and a real-data example which illustrate that the algorithm performs well in practice. In Chapter 3, we provide insights into the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms: a thresholding-based algorithm known as diagonal thresholding and an L1-penalization algorithm based on semidefinite programming, and show that they achieve optimal (up to a constant) sample size in terms of support recovery in the case of standard Gaussian predictors. In addition, we look into the performance of the linear regression LASSO in single index models with correlated Gaussian designs. We show that under certain restrictions on the covariance and signal, the linear regression LASSO can also enjoy optimal sample size in terms of support recovery. Our analysis extends existing results on the LASSO's variable selection capabilities for linear models.
Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations. Such a framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME, and LDP, none of which had previously been equipped with inferential procedures.BiostatisticsCai, TianxiLiu, Jun S.2015-07-17T17:59:19Z2015-052015-05-1520152015-07-17T17:59:19ZThesis or Dissertationtextapplication/pdfNeykov, Matey. 2015. Three Aspects of Biostatistical Learning Theory. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.http://nrs.harvard.edu/urn-3:HUL.InstRepos:174673950000-0002-3320-3889enopenhttp://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAAHarvard University
collection NDLTD
language en
format Others
sources NDLTD
topic Statistics
spellingShingle Statistics
Neykov, Matey
Three Aspects of Biostatistical Learning Theory
description In the present dissertation we consider three classical problems in biostatistics and statistical learning: classification, variable selection, and statistical inference. Chapter 2 is dedicated to multi-class classification. We characterize a class of loss functions, which we deem relaxed Fisher consistent, whose local minimizers recover not only the Bayes rule but also the exact conditional class probabilities. Our class encompasses previously studied classes of loss functions and includes non-convex functions, which are known to be less susceptible to outliers. We propose a generic greedy functional gradient-descent minimization algorithm for boosting weak learners, which works with any loss function in our class. We show that the boosting algorithm achieves a geometric rate of convergence in the case of a convex loss. In addition, we provide numerical studies and a real-data example which illustrate that the algorithm performs well in practice. In Chapter 3, we provide insights into the behavior of sliced inverse regression in a high-dimensional setting under a single index model. We analyze two algorithms: a thresholding-based algorithm known as diagonal thresholding and an L1-penalization algorithm based on semidefinite programming, and show that they achieve optimal (up to a constant) sample size in terms of support recovery in the case of standard Gaussian predictors. In addition, we look into the performance of the linear regression LASSO in single index models with correlated Gaussian designs. We show that under certain restrictions on the covariance and signal, the linear regression LASSO can also enjoy optimal sample size in terms of support recovery. Our analysis extends existing results on the LASSO's variable selection capabilities for linear models. Chapter 4 develops a general inferential framework for testing and constructing confidence intervals for high-dimensional estimating equations.
Such a framework has a variety of applications and allows us to provide tests and confidence regions for parameters estimated by algorithms such as the Dantzig Selector, CLIME, and LDP, none of which had previously been equipped with inferential procedures. === Biostatistics
author2 Cai, Tianxi
author_facet Cai, Tianxi
Neykov, Matey
author Neykov, Matey
author_sort Neykov, Matey
title Three Aspects of Biostatistical Learning Theory
title_short Three Aspects of Biostatistical Learning Theory
title_full Three Aspects of Biostatistical Learning Theory
title_fullStr Three Aspects of Biostatistical Learning Theory
title_full_unstemmed Three Aspects of Biostatistical Learning Theory
title_sort three aspects of biostatistical learning theory
publisher Harvard University
publishDate 2015
url http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467395
work_keys_str_mv AT neykovmatey threeaspectsofbiostatisticallearningtheory
_version_ 1718507127452991488