Efficient algorithm for testing goodness-of-fit for classification of high dimensional data

Suppose we have a sample from a d-dimensional Gaussian mixture model, where d is assumed to be large, and consider the problem of classifying the sample. Because the dimension is large, it is natural to project the sample onto k-dimensional (k = 1, 2, ...) linear subspaces using projection pursuit m...

Bibliographic Details
Main Author: Gintautas Jakimauskas
Format: Article
Language: English
Published: Vilnius University Press 2009-12-01
Series: Lietuvos Matematikos Rinkinys
Subjects: Gaussian mixture model; goodness-of-fit
Online Access: https://www.journals.vu.lt/LMR/article/view/17982
id doaj-9d6c4612ae3d41e4aecd9bc551a8d9b5
record_format Article
spelling doaj-9d6c4612ae3d41e4aecd9bc551a8d9b5; 2020-11-25T03:04:30Z; eng; Vilnius University Press; Lietuvos Matematikos Rinkinys; 0132-2818; 2335-898X; 2009-12-01; 50; proc. LMS; 10.15388/LMR.2009.52; Efficient algorithm for testing goodness-of-fit for classification of high dimensional data; Gintautas Jakimauskas (Institute of Mathematics and Informatics); https://www.journals.vu.lt/LMR/article/view/17982; Gaussian mixture model; goodness-of-fit
collection DOAJ
language English
format Article
sources DOAJ
author Gintautas Jakimauskas
title Efficient algorithm for testing goodness-of-fit for classification of high dimensional data
publisher Vilnius University Press
series Lietuvos Matematikos Rinkinys
issn 0132-2818
2335-898X
publishDate 2009-12-01
description Suppose we have a sample from a d-dimensional Gaussian mixture model, where d is assumed to be large, and consider the problem of classifying the sample. Because the dimension is large, it is natural to project the sample onto k-dimensional (k = 1, 2, ...) linear subspaces using the projection pursuit method, which gives the best selection of these subspaces. Given an estimate of the discriminant subspace, classification can be performed on the projected sample, thus avoiding the 'curse of dimensionality'. An essential step in this method is testing goodness-of-fit of the estimated d-dimensional model, assuming that the distribution on the complement space is standard Gaussian. We present a simple, data-driven, and computationally efficient procedure for testing goodness-of-fit. The procedure is based on the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure.
topic Gaussian mixture model
goodness-of-fit
url https://www.journals.vu.lt/LMR/article/view/17982
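
The description mentions the interpretation of goodness-of-fit testing as a classification problem. The sketch below illustrates that general idea only; it is not the article's procedure (it has no sequential data partition, no sequential testing, and no projection pursuit step). It labels the observed sample 1 and a sample simulated from the fitted model 0, trains a classifier on half of the pooled data, and checks whether held-out accuracy is noticeably above 1/2. The function names, the k-nearest-neighbour classifier, the normal approximation for the p-value, and the standard Gaussian example model are all assumptions chosen for illustration.

```python
import math
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5):
    """Majority-vote k-nearest-neighbour labels (Euclidean distance, odd k)."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def classifier_gof_test(data, sample_model, rng, k=5):
    """Return (held-out accuracy, one-sided p-value) for H0: data ~ model.

    `sample_model(n, rng)` is a hypothetical callback that draws n points
    from the fitted model; it is not part of any library.
    """
    n = data.shape[0]
    simulated = sample_model(n, rng)      # label 0: draws from the fitted model
    x = np.vstack([data, simulated])      # label 1: the observed sample
    y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])
    order = rng.permutation(2 * n)        # random half/half train/test split
    train, test = order[:n], order[n:]
    pred = knn_predict(x[train], y[train], x[test], k)
    acc = float((pred == y[test]).mean())
    # Under H0 the classifier cannot beat chance, so the number of correct
    # held-out predictions is approximately Binomial(n, 1/2); a normal
    # approximation gives the one-sided p-value.
    z = (acc - 0.5) / math.sqrt(0.25 / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return acc, p_value


# Usage: the fitted model is taken to be standard Gaussian in 5 dimensions.
rng = np.random.default_rng(0)
model = lambda n, r: r.standard_normal((n, 5))
well_fitted = rng.standard_normal((200, 5))        # really comes from the model
badly_fitted = rng.standard_normal((200, 5)) + 2.0  # mean shifted in every coordinate
acc_good, p_good = classifier_gof_test(well_fitted, model, rng)
acc_bad, p_bad = classifier_gof_test(badly_fitted, model, rng)
print(f"well fitted:  accuracy={acc_good:.3f}, p={p_good:.3g}")
print(f"badly fitted: accuracy={acc_bad:.3f}, p={p_bad:.3g}")
```

Accuracy close to 1/2 means the classifier cannot tell the data from model draws, which is consistent with a good fit; accuracy well above 1/2 is evidence against the fitted model. Repeating the simulation and split several times would reduce the dependence on a single random partition.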