Efficient algorithm for testing goodness-of-fit for classification of high dimensional data
Main Author: | Gintautas Jakimauskas (Institute of Mathematics and Informatics) |
---|---|
Format: | Article |
Language: | English |
Published: | Vilnius University Press, 2009-12-01 |
Series: | Lietuvos Matematikos Rinkinys |
Volume: | 50 (proc. LMS) |
ISSN: | 0132-2818, 2335-898X |
DOI: | 10.15388/LMR.2009.52 |
Subjects: | Gaussian mixture model, goodness-of-fit |
Online Access: | https://www.journals.vu.lt/LMR/article/view/17982 |

Consider a sample drawn from a d-dimensional Gaussian mixture model, where the dimension d is assumed to be large, and the problem of classifying this sample. Because the dimension is large, it is natural to project the sample onto k-dimensional (k = 1, 2, ...) linear subspaces, using the projection pursuit method to select the best such subspaces. Given an estimate of the discriminant subspace, classification can then be performed on the projected sample, thus avoiding the ‘curse of dimensionality’. An essential step in this method is testing the goodness-of-fit of the estimated d-dimensional model under the assumption that the distribution on the complementary subspace is standard Gaussian. We present a simple, data-driven and computationally efficient procedure for this test. The procedure combines the well-known interpretation of goodness-of-fit testing as a classification problem, a special sequential data partition procedure, randomization and resampling, and elements of sequential testing. Monte Carlo simulations are used to assess the performance of the procedure.
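The abstract recasts goodness-of-fit testing as classification, combined with data partitioning and randomization. The Python sketch below illustrates that general idea under stated assumptions. It works on a one-dimensional projection (k = 1) and assumes the fitted model can be sampled from.

The steps are as follows:

- Observed points are labelled 1 and points simulated from the model are labelled 0.
- A histogram classifier predicts the majority label in each of a fixed number of equal-count cells, and its accuracy is scored on the pooled data.
- That accuracy is calibrated by randomly permuting the labels.

The function names, the equal-count partition, `n_cells` and `n_perm` are hypothetical choices for illustration. This is a generic classifier-based two-sample test, not the paper's sequential partition procedure or its sequential-testing elements. If the model's parameters were estimated from the same data, the permutation p-value is only approximate.

```python
import random
from typing import Callable, List, Sequence


def histogram_classifier_accuracy(
    cells: Sequence[int], labels: Sequence[int], n_cells: int
) -> float:
    """In-sample accuracy of a classifier that predicts the majority label per cell."""
    ones = [0] * n_cells
    totals = [0] * n_cells
    for cell, label in zip(cells, labels):
        totals[cell] += 1
        ones[cell] += label
    # The majority label is predicted in each cell, so max(ones, zeros) points are correct.
    correct = sum(max(o, t - o) for o, t in zip(ones, totals))
    return correct / len(labels)


def gof_classification_test(
    observed: Sequence[float],
    sample_model: Callable[[int, random.Random], List[float]],
    n_cells: int = 10,
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Permutation p-value for H0: `observed` comes from the model behind `sample_model`.

    Hypothetical sketch, not the paper's procedure.
    """
    rng = random.Random(seed)
    simulated = sample_model(len(observed), rng)  # same size as the observed sample
    pooled = list(observed) + list(simulated)
    labels = [1] * len(observed) + [0] * len(simulated)

    # Equal-count cells from the ranks of the pooled sample; they depend only on
    # the pooled values, so they stay fixed when the labels are permuted.
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    cells = [0] * len(pooled)
    for rank, i in enumerate(order):
        cells[i] = rank * n_cells // len(pooled)

    t_obs = histogram_classifier_accuracy(cells, labels, n_cells)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomization: labels are exchangeable under H0
        if histogram_classifier_accuracy(cells, labels, n_cells) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    def standard_normal(n: int, rng: random.Random) -> List[float]:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    data_rng = random.Random(1)
    from_model = [data_rng.gauss(0.0, 1.0) for _ in range(300)]
    # Two-component mixture with means -3 and +3: clearly not standard normal.
    mixture = [data_rng.gauss(0.0, 1.0) + data_rng.choice([-3.0, 3.0]) for _ in range(300)]
    print("p-value, data from the model:", gof_classification_test(from_model, standard_normal))
    print("p-value, mixture data:       ", gof_classification_test(mixture, standard_normal))
```

Because the cells are built from ranks of the pooled sample, each permutation only recounts labels per cell, which keeps the resampling loop cheap. A fixed equal-count partition is the simplest choice. An adaptive or sequential partition, as the abstract mentions, would replace the cell-assignment step.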