Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data

Clustering algorithms in high-dimensional spaces require large amounts of data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physica...


Bibliographic Details
Main Authors: Vladimir Molchanov, Lars Linsen
Format: Article
Language: English
Published: MDPI AG 2018-06-01
Series: Information
Subjects: multi-dimensional data visualization; multi-field data; clustering
Online Access: http://www.mdpi.com/2078-2489/9/7/156
id doaj-c925cced37de4291bf5b099baedd392e
record_format Article
spelling doaj-c925cced37de4291bf5b099baedd392e
2020-11-24T23:14:19Z
eng
MDPI AG
Information
2078-2489
2018-06-01
9
7
156
10.3390/info9070156
info9070156
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data
Vladimir Molchanov 0
Lars Linsen 1
Department of Mathematics and Informatics, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
Department of Mathematics and Informatics, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany
Clustering algorithms in high-dimensional spaces require large amounts of data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). Thus, a sufficiently high number of data points can be generated, overcoming the curse of dimensionality for this particular type of multidimensional data. We applied this idea to a histogram-based clustering algorithm. We created a uniform partition of the attribute space into multidimensional bins and computed a histogram indicating the number of data samples belonging to each bin. Without interpolation, the analysis was highly sensitive to the histogram cell sizes, yielding inaccurate clustering for improper choices: large histogram cells result in no cluster separation, while clusters fall apart for small cells. Using an interpolation in physical space, we could refine the data by generating additional samples. The depth of the refinement scheme was chosen according to the local data point distribution in attribute space and the histogram’s bin size. In the case of field discontinuities representing sharp material boundaries in the volume data, the interpolation can be adapted to locally use a nearest-neighbor interpolation scheme that avoids averaging values across the sharp boundary. Consequently, we obtained a density computation in which clusters stay connected even when using very small bin sizes. We exploited this result to create a robust hierarchical cluster tree, applied our technique to several datasets, and compared the cluster trees before and after interpolation.
http://www.mdpi.com/2078-2489/9/7/156
multi-dimensional data visualization
multi-field data
clustering
collection DOAJ
language English
format Article
sources DOAJ
author Vladimir Molchanov
Lars Linsen
spellingShingle Vladimir Molchanov
Lars Linsen
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data
Information
multi-dimensional data visualization
multi-field data
clustering
author_facet Vladimir Molchanov
Lars Linsen
author_sort Vladimir Molchanov
title Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2018-06-01
description Clustering algorithms in high-dimensional spaces require large amounts of data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). Thus, a sufficiently high number of data points can be generated, overcoming the curse of dimensionality for this particular type of multidimensional data. We applied this idea to a histogram-based clustering algorithm. We created a uniform partition of the attribute space into multidimensional bins and computed a histogram indicating the number of data samples belonging to each bin. Without interpolation, the analysis was highly sensitive to the histogram cell sizes, yielding inaccurate clustering for improper choices: large histogram cells result in no cluster separation, while clusters fall apart for small cells. Using an interpolation in physical space, we could refine the data by generating additional samples. The depth of the refinement scheme was chosen according to the local data point distribution in attribute space and the histogram’s bin size. In the case of field discontinuities representing sharp material boundaries in the volume data, the interpolation can be adapted to locally use a nearest-neighbor interpolation scheme that avoids averaging values across the sharp boundary. Consequently, we obtained a density computation in which clusters stay connected even when using very small bin sizes. We exploited this result to create a robust hierarchical cluster tree, applied our technique to several datasets, and compared the cluster trees before and after interpolation.
topic multi-dimensional data visualization
multi-field data
clustering
url http://www.mdpi.com/2078-2489/9/7/156
work_keys_str_mv AT vladimirmolchanov upsamplingforimprovedmultidimensionalattributespaceclusteringofmultifielddata
AT larslinsen upsamplingforimprovedmultidimensionalattributespaceclusteringofmultifielddata
_version_ 1725595069431414784
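
Illustrative sketch (not part of the record, and not the authors' implementation): the description above says that small histogram bins make clusters fall apart, and that interpolating between physically adjacent samples keeps them connected. The Python sketch below shows this effect under simplifying assumptions: the physical domain is reduced to a 1D sequence of samples, the refinement depth per pair is taken as the smallest power-of-two subdivision whose attribute-space step does not exceed the bin size, sharp boundaries are supplied as a hypothetical `sharp` mask rather than detected, and "clusters" are approximated as connected groups of occupied bins. All function names are hypothetical.

```python
import math
from itertools import product

import numpy as np


def upsample_pair(a, b, bin_size, sharp=False):
    """Samples from a (inclusive) toward b (exclusive) in attribute space."""
    step = float(np.max(np.abs(b - a)))
    # Assumed depth rule: subdivide until one step is at most one bin wide.
    depth = max(0, math.ceil(math.log2(step / bin_size))) if step > 0 else 0
    k = 2 ** depth
    t = np.arange(k) / k
    if sharp:
        # Nearest-neighbor: copy the closer endpoint, never average across
        # the boundary.
        t = np.where(t < 0.5, 0.0, 1.0)
    return (1.0 - t)[:, None] * a + t[:, None] * b


def upsample(fields, bin_size, sharp=None):
    """fields: (n, d) float array, rows ordered along a 1D physical axis."""
    n = len(fields)
    sharp = np.zeros(n - 1, dtype=bool) if sharp is None else np.asarray(sharp)
    parts = [upsample_pair(fields[i], fields[i + 1], bin_size, bool(sharp[i]))
             for i in range(n - 1)]
    parts.append(fields[-1:])
    return np.vstack(parts)


def occupied_bins(points, bin_size):
    """Indices of the uniform histogram bins that contain at least one sample."""
    idx = np.floor(points / bin_size).astype(int)
    return {tuple(row) for row in idx.tolist()}


def count_components(bins):
    """Connected groups of occupied bins (bins touching by face, edge or corner)."""
    if not bins:
        return 0
    unseen = set(bins)
    d = len(next(iter(bins)))
    offsets = [o for o in product((-1, 0, 1), repeat=d) if any(o)]
    components = 0
    while unseen:
        stack = [unseen.pop()]
        components += 1
        while stack:
            cell = stack.pop()
            for o in offsets:
                nb = tuple(c + s for c, s in zip(cell, o))
                if nb in unseen:
                    unseen.remove(nb)
                    stack.append(nb)
    return components


# Four samples of a two-attribute field that varies smoothly along one axis.
fields = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
bin_size = 0.25

raw = count_components(occupied_bins(fields, bin_size))
smooth = count_components(occupied_bins(upsample(fields, bin_size), bin_size))
# Treat the middle pair as lying across a sharp material boundary.
split = count_components(
    occupied_bins(upsample(fields, bin_size, sharp=[False, True, False]), bin_size))
print(raw, smooth, split)
```

In this toy setting the raw samples occupy isolated bins, the upsampled samples form one connected chain of bins, and marking the middle pair as a sharp boundary keeps the two sides separate because no averaged values are generated between them. A full treatment would work on a 3D grid with multilinear interpolation and build the hierarchical cluster tree from the bin densities, which this sketch does not attempt.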