Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data
Clustering algorithms in the high-dimensional space require many data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physica...
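
The idea summarized in the abstract (upsample in physical space, then bin the samples in attribute space) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a 1D physical domain, plain linear interpolation, and a fixed refinement depth, whereas the article describes choosing the depth adaptively and switching to nearest-neighbor interpolation at sharp boundaries. The names `upsample_linear` and `occupied_bins` are hypothetical.

```python
import numpy as np


def upsample_linear(fields, depth):
    """Linearly interpolate between neighbouring samples of a 1D physical grid.

    fields: array-like of shape (n_points, n_attributes).
    depth:  fixed refinement depth (an assumption of this sketch); each interval
            between neighbours is split into 2**depth sub-intervals.
    """
    fields = np.asarray(fields, dtype=float)
    steps = 2 ** depth
    t = np.arange(steps) / steps                       # 0, 1/steps, ..., (steps-1)/steps
    a = fields[:-1, None, :]                           # left neighbour of each interval
    b = fields[1:, None, :]                            # right neighbour of each interval
    inner = (1.0 - t)[None, :, None] * a + t[None, :, None] * b
    # Flatten the per-interval samples and append the final grid point.
    return np.vstack([inner.reshape(-1, fields.shape[1]), fields[-1:]])


def occupied_bins(samples, bins, value_range):
    """Boolean mask of histogram cells in attribute space that hold >= 1 sample."""
    hist, _ = np.histogramdd(samples, bins=bins, range=value_range)
    return hist > 0


# Two neighbouring grid points whose attribute values are far apart.
coarse = np.array([[0.0, 0.0], [1.0, 1.0]])
fine = upsample_linear(coarse, depth=2)

value_range = [(0.0, 1.0), (0.0, 1.0)]
before = int(occupied_bins(coarse, 4, value_range).sum())  # 2 isolated cells
after = int(occupied_bins(fine, 4, value_range).sum())     # 4 cells along the diagonal
print(before, after)
```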
Main Authors: | Vladimir Molchanov, Lars Linsen |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2018-06-01 |
Series: | Information |
Subjects: | multi-dimensional data visualization; multi-field data; clustering |
Online Access: | http://www.mdpi.com/2078-2489/9/7/156 |
id |
doaj-c925cced37de4291bf5b099baedd392e |
---|---|
record_format |
Article |
spelling |
doaj-c925cced37de4291bf5b099baedd392e; 2020-11-24T23:14:19Z; eng; MDPI AG; Information; 2078-2489; 2018-06-01; 9; 7; 156; 10.3390/info9070156; info9070156; Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data; Vladimir Molchanov 0; Lars Linsen 1; Department of Mathematics and Informatics, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany; Department of Mathematics and Informatics, Westfälische Wilhelms-Universität Münster, 48149 Münster, Germany; Clustering algorithms in the high-dimensional space require many data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). Thus, a sufficiently high number of data points can be generated, overcoming the curse of dimensionality for this particular type of multidimensional data. We applied this idea to a histogram-based clustering algorithm. We created a uniform partition of the attribute space into multidimensional bins and computed a histogram indicating the number of data samples belonging to each bin. Without interpolation, the analysis was highly sensitive to the histogram cell sizes, yielding inaccurate clustering for improper choices: Large histogram cells result in no cluster separation, while clusters fall apart for small cells. Using an interpolation in physical space, we could refine the data by generating additional samples. The depth of the refinement scheme was chosen according to the local data point distribution in attribute space and the histogram’s bin size. In the case of field discontinuities representing sharp material boundaries in the volume data, the interpolation can be adapted to locally make use of a nearest-neighbor interpolation scheme that avoids averaging values across the sharp boundary. Consequently, we could generate a density computation, where clusters stay connected even when using very small bin sizes. We exploited this result to create a robust hierarchical cluster tree, apply our technique to several datasets, and compare the cluster trees before and after interpolation.; http://www.mdpi.com/2078-2489/9/7/156; multi-dimensional data visualization; multi-field data; clustering |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Vladimir Molchanov Lars Linsen |
spellingShingle |
Vladimir Molchanov Lars Linsen Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data Information multi-dimensional data visualization multi-field data clustering |
author_facet |
Vladimir Molchanov Lars Linsen |
author_sort |
Vladimir Molchanov |
title |
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data |
title_short |
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data |
title_full |
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data |
title_fullStr |
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data |
title_full_unstemmed |
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data |
title_sort |
upsampling for improved multidimensional attribute space clustering of multifield data |
publisher |
MDPI AG |
series |
Information |
issn |
2078-2489 |
publishDate |
2018-06-01 |
description |
Clustering algorithms in the high-dimensional space require many data to perform reliably and robustly. For multivariate volume data, it is possible to interpolate between the data points in the high-dimensional attribute space based on their spatial relationship in the volumetric domain (or physical space). Thus, a sufficiently high number of data points can be generated, overcoming the curse of dimensionality for this particular type of multidimensional data. We applied this idea to a histogram-based clustering algorithm. We created a uniform partition of the attribute space into multidimensional bins and computed a histogram indicating the number of data samples belonging to each bin. Without interpolation, the analysis was highly sensitive to the histogram cell sizes, yielding inaccurate clustering for improper choices: Large histogram cells result in no cluster separation, while clusters fall apart for small cells. Using an interpolation in physical space, we could refine the data by generating additional samples. The depth of the refinement scheme was chosen according to the local data point distribution in attribute space and the histogram’s bin size. In the case of field discontinuities representing sharp material boundaries in the volume data, the interpolation can be adapted to locally make use of a nearest-neighbor interpolation scheme that avoids averaging values across the sharp boundary. Consequently, we could generate a density computation, where clusters stay connected even when using very small bin sizes. We exploited this result to create a robust hierarchical cluster tree, apply our technique to several datasets, and compare the cluster trees before and after interpolation. |
topic |
multi-dimensional data visualization multi-field data clustering |
url |
http://www.mdpi.com/2078-2489/9/7/156 |
work_keys_str_mv |
AT vladimirmolchanov upsamplingforimprovedmultidimensionalattributespaceclusteringofmultifielddata AT larslinsen upsamplingforimprovedmultidimensionalattributespaceclusteringofmultifielddata |
_version_ |
1725595069431414784 |