A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets
High dimensional data sets contain a few dependent multivariate relationships. How to identify these dependent variables is an important issue for data analysis. Currently, the most frequently used method is the enumeration method, in which all multivariate rela...
Main Authors: | Fubo Shao, Zhiqiang Hou, Limin Jia, Zhe Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
Subjects: | Correlation; high dimensions; maximal information coefficient; pruning algorithm |
Online Access: | https://ieeexplore.ieee.org/document/9300253/ |
id |
doaj-f7586a93cff240b9b1d03d1efca4cb3f |
---|---|
record_format |
Article |
spelling |
doaj-f7586a93cff240b9b1d03d1efca4cb3f 2021-03-30T15:16:45Z eng IEEE IEEE Access 2169-3536 2021-01-01 9 5150 5165 10.1109/ACCESS.2020.3046131 9300253 A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets Fubo Shao 0 https://orcid.org/0000-0001-7720-902X Zhiqiang Hou 1 https://orcid.org/0000-0001-5925-8063 Limin Jia 2 Zhe Zhang 3 State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China China Waterborne Transport Research Institute, Beijing, China State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China High dimensional data sets contain a few dependent multivariate relationships, and identifying the dependent variables is an important issue for data analysis. Currently, the most frequently used method is enumeration, in which all multivariate relationships in the data set are examined. However, the time complexity of enumeration is exponential (2^n), so the computational load is very heavy when the dimension is high. To solve this problem, a matrix iteration algorithm with pruning (MIP) is proposed for pinpointing multivariate dependent relationships in high dimensional data sets without examining all multivariate relationships. The pruning process of MIP skips some non-dependent relationships without examining them, which reduces the computing burden. The maximal information coefficient (MIC) is adopted as the correlation measure in MIP because of its excellent properties, generality and equitability. For a data set with 5 variables, more than 50% of the multivariate relationships are pruned without being examined. Numerical experiments also show that the computing burden is greatly reduced. Compared with the enumeration method, MIP saves 82.5% of the computing time and 98.5% of the evaluations of multivariate relationships for a data set with two dependent multivariate relationships among 30 variables. The proposed MIP algorithm is effective for pinpointing multivariate dependent relationships in high dimensional data sets. https://ieeexplore.ieee.org/document/9300253/ Correlation high dimensions maximal information coefficient pruning algorithm |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Fubo Shao Zhiqiang Hou Limin Jia Zhe Zhang |
spellingShingle |
Fubo Shao Zhiqiang Hou Limin Jia Zhe Zhang A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets IEEE Access Correlation high dimensions maximal information coefficient pruning algorithm |
author_facet |
Fubo Shao Zhiqiang Hou Limin Jia Zhe Zhang |
author_sort |
Fubo Shao |
title |
A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets |
title_sort |
matrix iteration algorithm with pruning for pinpointing multivariate correlations from high dimensional data sets |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
High dimensional data sets contain a few dependent multivariate relationships, and identifying the dependent variables is an important issue for data analysis. Currently, the most frequently used method is enumeration, in which all multivariate relationships in the data set are examined. However, the time complexity of enumeration is exponential (2^n), so the computational load is very heavy when the dimension is high. To solve this problem, a matrix iteration algorithm with pruning (MIP) is proposed for pinpointing multivariate dependent relationships in high dimensional data sets without examining all multivariate relationships. The pruning process of MIP skips some non-dependent relationships without examining them, which reduces the computing burden. The maximal information coefficient (MIC) is adopted as the correlation measure in MIP because of its excellent properties, generality and equitability. For a data set with 5 variables, more than 50% of the multivariate relationships are pruned without being examined. Numerical experiments also show that the computing burden is greatly reduced. Compared with the enumeration method, MIP saves 82.5% of the computing time and 98.5% of the evaluations of multivariate relationships for a data set with two dependent multivariate relationships among 30 variables. The proposed MIP algorithm is effective for pinpointing multivariate dependent relationships in high dimensional data sets. |
topic |
Correlation high dimensions maximal information coefficient pruning algorithm |
url |
https://ieeexplore.ieee.org/document/9300253/ |
_version_ |
1724179821826670592 |
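
The description above attributes the cost of the enumeration method to its exponential (2^n) growth. A minimal sketch of that baseline follows, assuming that a "multivariate relationship" means any subset of at least two of the n variables; this reading is an assumption made for illustration and is not taken from the article. The function names `enumerate_relationships` and `count_relationships` are hypothetical, and the sketch covers only the enumeration baseline, not the MIP pruning step or the MIC measure.

```python
from itertools import combinations
from math import comb


def enumerate_relationships(n_vars, min_size=2):
    """Yield every subset of variable indices with at least min_size members.

    Assumption for illustration: each such subset is one candidate
    multivariate relationship that plain enumeration would examine.
    """
    for size in range(min_size, n_vars + 1):
        yield from combinations(range(n_vars), size)


def count_relationships(n_vars, min_size=2):
    """Count the candidate subsets without materialising them."""
    return sum(comb(n_vars, k) for k in range(min_size, n_vars + 1))


if __name__ == "__main__":
    # With min_size=2 the count equals 2**n - n - 1, so it roughly doubles
    # with every added variable.
    for n in (5, 10, 30):
        print(n, count_relationships(n))
```

Under this assumption the candidate count is 2^n - n - 1, which shows why skipping subsets without examining them matters once n reaches the 30 variables mentioned in the description.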