A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets

High dimensional data sets contain a few dependent multivariate relationships, so identifying the dependent variables is an important problem in data analysis. Currently, the most frequently used method is enumeration, in which all multivariate rela...


Bibliographic Details
Main Authors: Fubo Shao, Zhiqiang Hou, Limin Jia, Zhe Zhang
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Correlation; high dimensions; maximal information coefficient; pruning algorithm
Online Access: https://ieeexplore.ieee.org/document/9300253/
id doaj-f7586a93cff240b9b1d03d1efca4cb3f
record_format Article
spelling doaj-f7586a93cff240b9b1d03d1efca4cb3f
2021-03-30T15:16:45Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
5150
5165
10.1109/ACCESS.2020.3046131
9300253
A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets
Fubo Shao (0) https://orcid.org/0000-0001-7720-902X
Zhiqiang Hou (1) https://orcid.org/0000-0001-5925-8063
Limin Jia (2)
Zhe Zhang (3)
(0) State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
(1) China Waterborne Transport Research Institute, Beijing, China
(2) State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
(3) State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
https://ieeexplore.ieee.org/document/9300253/
Correlation
high dimensions
maximal information coefficient
pruning algorithm
collection DOAJ
language English
format Article
sources DOAJ
author Fubo Shao
Zhiqiang Hou
Limin Jia
Zhe Zhang
title A Matrix Iteration Algorithm With Pruning for Pinpointing Multivariate Correlations From High Dimensional Data Sets
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description High dimensional data sets contain a few dependent multivariate relationships, so identifying the dependent variables is an important problem in data analysis. The most frequently used approach is enumeration, in which every multivariate relationship in the data set is examined. However, the time complexity of enumeration is exponential (2^n), and the computational load is very heavy when the dimension is high. To address this problem, the matrix iteration algorithm with pruning (MIP) is proposed for pinpointing multivariate dependent relationships in high dimensional data sets without examining all multivariate relationships. The pruning process of MIP skips some non-dependent relationships without examining them, which reduces the computing burden. The maximal information coefficient (MIC) is adopted as the measure of correlation in MIP because of its generality and equitability. For a data set with 5 variables, more than 50% of the multivariate relationships are pruned without being examined. Numerical experiments also show that the computational burden is greatly reduced: compared with enumeration, 82.5% of the computing time and 98.5% of the evaluations of multivariate relationships are saved for a data set with two dependent multivariate relationships among 30 variables. The proposed MIP algorithm is effective for pinpointing multivariate dependent relationships in high dimensional data sets.
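The exponential cost of enumeration mentioned in the description can be illustrated with a short counting sketch. It assumes, for illustration only, that a "multivariate relationship" is any subset of at least two variables; the record does not state how the paper counts candidates, and the function name below is hypothetical. This is not the MIP algorithm itself, only a count of what exhaustive enumeration would have to examine under that assumption.

```python
from math import comb


def count_candidate_relationships(n: int, min_size: int = 2) -> int:
    """Count variable subsets of size >= min_size among n variables.

    Assumption (not from the record): each such subset is one candidate
    multivariate relationship that exhaustive enumeration would examine.
    """
    return sum(comb(n, k) for k in range(min_size, n + 1))


if __name__ == "__main__":
    # The two dimensions mentioned in the description: 5 and 30 variables.
    for n in (5, 30):
        print(n, count_candidate_relationships(n))
```

Under this assumption the count equals 2^n - n - 1, which is why examining every candidate quickly becomes impractical as the number of variables grows.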
topic Correlation
high dimensions
maximal information coefficient
pruning algorithm
url https://ieeexplore.ieee.org/document/9300253/