Extracting Information from Interval Data Using Symbolic Principal Component Analysis


Bibliographic Details
Main Authors: M. R. Oliveira, M. Vilela, A. Pacheco, Rui Valadas, Paulo Salvador
Format: Article
Language: English
Published: Austrian Statistical Society 2017-04-01
Series: Austrian Journal of Statistics
Online Access: http://www.ajs.or.at/index.php/ajs/article/view/673
Description
Summary: We introduce generic definitions of symbolic variance and covariance for random interval-valued variables that lead to a unified and insightful interpretation of four known symbolic principal component estimation methods: CPCA, VPCA, CIPCA, and SymCovPCA. Moreover, we propose the use of truncated versions of symbolic principal components, which use a strict subset of the original symbolic variables, as a way to improve the interpretation of symbolic principal components. Furthermore, the analysis of a real dataset leads to a meaningful characterization of Internet traffic applications, while highlighting similarities between the symbolic principal component estimation methods considered in the paper.
ISSN: 1026-597X