Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study

When exploring large amounts of data without a clear target, providing an interactive experience becomes very difficult, since this tentative inspection usually defeats any early decision on data structures or indexing strategies. This is also true in the physics domain, specifically in high-energy...


Bibliographic Details
Main Authors: Alejandro Alvarez-Ayllon, Manuel Palomo-Duarte, Juan-Manuel Dodero
Format: Article
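Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Big data applications, data analysis, data engineering, data exploration, database systems, interactive systems
Online Access: https://ieeexplore.ieee.org/document/8540356/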
id doaj-6cc5213b116c4310992485d1ee7e4914
record_format Article
spelling doaj-6cc5213b116c4310992485d1ee7e4914
timestamp 2021-03-29T22:47:30Z
language eng
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
volume 7
pages 10691-10717
doi 10.1109/ACCESS.2018.2882244
document 8540356
author Alejandro Alvarez-Ayllon (https://orcid.org/0000-0002-1353-7929), Geneva Observatory, University of Geneva, Geneva, Switzerland
Manuel Palomo-Duarte, Department of Computer Science and Engineering, University of Cádiz, Cádiz, Spain
Juan-Manuel Dodero (https://orcid.org/0000-0002-4105-5679), Department of Computer Science and Engineering, University of Cádiz, Cádiz, Spain
url https://ieeexplore.ieee.org/document/8540356/
topic Big data applications
data analysis
data engineering
data exploration
database systems
interactive systems
collection DOAJ
language English
format Article
sources DOAJ
author Alejandro Alvarez-Ayllon
Manuel Palomo-Duarte
Juan-Manuel Dodero
title Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description When exploring large amounts of data without a clear target, providing an interactive experience becomes very difficult, since this tentative inspection usually defeats any early decision on data structures or indexing strategies. This is also true in the physics domain, specifically in high-energy physics, where the huge volume of data generated by the detectors is normally explored via C++ code using batch processing, which introduces considerable latency. An interactive tool, when integrated into the existing data management systems, can add great value to the usability of these platforms. Here, we review the current state of the art of interactive data exploration, aiming to satisfy three requirements: access to raw data files, storage in a distributed environment, and reasonably low latency. This paper follows the guidelines for systematic mapping studies, which are well suited for gathering and classifying available studies. We summarize the results after classifying the 242 papers that passed our inclusion criteria. While there are many proposed solutions that tackle the problem in different ways, there is little evidence available about their implementation in practice. Almost all of the solutions found by this paper cover a subset of our requirements, with only one partially satisfying all three. Solutions for data exploration abound. It is an active research area and, considering the continuous growth of data volume and variety, the problem is only going to become harder. There is a niche for research on a solution that covers our requirements, and the required building blocks are there.
topic Big data applications
data analysis
data engineering
data exploration
database systems
interactive systems
url https://ieeexplore.ieee.org/document/8540356/