Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures

Apache Spark and Hadoop are open-source frameworks for big data processing that have been adopted by many companies. To implement a reliable big data system that can satisfy processing target completion times, accurate resource provisioning and job execution time estimations are needed. In this paper, time estimation and resource minimization schemes for Spark and Hadoop systems are presented. The proposed models use the probability of failure in the estimations to more accurately formulate the characteristics of real big data operations. The experimental results show that the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisioning by considering failure events, which improves the scheduling success rate of big data processing tasks.
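The abstract's core idea, adjusting time and resource estimates by a failure probability, can be illustrated with a minimal sketch. This is a hypothetical geometric-retry model, not the authors' actual scheme: it assumes each task attempt fails independently with probability p and failed tasks are re-executed, so the expected number of attempts per task is 1/(1 - p). The function names and the deadline-based executor count are illustrative assumptions.

```python
import math

def estimate_completion_time(base_time_s: float, failure_prob: float) -> float:
    """Failure-compensated time estimate for a task.

    Scales the failure-free estimate by the expected number of attempts,
    1 / (1 - p), under an independent-failure retry model.
    """
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure probability must be in [0, 1)")
    return base_time_s / (1.0 - failure_prob)

def min_executors(total_task_time_s: float, deadline_s: float,
                  failure_prob: float) -> int:
    """Smallest executor count whose failure-adjusted workload fits the deadline."""
    adjusted = estimate_completion_time(total_task_time_s, failure_prob)
    return max(1, math.ceil(adjusted / deadline_s))

# e.g. 1000 s of aggregate task work, 10% failure rate, 120 s deadline:
# the failure-adjusted workload is ~1111 s, requiring 10 executors.
print(min_executors(1000.0, 120.0, 0.1))
```

Ignoring failures here (p = 0) would yield 9 executors and risk missing the deadline, which is the kind of under-provisioning the paper's failure-aware schemes aim to avoid.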

Bibliographic Details
Main Authors: Jinbae Lee, Bobae Kim, Jong-Moon Chung
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Big data; failure probability; Apache Spark; resilient distributed dataset (RDD); Apache Hadoop; MapReduce
Online Access: https://ieeexplore.ieee.org/document/8605312/
id: doaj-384533356e154ee9adfce2a1f2aadfdb
record_format: Article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2891001
Volume: 7, Pages: 9658-9666 (article number 8605312)
Author affiliations: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea (all authors)
ORCID (Jong-Moon Chung): https://orcid.org/0000-0002-1652-6635
Record updated: 2021-03-29T22:47:33Z