Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures
Apache Spark and Hadoop are open source frameworks for big data processing, which have been adopted by many companies. In order to implement a reliable big data system that can satisfy processing target completion times, accurate resource provisioning and job execution time estimations are needed. In this paper, time estimation and resource minimization schemes for Spark and Hadoop systems are presented. The proposed models use the probability of failure in the estimations to more accurately formulate the characteristics of real big data operations. The experimental results show that the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisions by considering failure events, which improves the scheduling success rate of big data processing tasks.
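The abstract describes failure-aware time estimation only at a high level, and the paper's actual models are not reproduced in this record. As a rough illustration of the general idea, the sketch below inflates expected work by the expected number of attempts per task, 1/(1 - p), assuming each task fails independently with probability p and is simply re-executed until it succeeds. The function names, the wave-based completion model, and the retry assumption are all illustrative choices, not the paper's formulation.

```python
import math

def estimate_job_time(num_tasks, task_time_s, num_slots, p_fail):
    """Expected job completion time (seconds) when failed tasks are retried.

    With independent per-task failure probability p_fail and retry-until-
    success, the expected number of attempts per task is geometric:
    1 / (1 - p_fail). Tasks run in "waves" across the available slots.
    """
    expected_attempts = 1.0 / (1.0 - p_fail)
    total_attempts = num_tasks * expected_attempts
    waves = math.ceil(total_attempts / num_slots)  # full waves needed
    return waves * task_time_s

def min_slots_for_deadline(num_tasks, task_time_s, deadline_s, p_fail):
    """Smallest number of parallel slots that meets the deadline under
    the same failure-compensated workload model."""
    waves_allowed = math.floor(deadline_s / task_time_s)
    total_attempts = num_tasks / (1.0 - p_fail)
    return math.ceil(total_attempts / waves_allowed)
```

For example, 100 tasks of 10 s each on 50 slots take two waves (20 s) with no failures, but a 20% failure probability adds enough retried work to require a third wave; conversely, the second function returns the minimum provisioning that still meets a 30 s deadline.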
Main Authors: Jinbae Lee, Bobae Kim, Jong-Moon Chung
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Big data; failure probability; Apache Spark; resilient distributed dataset (RDD); Apache Hadoop; MapReduce
Online Access: https://ieeexplore.ieee.org/document/8605312/
id: doaj-384533356e154ee9adfce2a1f2aadfdb
Record format: Article
Last updated: 2021-03-29T22:47:33Z
Publisher: IEEE
Journal: IEEE Access, ISSN 2169-3536, Vol. 7 (2019), pp. 9658-9666
Published: 2019-01-01
DOI: 10.1109/ACCESS.2019.2891001
IEEE document ID: 8605312
Title: Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures
Authors: Jinbae Lee; Bobae Kim; Jong-Moon Chung (https://orcid.org/0000-0002-1652-6635), all with the School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Description: Apache Spark and Hadoop are open source frameworks for big data processing, which have been adopted by many companies. In order to implement a reliable big data system that can satisfy processing target completion times, accurate resource provisioning and job execution time estimations are needed. In this paper, time estimation and resource minimization schemes for Spark and Hadoop systems are presented. The proposed models use the probability of failure in the estimations to more accurately formulate the characteristics of real big data operations. The experimental results show that the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisions by considering failure events, which improves the scheduling success rate of big data processing tasks.
Online Access: https://ieeexplore.ieee.org/document/8605312/
Subjects: Big data; failure probability; Apache Spark; resilient distributed dataset (RDD); Apache Hadoop; MapReduce