Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization

MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop greatly reduced the development complexity of MapReduce applications, fine tuning of the Hadoop systems for optimal performance remains a major challenge. Configuration tuning is on...

Bibliographic Details
Main Authors: Xingcheng Hua, Michael C. Huang, Peng Liu
Format: Article
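Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Ensemble modeling; Hadoop configuration; MapReduce; metaheuristic optimization; performance tuning
Online Access: https://ieeexplore.ieee.org/document/8416665/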
id doaj-99ef53d7999246cd855dc51c3844cf24
record_format Article
spelling doaj-99ef53d7999246cd855dc51c3844cf24
2021-03-29T20:50:53Z
eng
IEEE
IEEE Access
2169-3536
2018-01-01
Volume 6, pages 44161-44174
DOI: 10.1109/ACCESS.2018.2857852
IEEE document number: 8416665
Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization
Xingcheng Hua (0), https://orcid.org/0000-0003-2428-5787, College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
Michael C. Huang (1), Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
Peng Liu (2), https://orcid.org/0000-0001-9107-6673, College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
collection DOAJ
language English
format Article
sources DOAJ
author Xingcheng Hua
Michael C. Huang
Peng Liu
title Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop has greatly reduced the development complexity of MapReduce applications, fine-tuning Hadoop systems for optimal performance remains a major challenge. Configuration tuning is one of the most effective means of improving the performance of MapReduce applications on Hadoop systems, which invariably adopt the default configuration. However, the huge Hadoop configuration parameter space makes it impractical to explore the parameter combinations exhaustively. In this paper, we propose HTune, an effective Hadoop configuration tuning approach for MapReduce applications. We design a nonintrusive performance profiler, whose runtime overhead remains below 2%, to capture the runtime details of MapReduce applications and generate their performance evaluations. From the resulting performance profiles, the execution predictor constructs a two-level fusion model for each application using ensemble modeling, considering both the Hadoop configuration and the input data size. Leveraging the execution predictor, a metaheuristic-based configuration optimizer searches for the optimal configuration for a given application. Experimental results demonstrate that the optimal Hadoop configuration is often application-specific and data-specific, and that it is preferable to take all relevant configuration parameters into consideration and optimize them together. HTune improves the performance of MapReduce applications by factors of 1.5× and 9.6× on average over the state-of-the-art approach and the default configuration, respectively.
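The pipeline outlined in the description (profile runs, fit an ensemble predictor over configuration and input size, then search configurations with a metaheuristic) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The parameter names in SPACE, the synthetic_runtime function standing in for profiled job runs, the k-nearest-neighbour base models, the inverse-error weighting used as the fusion step, and the simple evolutionary search are all assumptions chosen for illustration; the record does not say which learners or which metaheuristic HTune uses.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer range.
SPACE = {"sort_buffer_mb": (50, 800), "num_reducers": (1, 64)}

def synthetic_runtime(cfg, input_gb):
    """Stand-in for a profiled job run; NOT real Hadoop behaviour."""
    buf, red = cfg["sort_buffer_mb"], cfg["num_reducers"]
    return input_gb * (200.0 / buf + 40.0 / red) + 0.5 * red + 0.01 * buf

def knn_predictor(samples, k):
    """Level-1 model: mean runtime of the k nearest profiled samples."""
    def predict(cfg, input_gb):
        def dist(sample):
            c, gb, _ = sample
            d = sum(((cfg[p] - c[p]) / (hi - lo)) ** 2
                    for p, (lo, hi) in SPACE.items())
            return d + ((input_gb - gb) / 10.0) ** 2
        nearest = sorted(samples, key=dist)[:k]
        return sum(t for _, _, t in nearest) / len(nearest)
    return predict

def fuse(predictors, holdout):
    """Level-2 fusion: weight each level-1 model by inverse hold-out error."""
    weights = []
    for p in predictors:
        err = sum(abs(p(c, gb) - t) for c, gb, t in holdout) / len(holdout)
        weights.append(1.0 / (err + 1e-9))
    total = sum(weights)
    def predict(cfg, input_gb):
        return sum(w * p(cfg, input_gb)
                   for w, p in zip(weights, predictors)) / total
    return predict

def random_config(rng):
    return {p: rng.randint(lo, hi) for p, (lo, hi) in SPACE.items()}

def mutate(cfg, rng):
    """Perturb one parameter by up to 10% of its range, clamped to bounds."""
    child = dict(cfg)
    p = rng.choice(list(SPACE))
    lo, hi = SPACE[p]
    step = max(1, (hi - lo) // 10)
    child[p] = min(hi, max(lo, child[p] + rng.randint(-step, step)))
    return child

def optimize(predict, input_gb, rng, generations=40, pop_size=20):
    """Elitist evolutionary search; predicted runtime is the fitness."""
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: predict(c, input_gb))
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in survivors]
    return min(pop, key=lambda c: predict(c, input_gb))

rng = random.Random(0)
profiles = []
for _ in range(300):
    cfg = random_config(rng)
    gb = rng.choice([1, 5, 10])
    profiles.append((cfg, gb, synthetic_runtime(cfg, gb)))
train, holdout = profiles[:240], profiles[240:]
model = fuse([knn_predictor(train, 1), knn_predictor(train, 5)], holdout)
best = optimize(model, input_gb=10, rng=rng)
print(best, round(model(best, 10), 2))
```

Because the predictor takes the input size as an argument, the same fitted model can be searched again for a different data size, which is one way to read the description's remark that the best configuration is data-specific.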
topic Ensemble modeling
Hadoop configuration
MapReduce
metaheuristic optimization
performance tuning
url https://ieeexplore.ieee.org/document/8416665/