Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization
MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop has greatly reduced the development complexity of MapReduce applications, fine-tuning Hadoop systems for optimal performance remains a major challenge. Configuration tuning is on...
Main Authors: | Xingcheng Hua, Michael C. Huang, Peng Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
Subjects: | Ensemble modeling, Hadoop configuration, MapReduce, metaheuristic optimization, performance tuning |
Online Access: | https://ieeexplore.ieee.org/document/8416665/ |
id |
doaj-99ef53d7999246cd855dc51c3844cf24 |
---|---|
record_format |
Article |
spelling |
doaj-99ef53d7999246cd855dc51c3844cf24; 2021-03-29T20:50:53Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 44161-44174; DOI 10.1109/ACCESS.2018.2857852; IEEE Xplore document 8416665. Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization. Xingcheng Hua (https://orcid.org/0000-0003-2428-5787), College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China; Michael C. Huang, Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA; Peng Liu (https://orcid.org/0000-0001-9107-6673), College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. https://ieeexplore.ieee.org/document/8416665/ Keywords: Ensemble modeling; Hadoop configuration; MapReduce; metaheuristic optimization; performance tuning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xingcheng Hua Michael C. Huang Peng Liu |
author_sort |
Xingcheng Hua |
title |
Hadoop Configuration Tuning With Ensemble Modeling and Metaheuristic Optimization |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2018-01-01 |
description |
MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop has greatly reduced the development complexity of MapReduce applications, fine-tuning Hadoop systems for optimal performance remains a major challenge. Configuration tuning is one of the most effective ways to improve the performance of MapReduce applications on Hadoop systems, which invariably adopt the default configuration. However, the Hadoop configuration parameter space is so large that exploring the parameter combinations exhaustively is impractical. In this paper, we propose H-Tune, an effective Hadoop configuration tuning approach for MapReduce applications. We design a nonintrusive performance profiler, whose runtime overhead remains below 2%, to capture the runtime details of MapReduce applications and generate their performance evaluations. From the performance profiles, the execution predictor builds a two-level fusion model for each application using ensemble modeling, considering both the Hadoop configuration and the input data size. Leveraging the execution predictor, a metaheuristic-based configuration optimizer searches for the optimal configuration for a given application. Experimental results demonstrate that the optimal Hadoop configuration is often application-specific and data-specific, and that it is better to consider all relevant configuration parameters and optimize them together. H-Tune improves the performance of MapReduce applications by average factors of 1.5× and 9.6× over the state-of-the-art approach and the default configuration, respectively. |
topic |
Ensemble modeling Hadoop configuration MapReduce metaheuristic optimization performance tuning |
url |
https://ieeexplore.ieee.org/document/8416665/ |
_version_ |
1724194067488702464 |
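
The description above outlines a pipeline in which an execution predictor estimates runtime from a Hadoop configuration and an input data size, and a metaheuristic optimizer searches the configuration space using that predictor. The following is a minimal Python sketch of that idea only, not the authors' implementation. Everything specific in it is an assumption: the record does not name the metaheuristic (simulated annealing is used here as one common choice), the two toy base models and their averaging merely stand in for the trained two-level fusion model, and the parameter ranges are invented for illustration. The three parameter names are real Hadoop 2.x settings, chosen arbitrarily.

```python
import math
import random

# Illustrative search space: (min, max, step) per parameter.
# The names are real Hadoop 2.x settings; the ranges are assumptions.
SPACE = {
    "mapreduce.task.io.sort.mb": (50, 800, 50),
    "mapreduce.job.reduces": (1, 64, 1),
    "mapreduce.map.memory.mb": (512, 4096, 256),
}


def base_model_a(cfg, input_gb):
    """Toy base model (hypothetical): penalizes few reducers and small sort buffers."""
    return input_gb * (40.0 / cfg["mapreduce.job.reduces"]
                       + 2000.0 / cfg["mapreduce.task.io.sort.mb"])


def base_model_b(cfg, input_gb):
    """Toy base model (hypothetical): penalizes per-reducer overhead and low map memory."""
    return (1.5 * cfg["mapreduce.job.reduces"]
            + input_gb * 20000.0 / cfg["mapreduce.map.memory.mb"])


def predict_runtime(cfg, input_gb):
    """Stand-in for a trained ensemble: averages the two base predictions."""
    return 0.5 * (base_model_a(cfg, input_gb) + base_model_b(cfg, input_gb))


def random_config(rng):
    """Draw one value per parameter from its grid."""
    return {k: rng.randrange(lo, hi + 1, step) for k, (lo, hi, step) in SPACE.items()}


def neighbor(cfg, rng):
    """Move one randomly chosen parameter one step up or down, clamped to its bounds."""
    new = dict(cfg)
    key = rng.choice(sorted(SPACE))
    lo, hi, step = SPACE[key]
    new[key] = min(hi, max(lo, cfg[key] + rng.choice((-step, step))))
    return new


def anneal(input_gb, iters=5000, t0=50.0, seed=0):
    """Simulated annealing over SPACE, minimizing the predicted runtime."""
    rng = random.Random(seed)
    cur = random_config(rng)
    cur_cost = predict_runtime(cur, input_gb)
    best, best_cost = cur, cur_cost
    for i in range(iters):
        temp = t0 * (1.0 - i / iters) + 1e-9  # linear cooling schedule
        cand = neighbor(cur, rng)
        cost = predict_runtime(cand, input_gb)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost


if __name__ == "__main__":
    # Assumed baseline for comparison (Hadoop 2.x defaults for these settings).
    baseline = {
        "mapreduce.task.io.sort.mb": 100,
        "mapreduce.job.reduces": 1,
        "mapreduce.map.memory.mb": 1024,
    }
    best, cost = anneal(input_gb=10.0)
    print("baseline predicted cost:", predict_runtime(baseline, 10.0))
    print("tuned configuration:", best)
    print("tuned predicted cost:", cost)
```

Because the input size is an argument of the predictor, rerunning the search with a different `input_gb` can yield a different configuration, which mirrors the record's point that the best configuration tends to be data-specific. In a real setting the toy models would be replaced by models fitted to profiled runs.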