HDConfigor: Automatically Tuning High Dimensional Configuration Parameters for Log Search Engines

Search engines are nowadays widely applied to store and analyze logs generated by large-scale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. As manual configuring is...

Bibliographic Details
Main Authors: Hui Dou, Pengfei Chen, Zibin Zheng
Format: Article
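Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Log search engine; configuration parameter tuning; black-box optimization; Bayesian optimization; random embedding
Online Access: https://ieeexplore.ieee.org/document/9079492/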
id doaj-3e3bc7ed471c4b04bd70095c4be1393c
record_format Article
spelling doaj-3e3bc7ed471c4b04bd70095c4be1393c, 2021-03-30T01:46:20Z, eng
IEEE, IEEE Access, ISSN 2169-3536, 2020-01-01, vol. 8, pp. 80638-80653
DOI 10.1109/ACCESS.2020.2990735, document 9079492
HDConfigor: Automatically Tuning High Dimensional Configuration Parameters for Log Search Engines
Hui Dou (https://orcid.org/0000-0002-9242-1181), Pengfei Chen, Zibin Zheng (https://orcid.org/0000-0001-7872-7718)
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China (all three authors)
https://ieeexplore.ieee.org/document/9079492/
Log search engine; configuration parameter tuning; black-box optimization; Bayesian optimization; random embedding
collection DOAJ
language English
format Article
sources DOAJ
author Hui Dou
Pengfei Chen
Zibin Zheng
title HDConfigor: Automatically Tuning High Dimensional Configuration Parameters for Log Search Engines
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Search engines are now widely used to store and analyze logs generated by large-scale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. Because manual configuration is time-consuming and labor-intensive, automatically tuning configuration parameters to optimize performance has become an urgent need. However, it is challenging because: 1) due to the complex implementation, the relationship between performance and configuration parameters is difficult to model, so the objective function is effectively a black box; 2) in addition to application parameters, JVM and kernel parameters are also closely related to performance, and together they form a high dimensional configuration space; 3) to iteratively search for the best configuration, a tool is needed to automatically deploy each newly generated configuration and launch tests to measure the corresponding performance. To address these challenges, this paper designs and implements HDConfigor, an automatic holistic configuration parameter tuning tool for log search engines. To solve the high dimensional optimization problem, we propose a modified Random EMbedding Bayesian Optimization algorithm (mREMBO) in HDConfigor, which is a black-box approach. Instead of directly applying a black-box optimization algorithm such as Bayesian optimization (BO), mREMBO first generates a lower dimensional embedded space by introducing a random embedding matrix and then performs BO in this embedded space. As a result, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor in an Elasticsearch cluster with different workload scenarios. Experimental results show that, relative to the default configuration, the best median indexing result achieved by mREMBO reaches $2.07\times$. In addition, under the same number of trials, mREMBO finds a configuration with at least a further 10.31% improvement in throughput compared to Random search, Simulated Annealing and BO.
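The random-embedding step described in the abstract can be illustrated with a short sketch. This is not the authors' mREMBO: it shows only the basic idea of drawing a random matrix, sampling points in a low dimensional box, and projecting them into the full configuration space. Plain random sampling stands in for Bayesian optimization in the embedded space, and the names `make_embedding`, `project`, `to_config`, `search`, the toy objective, and the parameter ranges are all hypothetical. In a real tuner the objective would deploy the configuration and run a benchmark.

```python
import math
import random

def make_embedding(D, d, rng):
    """Draw a D x d matrix with independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]

def project(A, y):
    """Map a low dimensional point y into the box [-1, 1]^D: x = clip(A y)."""
    return [max(-1.0, min(1.0, sum(a * v for a, v in zip(row, y)))) for row in A]

def to_config(x, bounds):
    """Rescale each coordinate from [-1, 1] to its parameter range (lo, hi)."""
    return [lo + (xi + 1.0) / 2.0 * (hi - lo) for xi, (lo, hi) in zip(x, bounds)]

def search(objective, bounds, d, trials, seed=0):
    """Sample in the d-dimensional embedded space and keep the best configuration.

    Assumption: uniform random sampling replaces the BO step, purely to keep
    the sketch dependency-free. The side length sqrt(d) is a common choice
    for the embedded box, not a value taken from the paper.
    """
    rng = random.Random(seed)
    A = make_embedding(len(bounds), d, rng)
    side = math.sqrt(d)
    best_cfg, best_val = None, float("-inf")
    for _ in range(trials):
        y = [rng.uniform(-side, side) for _ in range(d)]
        cfg = to_config(project(A, y), bounds)
        val = objective(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

def toy_throughput(cfg):
    """Hypothetical objective: only the first two of 20 parameters matter."""
    return -((cfg[0] - 3.0) ** 2) - (cfg[1] - 0.5) ** 2

# 20 hypothetical parameters searched through a 4-dimensional embedding.
bounds = [(0.0, 10.0), (0.0, 1.0)] + [(0.0, 100.0)] * 18
best_cfg, best_val = search(toy_throughput, bounds, d=4, trials=200)
```

The design point is that the optimizer only ever sees the 4 embedded coordinates, while every evaluated configuration still sets all 20 parameters through the fixed matrix.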
topic Log search engine
configuration parameter tuning
black-box optimization
Bayesian optimization
random embedding
url https://ieeexplore.ieee.org/document/9079492/