Study of Performance Optimization Scheme for Hadoop MapReduce Architecture

Bibliographic Details
Main Authors: LO, HSIANG-FU (羅祥福)
Other Authors: LIU, CHIANG-LUNG
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/55720702234492522318
Description
Summary: Doctorate === Chung Cheng Institute of Technology, National Defense University === School of Defense Science === Academic year 104 (2015) === As the use of cloud computing increases rapidly, big data also continues to grow quickly, and the performance of big data processing has become an important research issue. This thesis discusses performance measurement methods together with performance tuning schemes for Hadoop MapReduce and proposes corresponding performance improvement methods. To design a performance measurement scheme for Hadoop information hiding applications, a Performance AnalysiS Scheme for MapReduce Information Hiding (PASS-MIH) model is proposed to analyze and measure the performance impact factors of such applications. Experimental results show that the PASS-MIH model can estimate four levels of performance impact factors for an MR-based LSB test case and gain a 53.8% performance improvement rate when integrated with an existing Hadoop parameter tuning method. In addition, a Comprehensive Performance Rating (CPR) model is used to identify nine principal components from workload history and Hadoop configuration that strongly impact Hadoop performance. Experimental results indicate that tuning these principal components of the Hadoop configuration can produce non-linear performance results. Finally, an ACO-based Hadoop Configuration Optimization (ACO-HCO) scheme is proposed to optimize the performance of Hadoop by automatically tuning its configuration parameter settings. ACO-HCO first employs a gene expression programming technique to build an objective function from historical job running records; this function captures the correlation among the Hadoop configuration parameters. It then employs an ant colony optimization technique that uses the objective function to search for optimal or near-optimal parameter settings. Experimental results verify that the ACO-HCO scheme enhances the performance of Hadoop significantly compared with the default settings. Moreover, it outperforms both rule-of-thumb settings and the Starfish model in Hadoop performance optimization.
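
The ACO-HCO search described in the abstract can be illustrated with a small sketch. The Java program below is a minimal illustration, not the thesis's implementation: ants pick one discrete value per Hadoop configuration parameter according to pheromone trails, each candidate configuration is scored by a stand-in objective function (where the thesis uses a model learned by gene expression programming), and the trail of the best configuration found so far is reinforced. The parameter names are real Hadoop 2.x properties, but the candidate values and the runtime model are illustrative assumptions.

```java
import java.util.Random;

/**
 * Minimal sketch of an ACO-style Hadoop configuration search.
 * The objective function below is an illustrative stand-in for the
 * GEP-learned runtime model described in the abstract.
 */
public class AcoHcoSketch {
    // Tunable Hadoop 2.x parameters and illustrative candidate values (assumptions).
    static final String[] PARAMS = {
        "mapreduce.task.io.sort.mb",
        "mapreduce.task.io.sort.factor",
        "mapreduce.job.reduces"
    };
    static final int[][] VALUES = {
        {100, 200, 400},   // map-side sort buffer size in MB
        {10, 50, 100},     // number of streams merged at once
        {4, 8, 16}         // reduce task count
    };

    static final Random RNG = new Random(42);

    // Stand-in objective function: predicted job runtime in seconds
    // for a candidate configuration (lower is better). Purely illustrative.
    static double predictedRuntime(int[] choice) {
        double t = 300.0;
        int mb = VALUES[0][choice[0]];
        t -= 0.1 * mb;                 // a bigger sort buffer helps...
        t += 0.0002 * mb * mb;         // ...but only up to a point
        t -= 0.3 * VALUES[1][choice[1]];
        t += Math.abs(VALUES[2][choice[2]] - 8) * 5.0; // sweet spot at 8 reducers
        return t;
    }

    public static void main(String[] args) {
        // One pheromone trail per candidate value of each parameter.
        double[][] pheromone = new double[PARAMS.length][];
        for (int p = 0; p < PARAMS.length; p++) {
            pheromone[p] = new double[VALUES[p].length];
            java.util.Arrays.fill(pheromone[p], 1.0);  // uniform initial trails
        }

        int[] best = null;
        double bestTime = Double.MAX_VALUE;

        for (int iter = 0; iter < 50; iter++) {
            for (int ant = 0; ant < 10; ant++) {
                // Each ant samples one value per parameter, proportional to pheromone.
                int[] choice = new int[PARAMS.length];
                for (int p = 0; p < PARAMS.length; p++) {
                    choice[p] = rouletteSelect(pheromone[p]);
                }
                double t = predictedRuntime(choice);
                if (t < bestTime) {
                    bestTime = t;
                    best = choice.clone();
                }
            }
            // Evaporate all trails, then reinforce the best configuration so far.
            for (int p = 0; p < PARAMS.length; p++) {
                for (int v = 0; v < pheromone[p].length; v++) {
                    pheromone[p][v] *= 0.9;
                }
                pheromone[p][best[p]] += 1.0;
            }
        }

        for (int p = 0; p < PARAMS.length; p++) {
            System.out.println(PARAMS[p] + " = " + VALUES[p][best[p]]);
        }
        System.out.printf("predicted runtime: %.1f s%n", bestTime);
    }

    // Roulette-wheel selection over pheromone weights.
    static int rouletteSelect(double[] weights) {
        double sum = 0;
        for (double w : weights) sum += w;
        double r = RNG.nextDouble() * sum;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) return i;
        }
        return weights.length - 1;
    }
}
```

In a real deployment of this idea, predictedRuntime would be replaced by the learned objective function, and the selected values would be written into the job configuration (e.g., via Hadoop's Configuration.set) before submission; the evaporate-then-reinforce loop is what steers later ants toward the better-performing settings.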