Improving the Hadoop System Performance through Activity-Aware Data Access

Bibliographic Details
Main Authors: CHEN, YU-LIN, 陳宥霖
Other Authors: YEH, TSO-ZEN
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/9nusyp
Description
Summary: Master's === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 107 === As cloud computing grows in popularity, cloud systems have been widely adopted to store and share information among users. Apache Hadoop is one of the most popular platforms in the cloud community. A Hadoop cluster can consist of a large number of computing nodes and keeps replicas of its data across those nodes. As a result, a job based on the MapReduce model can be divided into smaller tasks that are distributed to multiple computing nodes to speed up execution. However, a MapReduce job can be delayed when it reads data from a node that is under heavy disk I/O at the time of the access. This research aims to mitigate that delay by directing MapReduce jobs to read data from computing nodes with less disk I/O rather than from busy ones, which in turn accelerates the progress of the jobs. In addition, because data is always read from the disks with less activity, the real-time disk load across the Hadoop cluster also becomes more balanced.
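
The selection idea described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the summary does not say how disk activity is measured or where in Hadoop the choice is made, so the function name pick_replica, the node names, and the 0-to-1 "disk busy" readings below are all hypothetical. The sketch only shows the general principle of preferring the replica holder with the least disk activity.

```python
from typing import Dict, List


def pick_replica(replica_nodes: List[str], disk_busy: Dict[str, float]) -> str:
    """Return the replica-holding node with the lowest reported disk activity.

    replica_nodes: nodes that hold a copy of the requested block (hypothetical names).
    disk_busy: most recent disk-activity reading per node; the 0.0-1.0 scale is
               an assumption made for this sketch, not a metric from the thesis.
    """
    if not replica_nodes:
        raise ValueError("block has no replicas")
    # Assumption: a node without a reading is treated as maximally busy,
    # so nodes with known, low activity are preferred.
    return min(replica_nodes, key=lambda node: disk_busy.get(node, float("inf")))


# Hypothetical example: three replicas of one block, with made-up readings.
replicas = ["node-a", "node-b", "node-c"]
readings = {"node-a": 0.82, "node-b": 0.15, "node-c": 0.40}
chosen = pick_replica(replicas, readings)
print(chosen)
```

Because each read goes to the least-busy replica holder at that moment, repeated reads would tend to move away from disks that are already loaded, which is the balancing effect the summary mentions.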