Summary: | Master's === Fu Jen Catholic University === Master's Program, Department of Computer Science and Information Engineering === 107 === As cloud computing grows in popularity, cloud systems have been widely adopted to store and share information among users.
Apache Hadoop is one of the most popular cloud platforms in the cloud community. A Hadoop cluster can consist of a large number of computing nodes and keeps replicas of its data across those nodes. As a result, a job based on the MapReduce model can be divided into smaller tasks that are distributed to multiple computing nodes to speed up execution. However, a MapReduce job can be delayed when it reads data from nodes that are under heavy disk I/O at the time of access.
This research aims to mitigate that delay by directing MapReduce jobs to read data from computing nodes with lighter disk I/O rather than from busy ones, which in turn speeds up job progress. In addition, because data is always read from the disks with less activity, the real-time disk load across the Hadoop cluster becomes more balanced.
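The replica-selection idea can be illustrated with a minimal sketch. It is not the thesis implementation and uses no Hadoop API: the function `pick_replica`, the node names, and the disk-utilization figures are all hypothetical, and the sketch assumes each node reports a utilization value between 0.0 and 1.0.

```python
def pick_replica(replica_nodes, disk_util):
    """Return the replica-holding node with the lowest reported disk utilization.

    replica_nodes: names of the nodes that hold a copy of the block (hypothetical).
    disk_util: mapping of node name to disk utilization in [0.0, 1.0] (assumed metric).
    """
    if not replica_nodes:
        raise ValueError("block has no replicas")
    # Nodes with no load report are treated as fully busy, so they are chosen last.
    return min(replica_nodes, key=lambda node: disk_util.get(node, 1.0))


# Hypothetical snapshot of per-node disk utilization.
disk_util = {"dn1": 0.82, "dn2": 0.15, "dn3": 0.47}

# The block is replicated on dn1 and dn3; the less busy node is selected.
chosen = pick_replica(["dn1", "dn3"], disk_util)
```

Because each read goes to the least busy replica holder, repeated selections tend to move load away from hot disks, which matches the balancing effect described above.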
|