A Novel Task Scheduling Policy for Hadoop YARN

Bibliographic Details
Main Author: Muhammad Febrian Ardiansyah
Other Authors: P. K. Sahoo
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/mpjgr7
Description
Summary: Master's === Chang Gung University === Department of Computer Science and Information Engineering === 104 === In the second version of Hadoop MapReduce, YARN (Yet Another Resource Negotiator) was introduced to enhance performance. One of its essential improvements is that the dependency between tasks is no longer required. As a consequence, tasks sit idle inside a node because reduce tasks are allocated prematurely, whereas a reduce task should be processed only once all map tasks of the same job have completed. This idleness lengthens the time needed to finish all of the submitted jobs. Meanwhile, the cost of allocating tasks to nodes should also be considered, and allocation to higher-performance nodes should be prioritized. In this thesis, a novel scheduling policy comprising task selection and task scheduling is proposed to address these drawbacks by minimizing idle time and maximizing resource utilization. For verification, the performance of the proposed algorithm was compared with the FIFO, Fair, and Capacity schedulers, the built-in and pluggable scheduling policies in Hadoop YARN. The experimental results showed that the proposed policy achieved better results than the compared schedulers in average CPU utilization, job completion time, and task completion time.
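
The two ideas in the summary, idle time caused by premature reduce allocation and preference for higher-performance nodes, can be illustrated with a minimal Python sketch. This is not the algorithm from the thesis: the `Node` class, its `speed` score, and both helper functions are hypothetical names introduced here, and the idle-time model is a simplification that ignores any shuffle work a reduce task could overlap with the map phase.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float      # hypothetical relative performance score
    free_slots: int   # hypothetical count of free task slots


def reduce_idle_time(map_finish_times, reduce_alloc_time):
    """Time a reduce task waits, holding its slot, until every map
    task of the same job has finished (simplified model)."""
    return max(0.0, max(map_finish_times) - reduce_alloc_time)


def pick_node(nodes):
    """Return the fastest node that still has a free slot, or None."""
    candidates = [n for n in nodes if n.free_slots > 0]
    return max(candidates, key=lambda n: n.speed, default=None)


# Three map tasks of one job finish at t = 40, 55, and 70.
map_finish_times = [40.0, 55.0, 70.0]

# Allocating the reduce task at t = 10 leaves it idle until t = 70.
early_idle = reduce_idle_time(map_finish_times, reduce_alloc_time=10.0)

# Allocating it when the last map finishes leaves no idle time.
late_idle = reduce_idle_time(map_finish_times, reduce_alloc_time=70.0)

# n2 is fastest but full, so the next-fastest free node is chosen.
nodes = [Node("n1", 1.0, 2), Node("n2", 2.5, 0), Node("n3", 1.8, 1)]
chosen = pick_node(nodes)

print(early_idle, late_idle, chosen.name)
```

In this toy model, delaying the reduce allocation removes the idle period entirely, and the node choice falls to the fastest node that has capacity rather than the fastest node overall.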