LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters

Big data refers to numerous forms of complex and large datasets which need distinctive computational platforms in order to be analyzed. Hadoop is one of the popular frameworks for analytics of big data. In Hadoop, a big job is split into multiple small tasks and then they are distributed to worker n...


Bibliographic Details
Main Authors: Ihsan Ullah, Muhammad Sajjad Khan, Muhammad Amir, Junsu Kim, Su Min Kim
Format: Article
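Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Hadoop; MapReduce; distributed system; parallel computing; preemptive job scheduling; queuing theory
Online Access: https://ieeexplore.ieee.org/document/9117131/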
id doaj-926a64a116994203bf38414abd6fb940
record_format Article
spelling doaj-926a64a116994203bf38414abd6fb940
2021-03-30T02:46:48Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
Volume 8, pages 111751-111762
DOI 10.1109/ACCESS.2020.3002565
Article number 9117131
LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters
Ihsan Ullah (Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea)
Muhammad Sajjad Khan, https://orcid.org/0000-0003-3238-0434 (Department of Electronics Engineering, Korea Polytechnic University, Siheung, South Korea)
Muhammad Amir (Department of Electrical Engineering, International Islamic University at Islamabad, Islamabad, Pakistan)
Junsu Kim (Department of Electronics Engineering, Korea Polytechnic University, Siheung, South Korea)
Su Min Kim (Department of Electronics Engineering, Korea Polytechnic University, Siheung, South Korea)
collection DOAJ
language English
format Article
sources DOAJ
author Ihsan Ullah
Muhammad Sajjad Khan
Muhammad Amir
Junsu Kim
Su Min Kim
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Big data refers to large, complex datasets that require specialized computational platforms for analysis. Hadoop is one of the popular frameworks for big data analytics. In Hadoop, a large job is split into multiple small tasks, which are distributed to worker nodes in parallel using MapReduce to speed up computation. Improving throughput is therefore important. MapReduce jobs require quick responses from the worker nodes to complete before their deadlines. The existing Hadoop schedulers, such as the FIFO, fair, and capacity schedulers, cannot guarantee a response quick enough to meet a deadline set in advance. Thus, Hadoop needs better response and completion times for heterogeneous MapReduce jobs. In this paper, we propose an efficient preemptive deadline constraint scheduler based on least slack time and data locality. To improve task allocation and load balancing, we first analyze the task scheduling behavior of the Hadoop platform. Based on that analysis, we propose a novel preemptive approach that considers the remaining execution time of the running job when deciding whether to preempt it. The experimental results show that the proposed scheme significantly reduces job execution time and queue waiting time compared to existing schemes.
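The least-slack-time idea named in the description can be illustrated with a minimal Python sketch. It uses the textbook definition of slack (deadline minus current time minus remaining execution time) and is not the paper's algorithm. The `Job` class, the function names, and the preemption rule in `should_preempt` are all hypothetical, and data locality is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline, in seconds
    remaining: float  # estimated remaining execution time, in seconds


def slack(job: Job, now: float) -> float:
    """Textbook slack: how long the job can still wait and meet its deadline."""
    return job.deadline - now - job.remaining


def pick_least_slack(queue: List[Job], now: float) -> Optional[Job]:
    """Return the waiting job with the smallest slack, or None for an empty queue."""
    return min(queue, key=lambda j: slack(j, now), default=None)


def should_preempt(running: Job, candidate: Job, now: float) -> bool:
    """Hypothetical rule (an assumption, not taken from the paper):
    preempt only if the candidate is more urgent than the running job
    and would miss its deadline by waiting for the running job to finish."""
    more_urgent = slack(candidate, now) < slack(running, now)
    cannot_wait = slack(candidate, now) < running.remaining
    return more_urgent and cannot_wait


# Illustrative values only.
running = Job("r", deadline=100.0, remaining=10.0)
job_a = Job("a", deadline=30.0, remaining=25.0)
job_b = Job("b", deadline=60.0, remaining=20.0)
chosen = pick_least_slack([job_a, job_b], now=0.0)
```

With these values, `job_a` has the least slack (5 seconds) and cannot wait the 10 seconds the running job still needs, so this rule would preempt for it. `job_b` has 40 seconds of slack and can wait.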
topic Hadoop
MapReduce
distributed system
parallel computing
preemptive job scheduling
queuing theory
url https://ieeexplore.ieee.org/document/9117131/