Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay

To improve the performance of robot systems in the manufacturing industry, it is essential to develop motion and task planning algorithms. In particular, the motion plan should be generated automatically so that the robot can cope with varied working environments. Although PRM (Probabilisti...


Bibliographic Details
Main Authors: MyeongSeop Kim, Dong-Ki Han, Jae-Han Park, Jung-Su Kim
Format: Article
Language: English
Published: MDPI AG 2020-01-01
Series: Applied Sciences
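Subjects: motion planning; probabilistic roadmap (PRM); reinforcement learning; policy gradient; hindsight experience replay (HER)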
Online Access: https://www.mdpi.com/2076-3417/10/2/575
id doaj-aebab02e84c240f6916ed787647a386d
record_format Article
spelling doaj-aebab02e84c240f6916ed787647a386d
2020-11-25T00:35:16Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-01-01
10 2 575
10.3390/app10020575
app10020575
Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay
MyeongSeop Kim (0), Dong-Ki Han (1), Jae-Han Park (2), Jung-Su Kim (3)
(0) Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea
(1) Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea
(2) Robotics R&D Group, Korea Institute of Industrial Technology (KITECH), Ansan 15588, Korea
(3) Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea
https://www.mdpi.com/2076-3417/10/2/575
motion planning
probabilistic roadmap (prm)
reinforcement learning
policy gradient
hindsight experience replay (her)
collection DOAJ
language English
format Article
sources DOAJ
author MyeongSeop Kim
Dong-Ki Han
Jae-Han Park
Jung-Su Kim
title Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-01-01
description To improve the performance of robot systems in the manufacturing industry, it is essential to develop motion and task planning algorithms. In particular, the motion plan should be generated automatically so that the robot can cope with varied working environments. Although a PRM (Probabilistic Roadmap) provides feasible paths once the start and goal positions of a robot manipulator are given, the resulting path may not be smooth, which can make the robot system inefficient. This paper proposes a motion planning algorithm for robot manipulators based on the twin delayed deep deterministic policy gradient (TD3), a reinforcement learning algorithm tailored to Markov decision processes (MDPs) with continuous actions. Hindsight experience replay (HER) is combined with TD3 to improve sample efficiency: path planning for a robot manipulator is an MDP with sparse rewards, and HER is designed for such problems. The proposed algorithm is applied to 2-DOF and 3-DOF manipulators, and the resulting paths are shown to be smoother and shorter than those produced by PRM.
topic motion planning
probabilistic roadmap (prm)
reinforcement learning
policy gradient
hindsight experience replay (her)
url https://www.mdpi.com/2076-3417/10/2/575
_version_ 1725309389113393152
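
The description above states that HER is used because manipulator path planning is an MDP with sparse reward. The following is a minimal Python sketch of how hindsight relabeling can turn a failed episode into useful training data. It is not the authors' implementation: the record does not give their reward function or relabeling strategy, so the tolerance `tol`, the number of relabels `k`, the "future" goal-sampling strategy, the transition tuple layout, and the function names `sparse_reward` and `her_relabel` are all assumptions made for illustration.

```python
import math
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel(episode, k=4, tol=0.05, rng=random):
    """Return the original transitions plus k hindsight copies of each.

    `episode` is a list of (state, action, next_state, goal) tuples, where
    states and goals are joint-angle tuples (hypothetical layout).
    Each hindsight copy replaces the goal with a state actually reached at
    the same or a later step of the same episode ("future" strategy).
    """
    transitions = []
    for t, (state, action, next_state, goal) in enumerate(episode):
        # Original transition, rewarded against the goal that was requested.
        reward = sparse_reward(next_state, goal, tol)
        transitions.append((state, action, reward, next_state, goal))
        for _ in range(k):
            # Pick a state reached at step t or later and pretend it was the goal.
            future = rng.randrange(t, len(episode))
            new_goal = episode[future][2]
            new_reward = sparse_reward(next_state, new_goal, tol)
            transitions.append((state, action, new_reward, next_state, new_goal))
    return transitions


# A three-step episode for a 2-DOF arm that never reaches the goal (1.0, 1.0).
goal = (1.0, 1.0)
episode = [
    ((0.0, 0.0), (0.1, 0.0), (0.1, 0.0), goal),
    ((0.1, 0.0), (0.1, 0.1), (0.2, 0.1), goal),
    ((0.2, 0.1), (0.0, 0.1), (0.2, 0.2), goal),
]
batch = her_relabel(episode, k=2, rng=random.Random(0))
```

In this toy episode every original transition receives a reward of -1.0, but the relabeled copies of the last step use the finally reached configuration (0.2, 0.2) as the goal and therefore receive 0.0. An off-policy learner such as TD3 could then sample from `batch` in place of a replay buffer that contains only failures.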