Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning

A deep Q network (DQN) (Mnih et al., 2013) is an extension of Q-learning and a representative deep reinforcement learning method. In DQN, a Q function expressing the values of all actions in all states is approximated by a convolutional neural network, and an optimal policy is derived from the approximated Q function. To stabilize learning, DQN introduces a target network, which computes the target value and is updated from the Q function only at regular intervals. Less frequent updates of the target network lead to a more stable learning process; however, because the target value is not propagated unless the target network is updated, DQN usually requires a large number of samples. In this study, we propose Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value. Constrained DQN updates its parameters conservatively when this difference is large, and aggressively when it is small. As learning progresses, the constraint is activated less and less often, so the update rule gradually approaches conventional Q-learning. We found that Constrained DQN converges with fewer training samples than DQN and that it is robust against changes in the update frequency of the target network and in a certain parameter of the optimizer. Although Constrained DQN alone does not outperform integrated or distributed methods, experimental results show that it can be used as an additional component of those methods.
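The abstract describes the constraint only at a high level. As a rough sketch of how such a constrained target could be realized (an assumption-laden illustration, not the formulation from the paper), the PyTorch snippet below clips the ordinary Q-learning bootstrap target to an epsilon-band around the target network's estimate, so updates stay conservative while the two networks disagree and reduce to ordinary Q-learning once they agree. The names constrained_td_target and constrained_dqn_loss, the margin epsilon, and the Huber loss are choices invented for this sketch.

```python
# Illustrative sketch only (not the authors' implementation): constraining the
# Q-learning target with the target network's output, in PyTorch.
import torch
import torch.nn.functional as F


def constrained_td_target(online_net, target_net, rewards, next_states, dones,
                          gamma=0.99, epsilon=1.0):
    """Ordinary Q-learning target, clipped to an epsilon-band around the
    target network's estimate.

    While the online and target networks disagree strongly, the clip is active
    and the update stays conservative; once they agree, the clip has no effect
    and the target reduces to the ordinary Q-learning target.
    """
    with torch.no_grad():
        # Bootstrap value from the online network (ordinary Q-learning).
        next_q_online = online_net(next_states).max(dim=1).values
        online_target = rewards + gamma * (1.0 - dones) * next_q_online
        # Reference value from the slowly updated target network (as in DQN).
        next_q_frozen = target_net(next_states).max(dim=1).values
        frozen_target = rewards + gamma * (1.0 - dones) * next_q_frozen
        # Constrain the target to stay within epsilon of the frozen estimate.
        upper = frozen_target + epsilon
        lower = frozen_target - epsilon
        return torch.maximum(torch.minimum(online_target, upper), lower)


def constrained_dqn_loss(online_net, target_net, states, actions, rewards,
                         next_states, dones, gamma=0.99, epsilon=1.0):
    """Huber loss between the taken-action Q values and the constrained target."""
    q_taken = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_target = constrained_td_target(online_net, target_net, rewards,
                                      next_states, dones, gamma, epsilon)
    return F.smooth_l1_loss(q_taken, td_target)
```

In this sketch a small epsilon keeps behavior close to standard DQN's frozen target, while a large epsilon lets the update behave like target-network-free Q-learning.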

Bibliographic Details
Main Authors: Shota Ohnishi, Eiji Uchibe, Yotaro Yamaguchi, Kosuke Nakanishi, Yuji Yasui, Shin Ishii
Format: Article
Language: English
Published: Frontiers Media S.A., 2019-12-01
Series: Frontiers in Neurorobotics
ISSN: 1662-5218
Source: DOAJ
Subjects: deep reinforcement learning; deep Q network; regularization; learning stabilization; target network; constrained reinforcement learning
Online Access: https://www.frontiersin.org/article/10.3389/fnbot.2019.00103/full

Author Affiliations:
Shota Ohnishi: Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan (now affiliated with Panasonic Co., Ltd.)
Eiji Uchibe: ATR Computational Neuroscience Laboratories, Kyoto, Japan
Yotaro Yamaguchi: Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Kosuke Nakanishi: Honda R&D Co., Ltd., Saitama, Japan
Yuji Yasui: Honda R&D Co., Ltd., Saitama, Japan
Shin Ishii: ATR Computational Neuroscience Laboratories, Kyoto, Japan; Department of Systems Science, Graduate School of Informatics, Kyoto University, Kyoto, Japan