A projected primal-dual gradient optimal control method for deep reinforcement learning

Abstract: In this contribution, we start with a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy realized by a neural network mapping the current state to parame...

Bibliographic Details
Main Authors: Simon Gottschalk, Michael Burger, Matthias Gerdts
Format: Article
Language: English
Published: SpringerOpen 2020-04-01
Series: Journal of Mathematics in Industry
Online Access: http://link.springer.com/article/10.1186/s13362-020-00075-3