A projected primal-dual gradient optimal control method for deep reinforcement learning
Abstract: In this contribution, we start with a policy-based Reinforcement Learning ansatz using neural networks. The underlying Markov Decision Process consists of a transition probability representing the dynamical system and a policy realized by a neural network mapping the current state to parame...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2020-04-01 |
Series: | Journal of Mathematics in Industry |
Subjects: | |
Online Access: | http://link.springer.com/article/10.1186/s13362-020-00075-3 |