Learning to Plan via Deep Optimistic Value Exploration
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of...
Main Authors: | , , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: |
2020-05-11T19:59:29Z.
|
Subjects: | |
Online Access: | Get fulltext |