Learning to Plan via Deep Optimistic Value Exploration

Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of...

Full description

Bibliographic Details
Main Authors: Seyde, Tim (Author), Schwarting, Wilko (Author), Karaman, Sertac (Author), Rus, Daniela L (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor), Massachusetts Institute of Technology. Laboratory for Information and Decision Systems (Contributor)
Format: Article
Language:English
Published: 2020-05-11T19:59:29Z.
Subjects:
Online Access:Get fulltext