Off-policy reinforcement learning with Gaussian processes

An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guaran...

Bibliographic Details
Main Authors: Chowdhary, Girish (Author), Liu, Miao (Author), Grande, Robert (Contributor), Walsh, Thomas (Contributor), How, Jonathan P. (Contributor), Carin, Lawrence (Author)
Other Authors: Massachusetts Institute of Technology. Aerospace Controls Laboratory (Contributor), Massachusetts Institute of Technology. Department of Aeronautics and Astronautics (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2015-05-11T19:13:37Z.