Off-policy reinforcement learning with Gaussian processes

Bibliographic Details
Main Authors: Chowdhary, Girish (Author), Liu, Miao (Author), Grande, Robert (Contributor), Walsh, Thomas (Contributor), How, Jonathan P. (Contributor), Carin, Lawrence (Author)
Other Authors: Massachusetts Institute of Technology. Aerospace Controls Laboratory (Contributor), Massachusetts Institute of Technology. Department of Aeronautics and Astronautics (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2015.
Description
Summary: An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guarantee convergence of off-policy GPQ in the batch setting, and theoretical and practical extensions are provided for the online case. Empirical results demonstrate that GPQ has competitive learning speed in addition to its convergence guarantees and its ability to automatically choose its own basis locations.
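To make the idea concrete, the following is a minimal sketch of GP-based Q-learning in the spirit of GPQ, not the authors' implementation: a GP regressor over (state, action) pairs serves as the Q-function, and each observed transition contributes an off-policy Bellman target r + gamma * max_a' mu(s', a'). The toy chain environment, the squared-exponential kernel, and all hyperparameter values (lengthscale, signal and noise variances, gamma) are illustrative assumptions, not values from the paper.

import numpy as np

class GPQ:
    """GP regression over (state, action) pairs used as a Q-function (sketch)."""

    def __init__(self, lengthscale=0.5, signal_var=1.0, noise_var=0.1, gamma=0.9):
        # Hyperparameters are illustrative; the paper derives conditions on
        # such choices that guarantee convergence in the batch setting.
        self.ls, self.sf2, self.sn2, self.gamma = lengthscale, signal_var, noise_var, gamma
        self.X = np.empty((0, 2))  # rows are (state, action); 1-D state assumed
        self.y = np.empty(0)       # Bellman targets

    def _kern(self, A, B):
        # Squared-exponential kernel between row sets A (n,2) and B (m,2).
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return self.sf2 * np.exp(-0.5 * d2 / self.ls ** 2)

    def predict(self, Xq):
        # GP posterior mean at query points (zero prior mean).
        if len(self.y) == 0:
            return np.zeros(len(Xq))
        K = self._kern(self.X, self.X) + self.sn2 * np.eye(len(self.y))
        return self._kern(Xq, self.X) @ np.linalg.solve(K, self.y)

    def q(self, s, actions):
        return self.predict(np.array([[s, a] for a in actions]))

    def update(self, s, a, r, s_next, actions, done):
        # Off-policy target: r + gamma * max_a' mu(s', a'), as in Q-learning.
        target = r if done else r + self.gamma * self.q(s_next, actions).max()
        self.X = np.vstack([self.X, [s, a]])
        self.y = np.append(self.y, target)

# Usage on a toy 1-D chain: step left/right on [0, 1], reward 1 at the right end.
rng = np.random.default_rng(0)
agent, actions = GPQ(), [-0.1, 0.1]
for _ in range(10):                       # episodes
    s = rng.uniform(0.0, 1.0)
    for _ in range(15):                   # steps per episode
        greedy = actions[int(np.argmax(agent.q(s, actions)))]
        a = float(rng.choice(actions)) if rng.random() < 0.2 else greedy
        s_next = float(np.clip(s + a, 0.0, 1.0))
        done = s_next >= 1.0
        agent.update(s, a, float(done), s_next, actions, done)
        if done:
            break
        s = s_next
print("Q(0.9, .):", agent.q(0.9, actions))

A full GPQ implementation would additionally sparsify the training set online, choosing its basis locations automatically as the abstract describes, rather than storing every observed transition as this sketch does.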
Sponsorship: United States. Office of Naval Research (Autonomy Program N000140910625)