Least Squares Temporal Difference Methods: An Analysis under General Conditions

We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain differe...

Bibliographic Details
Main Author:	Yu, Huizhen (Contributor)
Other Authors:	Massachusetts Institute of Technology. Laboratory for Information and Decision Systems (Contributor)
Format:	Article
Language:	English
Published:	Society for Industrial and Applied Mathematics, 2013-03-12.
Subjects:	Article
Online Access:	Get fulltext

Similar Items