Least Squares Temporal Difference Methods: An Analysis under General Conditions
We consider approximate policy evaluation for finite-state, finite-action Markov decision processes (MDPs) with the least squares temporal difference (LSTD) algorithm, LSTD($\lambda$), in an exploration-enhanced learning context, where policy costs are computed from observations of a Markov chain differe...
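The abstract describes evaluating a policy's costs from observations of a chain other than the one being evaluated. As a rough illustration only, the sketch below implements one common batch form of LSTD($\lambda$) for a discounted-cost problem. Per-transition importance ratios $\rho_t = p_{i_t i_{t+1}} / q_{i_t i_{t+1}}$ correct for the mismatch between the evaluated chain $P$ and the observed chain $Q$. The sketch solves $A r = b$ with trace $z_t = \lambda\alpha\rho_{t-1} z_{t-1} + \phi(i_t)$, $A = \sum_t z_t\,(\phi(i_t) - \alpha\rho_t\phi(i_{t+1}))^\top$ and $b = \sum_t z_t \rho_t\, g(i_t, i_{t+1})$. The function name `lstd_lambda`, the discount factor `alpha`, the feature matrix `phi`, and this exact trace and ratio weighting are assumptions made for illustration. The iterates and conditions analyzed in the paper may differ.

```python
import numpy as np


def lstd_lambda(states, costs, phi, lam, alpha, rho=None):
    """Batch LSTD(lambda) for a discounted-cost problem (hypothetical sketch).

    states : visited states i_0, ..., i_T (length T + 1)
    costs  : observed transition costs g(i_t, i_{t+1}) (length T)
    phi    : feature matrix of shape (n_states, k); row i is phi(i)
    lam    : trace parameter lambda in [0, 1]
    alpha  : discount factor in [0, 1)
    rho    : per-transition ratios p(i_t, i_{t+1}) / q(i_t, i_{t+1}) between
             the evaluated chain P and the observed chain Q; None means the
             observed chain is P itself (all ratios equal 1)
    Returns r such that phi @ r approximates the cost vector J.
    """
    T = len(costs)
    k = phi.shape[1]
    rho = np.ones(T) if rho is None else np.asarray(rho, dtype=float)
    A = np.zeros((k, k))
    b = np.zeros(k)
    z = np.zeros(k)   # eligibility trace
    prev_rho = 1.0    # ratio of the previous transition (multiplies the zero trace at t = 0)
    for t in range(T):
        f_t = phi[states[t]]
        f_next = phi[states[t + 1]]
        # z_t = lam * alpha * rho_{t-1} * z_{t-1} + phi(i_t)
        z = lam * alpha * prev_rho * z + f_t
        # accumulate sum_t z_t (phi(i_t) - alpha rho_t phi(i_{t+1}))^T
        A += np.outer(z, f_t - alpha * rho[t] * f_next)
        # accumulate sum_t z_t rho_t g(i_t, i_{t+1})
        b += z * rho[t] * costs[t]
        prev_rho = rho[t]
    # LSTD solution: the r that zeroes the summed, trace-weighted TD errors
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Deterministic 3-cycle 0 -> 1 -> 2 -> 0 with costs 1, 2, 3; tabular features.
    P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
    g = np.array([1.0, 2.0, 3.0])
    alpha = 0.9
    states = [t % 3 for t in range(31)]   # 30 transitions
    costs = [g[s] for s in states[:-1]]
    r = lstd_lambda(states, costs, np.eye(3), lam=0.5, alpha=alpha)
    print(np.allclose(r, np.linalg.solve(np.eye(3) - alpha * P, g)))  # True
```

With tabular features (`phi = np.eye(n)`) and a deterministic chain, every temporal difference vanishes at the true cost vector, so the estimate matches $(I - \alpha P)^{-1} g$ exactly. With general features or sampled transitions, it is only an approximation. Accumulating $A$ and $b$ and solving once at the end keeps the sketch short; a recursive or step-size-weighted form would be the usual choice for online use.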
Format: | Article |
---|---|
Language: | English |
Published: | Society for Industrial and Applied Mathematics, 2013-03-12T18:09:37Z. |