Basis Function Adaptation Methods for Cost Approximation in MDP

We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal difference (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.

Bibliographic Details
Main Authors: Yu, Huizhen (Author), Bertsekas, Dimitri P. (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor), Massachusetts Institute of Technology. Laboratory for Information and Decision Systems (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2010-10-13T18:33:03Z.
Subjects:
Online Access: Get fulltext
LEADER 01796 am a22002413u 4500
001 59288
042 |a dc 
100 1 0 |a Yu, Huizhen  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Laboratory for Information and Decision Systems  |e contributor 
100 1 0 |a Bertsekas, Dimitri P.  |e contributor 
700 1 0 |a Bertsekas, Dimitri P.  |e author 
245 0 0 |a Basis Function Adaptation Methods for Cost Approximation in MDP 
260 |b Institute of Electrical and Electronics Engineers,   |c 2010-10-13T18:33:03Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/59288 
520 |a We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal difference (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD. 
520 |a Academy of Finland (grant 118653 (ALGODAN)) 
520 |a IST Programme of the European Community (IST-2002-506778) 
520 |a National Science Foundation (U.S.) (Grant ECCS-0801549) 
546 |a en_US 
655 7 |a Article 
773 |t IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09