TY - GEN
T1 - Basis Function Adaptation Methods for Cost Approximation in MDP
AU - Yu, Huizhen
AU - Bertsekas, Dimitri P.
PY - 2009
Y1 - 2009
AB - We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low-order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.
UR - http://www.scopus.com/inward/record.url?scp=67650458822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67650458822&partnerID=8YFLogxK
DO - 10.1109/ADPRL.2009.4927528
M3 - Conference contribution
AN - SCOPUS:67650458822
SN - 9781424427611
T3 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
SP - 74
EP - 81
BT - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
T2 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009
Y2 - 30 March 2009 through 2 April 2009
ER -