Basis Function Adaptation Methods for Cost Approximation in MDP

Huizhen Yu; Dimitri P. Bertsekas

doi:10.1109/ADPRL.2009.4927528

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Scopus citations

Abstract

We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.

Original language	English (US)
Title of host publication	2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings
Pages	74-81
Number of pages	8
DOIs	https://doi.org/10.1109/ADPRL.2009.4927528
State	Published - 2009
Externally published	Yes
Event	2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Nashville, TN, United States
Duration	Mar 30 2009 → Apr 2 2009

Publication series

Name	2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings

Conference

Conference	2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009
Country/Territory	United States
City	Nashville, TN
Period	3/30/09 → 4/2/09

ASJC Scopus subject areas

Computational Theory and Mathematics
Software

Cite this

Yu, H., & Bertsekas, D. P. (2009). Basis Function Adaptation Methods for Cost Approximation in MDP. In 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings (pp. 74-81). Article 4927528 (2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings). https://doi.org/10.1109/ADPRL.2009.4927528

Basis Function Adaptation Methods for Cost Approximation in MDP. / Yu, Huizhen; Bertsekas, Dimitri P.
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings. 2009. p. 74-81 4927528 (2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings).

Yu, H & Bertsekas, DP 2009, Basis Function Adaptation Methods for Cost Approximation in MDP. in 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings., 4927528, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings, pp. 74-81, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009, Nashville, TN, United States, 3/30/09. https://doi.org/10.1109/ADPRL.2009.4927528

@inproceedings{010cd9a029f34e47a1b3d51afe5497fb,

title = "Basis Function Adaptation Methods for Cost Approximation in MDP",

abstract = "We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.",

author = "Yu, Huizhen and Bertsekas, {Dimitri P.}",

year = "2009",

doi = "10.1109/ADPRL.2009.4927528",

language = "English (US)",

isbn = "9781424427611",

series = "2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings",

pages = "74--81",

booktitle = "2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings",

note = "2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 ; Conference date: 30-03-2009 Through 02-04-2009",

}

TY - GEN

T1 - Basis Function Adaptation Methods for Cost Approximation in MDP

AU - Yu, Huizhen

AU - Bertsekas, Dimitri P.

PY - 2009

Y1 - 2009

N2 - We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.

AB - We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD.

UR - http://www.scopus.com/inward/record.url?scp=67650458822&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650458822&partnerID=8YFLogxK

U2 - 10.1109/ADPRL.2009.4927528

DO - 10.1109/ADPRL.2009.4927528

M3 - Conference contribution

AN - SCOPUS:67650458822

SN - 9781424427611

T3 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings

SP - 74

EP - 81

BT - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009 - Proceedings

T2 - 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2009

Y2 - 30 March 2009 through 2 April 2009

ER -
