TY - JOUR
T1 - Temporal difference methods for general projected equations
AU - Bertsekas, Dimitri P.
N1 - Funding Information:
Manuscript received April 02, 2009; revised September 26, 2009, January 26, 2010, and June 12, 2010; accepted January 20, 2011. Date of publication February 17, 2011; date of current version September 08, 2011. This work was supported by NSF Grant ECCS-0801549. Recommended by Associate Editor C.-H. Chen.
PY - 2011
Y1 - 2011
N2 - We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.
AB - We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.
KW - Approximation methods
KW - Dynamic programming
KW - Markov decision processes
KW - Reinforcement learning
KW - Temporal difference methods
UR - http://www.scopus.com/inward/record.url?scp=84860501567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860501567&partnerID=8YFLogxK
U2 - 10.1109/TAC.2011.2115290
DO - 10.1109/TAC.2011.2115290
M3 - Article
AN - SCOPUS:84860501567
SN - 0018-9286
VL - 56
SP - 2128
EP - 2139
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 9
ER -