Temporal difference methods for general projected equations

Dimitri P. Bertsekas

doi:10.1109/TAC.2011.2115290

Temporal difference methods for general projected equations

Dimitri P. Bertsekas

Research output: Contribution to journal › Article › peer-review

38 Scopus citations

Abstract

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

Original language	English (US)
Pages (from-to)	2128-2139
Number of pages	12
Journal	IEEE Transactions on Automatic Control
Volume	56
Issue number	9
DOIs	https://doi.org/10.1109/TAC.2011.2115290
State	Published - 2011
Externally published	Yes

Keywords

Approximation methods
Dynamic programming
Markov decision processes
Reinforcement learning
Temporal difference methods

ASJC Scopus subject areas

Control and Systems Engineering
Computer Science Applications
Electrical and Electronic Engineering

Access to Document

10.1109/TAC.2011.2115290

Cite this

@article{0b9f565a39e64752ba639037731f08d2,

title = "Temporal difference methods for general projected equations",

abstract = "We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.",

keywords = "Approximation methods, Dynamic programming, Markov decision processes, Reinforcement learning, Temporal difference methods",

author = "Bertsekas, {Dimitri P.}",

note = "Funding Information: Manuscript received April 02, 2009; revised September 26, 2009, January 26, 2010, and June 12, 2010; accepted January 20, 2011. Date of publication February 17, 2011; date of current version September 08, 2011. This work was supported by NSF Grant ECCS-0801549. Recommended by Associate Editor C.-H. Chen.",

year = "2011",

doi = "10.1109/TAC.2011.2115290",

language = "English (US)",

volume = "56",

pages = "2128--2139",

journal = "IEEE Transactions on Automatic Control",

issn = "0018-9286",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY - JOUR

T1 - Temporal difference methods for general projected equations

AU - Bertsekas, Dimitri P.

N1 - Funding Information: Manuscript received April 02, 2009; revised September 26, 2009, January 26, 2010, and June 12, 2010; accepted January 20, 2011. Date of publication February 17, 2011; date of current version September 08, 2011. This work was supported by NSF Grant ECCS-0801549. Recommended by Associate Editor C.-H. Chen.

PY - 2011

Y1 - 2011

N2 - We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

AB - We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.

KW - Approximation methods

KW - Dynamic programming

KW - Markov decision processes

KW - Reinforcement learning

KW - Temporal difference methods

UR - http://www.scopus.com/inward/record.url?scp=84860501567&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860501567&partnerID=8YFLogxK

U2 - 10.1109/TAC.2011.2115290

DO - 10.1109/TAC.2011.2115290

M3 - Article

AN - SCOPUS:84860501567

SN - 0018-9286

VL - 56

SP - 2128

EP - 2139

JO - IEEE Transactions on Automatic Control

JF - IEEE Transactions on Automatic Control

IS - 9

ER -

Temporal difference methods for general projected equations

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this