Abstract
This chapter considers temporal difference algorithms within the context of infinite-horizon, finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least squares based on value iteration. It is shown through both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and of Bradtke and Barto.
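For context on the baseline the chapter improves upon, below is a minimal sketch of TD(λ) policy evaluation with linear cost function approximation on a discounted finite-state Markov chain. The randomly generated chain, the feature matrix, and all names (`P`, `g`, `Phi`, `n_states`, etc.) are illustrative assumptions, not taken from the chapter; note the diminishing step size, which is precisely what the abstract states the proposed method avoids.

```python
import numpy as np

# Sketch of TD(lambda) policy evaluation with linear function approximation
# on a finite-state Markov chain with discounted cost. All quantities below
# are illustrative assumptions, not the chapter's setup.

rng = np.random.default_rng(0)

n_states = 10   # finite state space
k = 3           # number of features
gamma = 0.95    # discount factor
lam = 0.7       # trace-decay parameter lambda

# Random transition matrix and per-stage costs for the fixed policy.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
g = rng.random(n_states)

# Feature matrix Phi: row phi(i) gives the approximation J(i) ~ phi(i)' r.
Phi = rng.random((n_states, k))

r = np.zeros(k)  # weight vector
z = np.zeros(k)  # eligibility trace
state = 0

for t in range(200_000):
    next_state = rng.choice(n_states, p=P[state])
    # Temporal difference: d = g(i_t) + gamma * phi(i_{t+1})' r - phi(i_t)' r
    d = g[state] + gamma * Phi[next_state] @ r - Phi[state] @ r
    z = gamma * lam * z + Phi[state]   # update eligibility trace
    alpha = 10.0 / (t + 100)           # diminishing step size, as TD(lambda) requires
    r += alpha * d * z
    state = next_state

# TD(lambda) converges to the solution of a projected Bellman equation, so
# Phi @ r approximates, but with k < n generally does not equal, the exact
# cost vector J = (I - gamma P)^{-1} g.
J_exact = np.linalg.solve(np.eye(n_states) - gamma * P, g)
print("approx:", np.round(Phi @ r, 3))
print("exact :", np.round(J_exact, 3))
```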
Original language | English (US) |
---|---|
Title of host publication | Handbook of Learning and Approximate Dynamic Programming |
Publisher | John Wiley and Sons Inc. |
Pages | 235-259 |
Number of pages | 25 |
ISBN (Electronic) | 9780470544785 |
ISBN (Print) | 047166054X, 9780471660545 |
DOIs | |
State | Published - Jan 1 2004 |
Externally published | Yes |
Keywords
- Convergence
- Eigenvalues and eigenfunctions
- Function approximation
- Markov processes
- Trajectory
- Vectors
ASJC Scopus subject areas
- General Computer Science