Abstract
This chapter considers temporal difference algorithms within the context of infinite-horizon, finite-state dynamic programming problems with discounted cost and linear cost function approximation. This problem arises as a subproblem in the policy iteration method of dynamic programming. Additional discussions of such problems can be found in Chapters 6 and 12. The method presented here is the first iterative temporal difference method that converges without requiring a diminishing step size. The chapter discusses the connections with Sutton's TD(λ) and with various versions of least squares based on value iteration. It is shown through both analysis and experiments that the proposed method is substantially faster, simpler, and more reliable than TD(λ). Comparisons are also made with the LSTD method of Boyan, and of Bradtke and Barto.
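For context on the baseline the chapter improves upon, below is a minimal sketch of TD(λ) policy evaluation with linear cost function approximation on a discounted finite-state Markov chain. The randomly generated chain, the feature matrix, and all names (`P`, `g`, `Phi`, `n_states`, etc.) are illustrative assumptions, not taken from the chapter; note the diminishing step size, which is precisely what the abstract states the proposed method avoids.

```python
import numpy as np

# Sketch of TD(lambda) policy evaluation with linear function approximation
# on a finite-state Markov chain with discounted cost. All quantities below
# are illustrative assumptions, not the chapter's setup.

rng = np.random.default_rng(0)

n_states = 10   # finite state space
k = 3           # number of features
gamma = 0.95    # discount factor
lam = 0.7       # trace-decay parameter lambda

# Random transition matrix and per-stage costs for the fixed policy.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
g = rng.random(n_states)

# Feature matrix Phi: row phi(i) gives the approximation J(i) ~ phi(i)' r.
Phi = rng.random((n_states, k))

r = np.zeros(k)  # weight vector
z = np.zeros(k)  # eligibility trace
state = 0

for t in range(200_000):
    next_state = rng.choice(n_states, p=P[state])
    # Temporal difference: d = g(i_t) + gamma * phi(i_{t+1})' r - phi(i_t)' r
    d = g[state] + gamma * Phi[next_state] @ r - Phi[state] @ r
    z = gamma * lam * z + Phi[state]   # update eligibility trace
    alpha = 10.0 / (t + 100)           # diminishing step size, as TD(lambda) requires
    r += alpha * d * z
    state = next_state

# TD(lambda) converges to the solution of a projected Bellman equation, so
# Phi @ r approximates, but with k < n generally does not equal, the exact
# cost vector J = (I - gamma P)^{-1} g.
J_exact = np.linalg.solve(np.eye(n_states) - gamma * P, g)
print("approx:", np.round(Phi @ r, 3))
print("exact :", np.round(J_exact, 3))
```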
Original language | English (US) |
---|---|
Title of host publication | Handbook of Learning and Approximate Dynamic Programming |
Publisher | John Wiley and Sons Inc. |
Pages | 235-259 |
Number of pages | 25 |
ISBN (Electronic) | 9780470544785 |
ISBN (Print) | 047166054X, 9780471660545 |
DOIs | |
State | Published - Jan 1 2004 |
Externally published | Yes |
Keywords
- Convergence
- Eigenvalues and eigenfunctions
- Function approximation
- Markov processes
- Trajectory
- Vectors
ASJC Scopus subject areas
- General Computer Science