Temporal difference methods for general projected equations

Research output: Contribution to journal › Article › peer-review

38 Scopus citations

Abstract

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.
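As a small illustration of the projected-equation idea, the sketch below assumes the fixed point problem is linear, x = Ax + b, and that the projection Π onto the subspace {Φr} is taken with respect to a weighted Euclidean norm with weights ξ. The names `A`, `b`, `Phi`, `xi`, and both function names are assumptions made for this example and do not come from the abstract. Under those assumptions, Φr = Π(AΦr + b) reduces to a low-dimensional linear system Cr = d, which the sketch forms and solves exactly. It is a deterministic matrix-inversion sketch only, not the simulation-based algorithms the paper develops.

```python
import numpy as np


def project(x, Phi, xi):
    """Project x onto span(Phi) in the xi-weighted Euclidean norm."""
    Xi = np.diag(xi)
    # Weighted least squares: minimize ||Phi r - x||_xi over r.
    r = np.linalg.solve(Phi.T @ Xi @ Phi, Phi.T @ Xi @ x)
    return Phi @ r


def solve_projected_equation(A, b, Phi, xi):
    """Solve Phi r = Pi(A Phi r + b) for the low-dimensional vector r.

    The orthogonality condition Phi' Xi (Phi r - A Phi r - b) = 0
    gives C r = d with C = Phi' Xi (I - A) Phi and d = Phi' Xi b.
    """
    n = A.shape[0]
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(n) - A) @ Phi
    d = Phi.T @ Xi @ b
    return np.linalg.solve(C, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, s = 50, 3                       # hypothetical problem sizes
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)  # row-stochastic matrix
    A = 0.9 * P                        # discounted DP-style mapping
    b = rng.random(n)
    Phi = rng.random((n, s))           # feature matrix, one row per state
    xi = np.full(n, 1.0 / n)           # uniform projection weights

    r = solve_projected_equation(A, b, Phi, xi)
    x = Phi @ r
    # x should be a fixed point of the projected mapping.
    print(np.allclose(x, project(A @ x + b, Phi, xi)))
```

In this sketch C and d are s-by-s and s-dimensional, which is why the final solve is cheap even when n is large. Forming them exactly still costs work proportional to n, which is the step that simulation-based estimates are meant to avoid.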

Original language: English (US)
Pages (from-to): 2128-2139
Number of pages: 12
Journal: IEEE Transactions on Automatic Control
Volume: 56
Issue number: 9
State: Published - 2011
Externally published: Yes

Keywords

  • Approximation methods
  • Dynamic programming
  • Markov decision processes
  • Reinforcement learning
  • Temporal difference methods

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
