An analysis of gradient-based policy iteration

James Dankert, Lei Yang, Jennie Si

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Recently, a system-theoretic framework for learning and optimization has been developed that shows how many approximate dynamic programming paradigms, such as perturbation analysis, Markov decision processes, and reinforcement learning, are very closely related. Using this system-theoretic framework, a new optimization technique called gradient-based policy iteration (GBPI) has been developed. In this paper we show how GBPI can be extended to partially observable Markov decision processes (POMDPs). We also develop the value iteration analogue of GBPI and show that this new version of value iteration, extended to POMDPs, behaves like value iteration not only in theory but also numerically.

Original language: English (US)
Title of host publication: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2005
Number of pages: 6
State: Published - 2005
Event: International Joint Conference on Neural Networks, IJCNN 2005 - Montreal, QC, Canada
Duration: Jul 31 2005 to Aug 4 2005

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks


Other: International Joint Conference on Neural Networks, IJCNN 2005
City: Montreal, QC

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

