Q-learning algorithms for optimal stopping based on least squares

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations

Abstract

We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
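
To make the least-squares idea concrete, here is a minimal sketch of a batch least-squares value iteration for a discounted stopping problem with linear Q-factor approximation. It is an illustration under stated assumptions, not the algorithms analyzed in the paper: the cost-minimization form Q(x) = g(x) + alpha * E[min(c(x'), Q(x'))], the function name `least_squares_stopping_q`, the fixed iteration count, and the toy 5-state cycle with one-hot features are all hypothetical choices made here. No convergence claim is made for general features.

```python
import numpy as np


def least_squares_stopping_q(phi, cont_cost, stop_cost, alpha, n_iters=200):
    """Fit weights r so that phi(x) @ r approximates the Q-factor of continuing.

    phi:       (T+1, d) feature vectors along one trajectory x_0, ..., x_T
    cont_cost: (T,)     cost of continuing at x_0, ..., x_{T-1} (assumed form)
    stop_cost: (T+1,)   cost of stopping at x_0, ..., x_T (assumed form)
    alpha:     discount factor in (0, 1)
    """
    phi_now, phi_next = phi[:-1], phi[1:]
    r = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        # Target: continuation cost plus the discounted cost of the better
        # of stopping or continuing at the next state, under the current r.
        targets = cont_cost + alpha * np.minimum(stop_cost[1:], phi_next @ r)
        # Least-squares regression of the targets onto the features.
        r, *_ = np.linalg.lstsq(phi_now, targets, rcond=None)
    return r


# Hypothetical toy problem: a deterministic 5-state cycle x -> (x + 1) mod 5.
n_states, horizon, alpha = 5, 50, 0.9
states = np.arange(horizon + 1) % n_states
g = np.array([1.0, 0.5, 2.0, 0.1, 1.5])   # continuation costs (made up)
c = np.array([3.0, 10.0, 1.0, 8.0, 4.0])  # stopping costs (made up)
phi = np.eye(n_states)[states]            # one-hot (tabular) features

r = least_squares_stopping_q(phi, g[states[:-1]], c[states], alpha)
stop_here = c <= r  # stop wherever stopping costs no more than continuing
```

With one-hot features and a deterministic chain, each regression step reduces to an exact value-iteration update, which makes the sketch easy to check. With fewer features than states, the regression step acts as a projection onto the feature subspace, which is the setting the abstract refers to.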

Original language: English (US)
Title of host publication: 2007 European Control Conference, ECC 2007
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2368-2375
Number of pages: 8
ISBN (Electronic): 9783952417386
State: Published - 2007
Externally published: Yes
Event: 2007 9th European Control Conference, ECC 2007 - Kos, Greece
Duration: Jul 2 2007 to Jul 5 2007

Publication series

Name: 2007 European Control Conference, ECC 2007

ASJC Scopus subject areas

  • Control and Systems Engineering
