Q-learning algorithms for optimal stopping based on least squares

Huizhen Yu; Dimitri P. Bertsekas

doi:10.23919/ecc.2007.7068523

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

20 Scopus citations

Abstract

We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
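
The projected value iteration idea named in the abstract can be illustrated on a small finite-state chain. The sketch below is an assumption-laden illustration, not the authors' algorithm: it uses an exact model instead of simulated samples, and the names `P` (transition matrix), `g` (cost of continuing), `c` (cost of stopping), `Phi` (feature matrix), `weights`, and `alpha` (discount factor) are all hypothetical. It repeatedly fits the linear approximation Phi r to the one-step target g + alpha * P min(c, Phi r) by weighted least squares.

```python
import numpy as np

def projected_value_iteration(P, g, c, Phi, weights, alpha=0.9, iters=500):
    """Iterate Phi r <- Pi F(Phi r), where F(Q) = g + alpha * P @ min(c, Q).

    Pi is the least squares projection onto the span of Phi, weighted by
    `weights` (assumed here to be the steady-state distribution of P).
    """
    sw = np.sqrt(weights)            # row scaling for weighted least squares
    A = sw[:, None] * Phi
    r = np.zeros(Phi.shape[1])
    for _ in range(iters):
        # One-step target: continue now, then act greedily (stop or continue).
        target = g + alpha * P @ np.minimum(c, Phi @ r)
        r, *_ = np.linalg.lstsq(A, sw * target, rcond=None)
    return r

# Hypothetical 3-state example (all numbers are made up for illustration).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
g = np.array([1.0, 2.0, 3.0])        # cost of continuing in each state
c = np.array([12.0, 15.0, 18.0])     # cost of stopping in each state
pi = np.array([0.25, 0.50, 0.25])    # steady-state distribution of P
Phi = np.column_stack([np.ones(3), np.arange(3.0)])  # features: [1, state index]

r = projected_value_iteration(P, g, c, Phi, pi)
Q_hat = Phi @ r                      # approximate cost of continuing
stop = c <= Q_hat                    # greedy rule: stop where stopping is no costlier
```

Under these assumptions, `Q_hat` approximates the cost of continuing and `stop` marks the states where the greedy policy stops. A simulation-based variant would replace the exact expectation `P @ ...` with averages over an observed trajectory.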

Original language	English (US)
Title of host publication	2007 European Control Conference, ECC 2007
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2368-2375
Number of pages	8
ISBN (Electronic)	9783952417386
DOIs	https://doi.org/10.23919/ecc.2007.7068523
State	Published - 2007
Externally published	Yes
Event	2007 9th European Control Conference, ECC 2007, Kos, Greece (Jul 2 2007 → Jul 5 2007)

Publication series

Name	2007 European Control Conference, ECC 2007

Other

Other	2007 9th European Control Conference, ECC 2007
Country/Territory	Greece
City	Kos
Period	7/2/07 → 7/5/07

ASJC Scopus subject areas

Control and Systems Engineering

Access to Document

10.23919/ecc.2007.7068523

Cite this

Yu, H & Bertsekas, DP 2007, Q-learning algorithms for optimal stopping based on least squares. in 2007 European Control Conference, ECC 2007., 7068523, 2007 European Control Conference, ECC 2007, Institute of Electrical and Electronics Engineers Inc., pp. 2368-2375, 2007 9th European Control Conference, ECC 2007, Kos, Greece, 7/2/07. https://doi.org/10.23919/ecc.2007.7068523

@inproceedings{5cc7179b163c4b62b6d65b4fc2682874,
title = "Q-learning algorithms for optimal stopping based on least squares",
abstract = "We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.",
author = "Huizhen Yu and Bertsekas, {Dimitri P.}",
note = "Publisher Copyright: {\textcopyright} 2007 EUCA.; 2007 9th European Control Conference, ECC 2007 ; Conference date: 02-07-2007 Through 05-07-2007",
year = "2007",
doi = "10.23919/ecc.2007.7068523",
language = "English (US)",
series = "2007 European Control Conference, ECC 2007",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "2368--2375",
booktitle = "2007 European Control Conference, ECC 2007",
}

TY - GEN
T1 - Q-learning algorithms for optimal stopping based on least squares
AU - Yu, Huizhen
AU - Bertsekas, Dimitri P.
PY - 2007
Y1 - 2007
N2 - We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
AB - We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
UR - http://www.scopus.com/inward/record.url?scp=84927748655&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84927748655&partnerID=8YFLogxK
U2 - 10.23919/ecc.2007.7068523
DO - 10.23919/ecc.2007.7068523
M3 - Conference contribution
AN - SCOPUS:84927748655
T3 - 2007 European Control Conference, ECC 2007
SP - 2368
EP - 2375
BT - 2007 European Control Conference, ECC 2007
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2007 9th European Control Conference, ECC 2007
Y2 - 2 July 2007 through 5 July 2007
ER -
