TY - GEN
T1 - Q-learning algorithms for optimal stopping based on least squares
AU - Yu, Huizhen
AU - Bertsekas, Dimitri P.
N1 - Publisher Copyright: © 2007 EUCA.
PY - 2007
Y1 - 2007
N2 - We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
AB - We consider the solution of discounted optimal stopping problems using linear function approximation methods. A Q-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties.
UR - http://www.scopus.com/inward/record.url?scp=84927748655&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84927748655&partnerID=8YFLogxK
U2 - 10.23919/ecc.2007.7068523
DO - 10.23919/ecc.2007.7068523
M3 - Conference contribution
AN - SCOPUS:84927748655
T3 - 2007 European Control Conference, ECC 2007
SP - 2368
EP - 2375
BT - 2007 European Control Conference, ECC 2007
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2007 9th European Control Conference, ECC 2007
Y2 - 2 July 2007 through 5 July 2007
ER -