A mixed value and policy iteration method for stochastic control with universally measurable policies

Huizhen Yu; Dimitri P. Bertsekas

doi:10.1287/moor.2014.0704

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

We consider stochastic optimal control models with Borel spaces and universally measurable policies. For such models the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems for the discounted case where the one-stage costs are bounded and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration that shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles Whittle's bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra and Sudderth that showed that value iteration, when initialized with the constant function zero, could require a transfinite number of iterations to converge. We use the new convergence theorem for value iteration to establish the convergence of our mixed value and policy iteration method for the nonnegative cost case.
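As a rough illustration of the initialization condition for value iteration described in the abstract (start above the optimal cost function, yet within a multiple of it), the sketch below runs value iteration on a hypothetical three-state, finite, undiscounted model with nonnegative one-stage costs. The model, the names `MODEL`, `bellman`, and `value_iteration`, and the factor 3 are assumptions made for illustration only. The paper treats Borel-space models with universally measurable policies; a finite toy example like this one does not exhibit the measurability issues the paper addresses.

```python
# Hypothetical finite model (an assumption for illustration, not from the paper).
# MODEL[s] is a list of actions; each action is (one_stage_cost, {next_state: prob}).
# State 0 is a cost-free absorbing state; all one-stage costs are nonnegative.
MODEL = {
    0: [(0.0, {0: 1.0})],
    1: [(1.0, {0: 1.0}), (0.2, {2: 1.0})],
    2: [(0.5, {0: 0.5, 2: 0.5}), (2.0, {0: 1.0})],
}

# Optimal total cost, worked out by hand for this toy model:
# J*(2) solves J = 0.5 + 0.5 * J, so J*(2) = 1; J*(1) = min(1.0, 0.2 + 1.0) = 1.
J_STAR = [0.0, 1.0, 1.0]


def bellman(J, model):
    """Apply the dynamic programming operator once (no discounting):
    (TJ)(s) = min over actions of [cost + expected J at the next state]."""
    return [
        min(
            cost + sum(p * J[nxt] for nxt, p in trans.items())
            for cost, trans in model[s]
        )
        for s in range(len(J))
    ]


def value_iteration(J0, model, tol=1e-10, max_iter=10_000):
    """Iterate J <- TJ from J0 until successive iterates differ by less than tol."""
    J = list(J0)
    for k in range(max_iter):
        J_next = bellman(J, model)
        if max(abs(a - b) for a, b in zip(J, J_next)) < tol:
            return J_next, k + 1
        J = J_next
    raise RuntimeError("value iteration did not converge within max_iter")


# Initial function satisfying J* <= J0 <= c * J* with c = 3 (an arbitrary choice).
J0 = [3.0 * j for j in J_STAR]
J_from_above, num_iters = value_iteration(J0, MODEL)
```

In this toy model the iterates started from `J0` approach `J_STAR`. In a finite model like this one, value iteration started from the zero function converges as well, so the example only shows what the condition looks like, not why it is needed in the general setting.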

Original language	English (US)
Pages (from-to)	926-968
Number of pages	43
Journal	Mathematics of Operations Research
Volume	40
Issue number	4
DOIs	https://doi.org/10.1287/moor.2014.0704
State	Published - Nov 2015
Externally published	Yes

Keywords

Borel spaces Markov decision process
Convergence
Discrete-time stochastic control
Measurability
Policy iteration
Total cost criteria
Value iteration

ASJC Scopus subject areas

General Mathematics
Computer Science Applications
Management Science and Operations Research

Cite this

@article{9b31052a40ea4940a72ec8f1e45fbcbe,
title = "A mixed value and policy iteration method for stochastic control with universally measurable policies",
abstract = "We consider stochastic optimal control models with Borel spaces and universally measurable policies. For such models the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems for the discounted case where the one-stage costs are bounded and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration that shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles Whittle's bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra and Sudderth that showed that value iteration, when initialized with the constant function zero, could require a transfinite number of iterations to converge. We use the new convergence theorem for value iteration to establish the convergence of our mixed value and policy iteration method for the nonnegative cost case.",

keywords = "Borel spaces Markov decision process, Convergence, Discrete-time stochastic control, Measurability, Policy iteration, Total cost criteria, Value iteration",
author = "Yu, Huizhen and Bertsekas, Dimitri P.",
note = "Publisher Copyright: {\textcopyright} 2015 INFORMS.",
year = "2015",
month = nov,
doi = "10.1287/moor.2014.0704",
language = "English (US)",
volume = "40",
pages = "926--968",
journal = "Mathematics of Operations Research",
issn = "0364-765X",
publisher = "INFORMS Institute for Operations Research and the Management Sciences",
number = "4",
}

TY - JOUR
T1 - A mixed value and policy iteration method for stochastic control with universally measurable policies
AU - Yu, Huizhen
AU - Bertsekas, Dimitri P.
PY - 2015/11
Y1 - 2015/11
AB - We consider stochastic optimal control models with Borel spaces and universally measurable policies. For such models the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems for the discounted case where the one-stage costs are bounded and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration that shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles Whittle's bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra and Sudderth that showed that value iteration, when initialized with the constant function zero, could require a transfinite number of iterations to converge. We use the new convergence theorem for value iteration to establish the convergence of our mixed value and policy iteration method for the nonnegative cost case.

KW - Borel spaces Markov decision process
KW - Convergence
KW - Discrete-time stochastic control
KW - Measurability
KW - Policy iteration
KW - Total cost criteria
KW - Value iteration
UR - http://www.scopus.com/inward/record.url?scp=84940687396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84940687396&partnerID=8YFLogxK
U2 - 10.1287/moor.2014.0704
DO - 10.1287/moor.2014.0704
M3 - Article
AN - SCOPUS:84940687396
SN - 0364-765X
VL - 40
SP - 926
EP - 968
JO - Mathematics of Operations Research
JF - Mathematics of Operations Research
IS - 4
ER -
