Feature-based aggregation and deep reinforcement learning: A survey and some new implementations

Dimitri P. Bertsekas

doi:10.1109/JAS.2018.7511249

Research output: Contribution to journal, Article, peer-reviewed

85 Scopus citations

Abstract

In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller aggregate Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
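The abstract describes replacing the original problem by a smaller aggregate problem whose states are feature values, then using the aggregate solution for policy improvement. As a rough illustration only, and not the implementation in the paper, the Python sketch below assumes hard aggregation (each original state maps to exactly one feature value), uniform disaggregation probabilities within each feature group, and a tiny randomly generated MDP. The helper names (`make_random_mdp`, `solve_aggregate_problem`, `lookahead_policy`) and the feature map are hypothetical. It solves the aggregate Bellman equation by value iteration and then performs one policy improvement step by one-step lookahead with the approximate cost J~(j) = r(feature(j)), which is piecewise constant (hence nonlinear) in the features.

```python
import random


def make_random_mdp(n_states, n_controls, seed=0):
    """Hypothetical test MDP: P[u][i][j] transition probs, G[u][i][j] costs in [0, 1)."""
    rng = random.Random(seed)
    P, G = [], []
    for _ in range(n_controls):
        P_u, G_u = [], []
        for _ in range(n_states):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            P_u.append([w / total for w in weights])
            G_u.append([rng.random() for _ in range(n_states)])
        P.append(P_u)
        G.append(G_u)
    return P, G


def q_value(P, G, alpha, i, u, J):
    """Expected cost of applying control u at state i, then incurring cost-to-go J."""
    return sum(P[u][i][j] * (G[u][i][j] + alpha * J[j]) for j in range(len(J)))


def solve_aggregate_problem(P, G, alpha, feature, n_iters=500):
    """Value iteration on the aggregate problem whose states are feature values.

    Assumes hard aggregation (phi_{jy} = 1 iff feature[j] == y) and uniform
    disaggregation over the original states sharing a feature value.
    """
    groups = {}
    for i, y in enumerate(feature):
        groups.setdefault(y, []).append(i)
    r = {y: 0.0 for y in groups}
    n_controls = len(P)
    for _ in range(n_iters):
        # Approximate cost of original state j: J~(j) = r(feature[j]).
        J_tilde = [r[y] for y in feature]
        # Average the Bellman minimum over the states in each feature group.
        r = {
            y: sum(
                min(q_value(P, G, alpha, i, u, J_tilde) for u in range(n_controls))
                for i in members
            ) / len(members)
            for y, members in groups.items()
        }
    return r


def lookahead_policy(P, G, alpha, feature, r):
    """Policy improvement: one-step lookahead using J~(j) = r(feature[j])."""
    J_tilde = [r[y] for y in feature]
    n_controls = len(P)
    return [
        min(range(n_controls), key=lambda u: q_value(P, G, alpha, i, u, J_tilde))
        for i in range(len(feature))
    ]


if __name__ == "__main__":
    P, G = make_random_mdp(n_states=6, n_controls=2, seed=1)
    feature = [0, 0, 1, 1, 2, 2]  # hypothetical feature map: state index -> feature value
    r = solve_aggregate_problem(P, G, alpha=0.9, feature=feature)
    mu = lookahead_policy(P, G, 0.9, feature, r)
    print("aggregate costs:", r)
    print("lookahead policy:", mu)
```

Using the identity feature map (each state its own feature value) reduces this to ordinary value iteration on the original problem, which is a convenient sanity check; coarser feature maps trade approximation accuracy for a smaller aggregate problem.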

Original language	English (US)
Article number	8476633
Pages (from-to)	1-31
Number of pages	31
Journal	IEEE/CAA Journal of Automatica Sinica
Volume	6
Issue number	1
DOIs	https://doi.org/10.1109/JAS.2018.7511249
State	Published - Jan 2019
Externally published	Yes

Keywords

Aggregation
Deep neural networks
Dynamic programming
Feature-based architectures
Markovian decision problems
Policy iteration
Reinforcement learning
Rollout algorithms

ASJC Scopus subject areas

Control and Optimization
Artificial Intelligence
Information Systems
Control and Systems Engineering

Cite this

@article{910e2cfc59eb4250b06642844b3bef15,
title = "Feature-based aggregation and deep reinforcement learning: A survey and some new implementations",
abstract = "In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller aggregate Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.",
keywords = "Aggregation, Deep neural networks, Dynamic programming, Feature-based architectures, Markovian decision problems, Policy iteration, Reinforcement learning, Rollout algorithms",
author = "Bertsekas, {Dimitri P.}",
note = "Publisher Copyright: {\textcopyright} 2014 Chinese Association of Automation.",
year = "2019",
month = jan,
doi = "10.1109/JAS.2018.7511249",
language = "English (US)",
volume = "6",
pages = "1--31",
journal = "IEEE/CAA Journal of Automatica Sinica",
issn = "2329-9266",
publisher = "IEEE Advancing Technology for Humanity",
number = "1",
}

TY  - JOUR
T1  - Feature-based aggregation and deep reinforcement learning: A survey and some new implementations
AU  - Bertsekas, Dimitri P.
PY  - 2019/1
Y1  - 2019/1
N2  - In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller aggregate Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
AB  - In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller aggregate Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
KW  - Aggregation
KW  - Deep neural networks
KW  - Dynamic programming
KW  - Feature-based architectures
KW  - Markovian decision problems
KW  - Policy iteration
KW  - Reinforcement learning
KW  - Rollout algorithms
UR  - http://www.scopus.com/inward/record.url?scp=85054396359&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85054396359&partnerID=8YFLogxK
U2  - 10.1109/JAS.2018.7511249
DO  - 10.1109/JAS.2018.7511249
M3  - Article
AN  - SCOPUS:85054396359
SN  - 2329-9266
VL  - 6
SP  - 1
EP  - 31
JO  - IEEE/CAA Journal of Automatica Sinica
JF  - IEEE/CAA Journal of Automatica Sinica
IS  - 1
M1  - 8476633
ER  - 
