Abstract
In this article, we consider the computational and communication challenges of partially observable multiagent sequential decision-making problems. We present algorithms that simultaneously or sequentially optimize the agents' controls using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. In particular: 1) we consider multiagent rollout algorithms that dramatically reduce the required computation while preserving the key policy improvement property of the standard rollout method; we further improve the multiagent rollout policy by incorporating it in an offline approximate policy iteration scheme and by applying an additional 'online play' scheme that enhances offline approximation architectures; 2) we consider the case of imperfect communication and provide extensions of our rollout methods that handle it; and 3) we demonstrate the performance of our methods by applying them to a challenging partially observable multiagent sequential repair problem (state space of size 10^{37} and control space of size 10^{7}). Extensive simulations show that our methods produce better policies for large and complex multiagent problems than existing methods, including POMCP and MADDPG, and work well where other methods fail to scale up.
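The computational saving described above comes from optimizing one agent's control at a time, with already-optimized agents keeping their chosen controls and the remaining agents following the base policy, so the search cost grows with the sum of the per-agent action-set sizes rather than their product. The following is a minimal sketch of that idea under an assumed generic interface; `q_value`, `base_policy`, and the other names are illustrative, not taken from the paper, and the sketch omits the lookahead, truncation, and terminal cost approximation components.

```python
# Hypothetical sketch of one-agent-at-a-time multiagent rollout.
# All names (q_value, base_policy, agent_actions) are illustrative assumptions.
import itertools


def joint_rollout(state, agent_actions, q_value):
    """Standard rollout: search the full joint control space.

    Cost grows with the *product* of the per-agent action-set sizes
    (exponential in the number of agents).
    """
    return max(itertools.product(*agent_actions),
               key=lambda u: q_value(state, u))


def multiagent_rollout(state, agent_actions, base_policy, q_value):
    """Sequential (agent-by-agent) rollout.

    Agents already optimized keep their chosen controls; agents not yet
    optimized use the base policy. Cost grows with the *sum* of the
    per-agent action-set sizes, while the policy improvement property
    of rollout is preserved.
    """
    n = len(agent_actions)
    controls = [base_policy(state, i) for i in range(n)]  # start from base policy
    for i in range(n):
        best_u, best_q = None, float("-inf")
        for u in agent_actions[i]:
            trial = controls[:i] + [u] + controls[i + 1:]
            q = q_value(state, tuple(trial))
            if q > best_q:
                best_u, best_q = u, q
        controls[i] = best_u  # freeze agent i's control before moving on
    return tuple(controls)
```

For 3 agents with 2 actions each, the sequential scheme evaluates 3 × 2 = 6 joint controls instead of 2³ = 8; for the repair problem above, with a control space of size 10^{7}, this difference is what makes rollout tractable.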
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 2003-2023 |
| Number of pages | 21 |
| Journal | IEEE Transactions on Robotics |
| Volume | 40 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Approximate policy iteration (approximate PI)
- imperfect communication
- multiagent reinforcement learning
- multiagent rollout
- online play policy
- partial observation Markovian decision problem (POMDP)
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Computer Science Applications