Multiagent Reinforcement Learning: Rollout and Policy Iteration for POMDP With Application to Multirobot Problems

Sushmita Bhattacharya, Siva Kailas, Sahil Badyal, Stephanie Gil, Dimitri Bertsekas

Research output: Contribution to journal › Article › peer-review

Abstract

In this article, we consider the computational and communication challenges of partially observable multiagent sequential decision-making problems. We present algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. In particular: 1) we consider multiagent rollout algorithms that dramatically reduce the required computation while preserving the key policy improvement property of the standard rollout method. We improve our multiagent rollout policy by incorporating it in an offline approximate policy iteration scheme, and we apply an additional 'online play' scheme that enhances offline approximation architectures; 2) we consider the imperfect communication case and provide various extensions of our rollout methods to deal with it; and 3) we demonstrate the performance of our methods in extensive simulations on a challenging partially observable multiagent sequential repair problem (state space size 10^{37} and control space size 10^{7}). The simulations show that our methods produce better policies for large and complex multiagent problems than existing methods, including POMCP and MADDPG, and that they work well where other methods fail to scale up.
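The one-agent-at-a-time idea behind multiagent rollout can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fully observed, deterministic toy repair problem on a line of locations rather than the belief-space POMDP treated in the article, and every name in it (`step`, `base_policy`, `terminal_cost`, `q_value`, `multiagent_rollout`) is hypothetical. Each agent in turn picks the control that minimizes a truncated-rollout cost while earlier agents keep their newly chosen controls and later agents follow the base policy.

```python
MOVES = (-1, 0, 1)  # move left, stay and repair, move right (toy control set)


def step(state, controls):
    """Apply one joint control; return (next_state, stage_cost)."""
    positions, damage = state
    damage = list(damage)
    new_positions = []
    for pos, u in zip(positions, controls):
        if u == 0:
            damage[pos] = max(0, damage[pos] - 1)  # repair one unit of damage
        new_positions.append(min(len(damage) - 1, max(0, pos + u)))
    # Stage cost (an assumption of this toy model): total remaining damage.
    return (tuple(new_positions), tuple(damage)), sum(damage)


def base_policy(state, agent):
    """Greedy base policy: repair if damaged here, else walk to nearest damage."""
    positions, damage = state
    pos = positions[agent]
    if damage[pos] > 0:
        return 0
    targets = [i for i, d in enumerate(damage) if d > 0]
    if not targets:
        return 0
    nearest = min(targets, key=lambda i: abs(i - pos))
    return 1 if nearest > pos else -1


def terminal_cost(state):
    """Crude terminal cost approximation: damage left at the truncation point."""
    return float(sum(state[1]))


def q_value(state, controls, horizon):
    """Cost of `controls` now, then `horizon` base-policy steps, plus terminal cost."""
    state, total = step(state, controls)
    for _ in range(horizon):
        joint = tuple(base_policy(state, a) for a in range(len(state[0])))
        state, cost = step(state, joint)
        total += cost
    return total + terminal_cost(state)


def multiagent_rollout(state, horizon=5):
    """Optimize one agent at a time; return (joint control, number of Q evaluations)."""
    n_agents = len(state[0])
    # Start from the base policy's joint control; agents not yet optimized keep it.
    controls = [base_policy(state, a) for a in range(n_agents)]
    evaluations = 0
    for agent in range(n_agents):
        best_u, best_q = controls[agent], float("inf")
        for u in MOVES:
            trial = controls.copy()
            trial[agent] = u
            q = q_value(state, tuple(trial), horizon)
            evaluations += 1
            if q < best_q:
                best_u, best_q = u, q
        controls[agent] = best_u  # fixed for all later agents
    return tuple(controls), evaluations


if __name__ == "__main__":
    start = ((0, 2, 5), (0, 3, 0, 1, 0, 2))  # (agent positions, damage per location)
    chosen, n_evals = multiagent_rollout(start, horizon=4)
    print("controls:", chosen, "evaluations:", n_evals, "joint alternatives:", len(MOVES) ** 3)
```

In this sketch the number of rollout evaluations grows linearly with the number of agents (three controls times three agents), whereas minimizing over the joint control would need one evaluation per element of the product space. Because each agent's base control is always among its candidates, the truncated-rollout cost of the selected joint control is never worse than that of the base policy's joint control.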

Original language: English (US)
Pages (from-to): 2003-2023
Number of pages: 21
Journal: IEEE Transactions on Robotics
Volume: 40
State: Published - 2024

Keywords

  • Approximate policy iteration (approximate PI)
  • imperfect communication
  • multiagent reinforcement learning
  • multiagent rollout
  • online play policy
  • partial observation Markovian decision problem (POMDP)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications
