Abstract
In this article, we consider the computational and communication challenges of partially observable multiagent sequential decision-making problems. We present algorithms that simultaneously or sequentially optimize the agents' controls using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. In particular: 1) we consider multiagent rollout algorithms that dramatically reduce the required computation while preserving the key policy improvement property of the standard rollout method; we further improve the multiagent rollout policy by incorporating it in an offline approximate policy iteration scheme and by applying an additional 'online play' scheme that enhances offline approximation architectures; 2) we consider the case of imperfect communication and provide extensions of our rollout methods that handle it; and 3) we demonstrate the performance of our methods by applying them to a challenging partially observable multiagent sequential repair problem (state space of size 10^{37} and control space of size 10^{7}). Extensive simulations show that our methods produce better policies for large and complex multiagent problems than existing methods, including POMCP and MADDPG, and work well where other methods fail to scale up.
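The computational saving described above comes from optimizing one agent's control at a time, with already-optimized agents keeping their chosen controls and the remaining agents following the base policy, so the search cost grows with the sum of the per-agent action-set sizes rather than their product. The following is a minimal sketch of that idea under an assumed generic interface; `q_value`, `base_policy`, and the other names are illustrative, not taken from the paper, and the sketch omits the lookahead, truncation, and terminal cost approximation components.

```python
# Hypothetical sketch of one-agent-at-a-time multiagent rollout.
# All names (q_value, base_policy, agent_actions) are illustrative assumptions.
import itertools


def joint_rollout(state, agent_actions, q_value):
    """Standard rollout: search the full joint control space.

    Cost grows with the *product* of the per-agent action-set sizes
    (exponential in the number of agents).
    """
    return max(itertools.product(*agent_actions),
               key=lambda u: q_value(state, u))


def multiagent_rollout(state, agent_actions, base_policy, q_value):
    """Sequential (agent-by-agent) rollout.

    Agents already optimized keep their chosen controls; agents not yet
    optimized use the base policy. Cost grows with the *sum* of the
    per-agent action-set sizes, while the policy improvement property
    of rollout is preserved.
    """
    n = len(agent_actions)
    controls = [base_policy(state, i) for i in range(n)]  # start from base policy
    for i in range(n):
        best_u, best_q = None, float("-inf")
        for u in agent_actions[i]:
            trial = controls[:i] + [u] + controls[i + 1:]
            q = q_value(state, tuple(trial))
            if q > best_q:
                best_u, best_q = u, q
        controls[i] = best_u  # freeze agent i's control before moving on
    return tuple(controls)
```

For 3 agents with 2 actions each, the sequential scheme evaluates 3 × 2 = 6 joint controls instead of 2³ = 8; for the repair problem above, with a control space of size 10^{7}, this difference is what makes rollout tractable.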
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 2003-2023 |
| Number of pages | 21 |
| Journal | IEEE Transactions on Robotics |
| Volume | 40 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Approximate policy iteration (approximate PI)
- imperfect communication
- multiagent reinforcement learning
- multiagent rollout
- online play policy
- partial observation Markovian decision problem (POMDP)
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Computer Science Applications