TY - GEN
T1 - Distributed Q-learning with state tracking for multi-agent networked control
AU - Wang, Hang
AU - Lin, Sen
AU - Jafarkhani, Hamid
AU - Zhang, Junshan
N1 - Funding Information:
This work is supported in part by NSF Grants CNS-2003081 and CPS-1739344.
Publisher Copyright:
© 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
PY - 2021
Y1 - 2021
AB - This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) problem in a multi-agent network. Existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for the agents. Specifically, agents maintain local estimates of the global state based on their local information and communications with neighbors. At each step, every agent updates its local estimate of the global state, based on which it solves an approximate Q-factor locally through policy iteration. Assuming decaying injected excitation noise during policy evaluation, we prove that the local estimates converge to the true global state, and we establish the convergence of the proposed distributed ST-based Q-learning algorithm. Experimental studies corroborate our theoretical results by showing that the proposed method achieves performance comparable to the centralized case.
KW - Linear Quadratic Control
KW - Multi-agent
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85112253409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112253409&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85112253409
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 1680
EP - 1682
BT - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
Y2 - 3 May 2021 through 7 May 2021
ER -