TY - GEN
T1 - Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution
AU - Kakish, Zahi
AU - Elamvazhuthi, Karthik
AU - Berman, Spring
N1 - Funding Information:
Acknowledgment. This work was supported by the Arizona State University Global Security Initiative. Many thanks to Dr. Sean Wilson at the Georgia Tech Research Institute for running the robot experiments on the Robotarium.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - In this paper, we present a reinforcement learning approach to designing a control policy for a “leader” agent that herds a swarm of “follower” agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader’s location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for follower agent populations ranging from 10 to 1000. Finally, the control policies trained on 100 simulated agents were used to successfully redistribute a physical swarm of 10 small robots to a target distribution among 4 spatial regions.
KW - Graph theory
KW - Mean-field model
KW - Reinforcement learning
KW - Swarm robotics
UR - http://www.scopus.com/inward/record.url?scp=85123282893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123282893&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-92790-5_31
DO - 10.1007/978-3-030-92790-5_31
M3 - Conference contribution
AN - SCOPUS:85123282893
SN - 9783030927899
T3 - Springer Proceedings in Advanced Robotics
SP - 401
EP - 414
BT - Distributed Autonomous Robotic Systems - 15th International Symposium, 2022
A2 - Matsuno, Fumitoshi
A2 - Azuma, Shun-ichi
A2 - Yamamoto, Masahito
PB - Springer Nature
T2 - 15th International Symposium on Distributed Autonomous Robotic Systems, DARS 2021 and 4th International Symposium on Swarm Behavior and Bio-Inspired Robotics, SWARM 2021
Y2 - 1 June 2021 through 4 June 2021
ER -