TY - GEN
T1 - Adaptive Control of MIMO UAVs for Prioritized Harvesting via Hierarchical Reinforcement Learning
AU - Keshavamurthy, Bharath
AU - Michelusi, Nicolo
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - This work addresses the control and coordination of a fleet of MIMO-capable UAVs for efficiently harvesting prioritized traffic from a random distribution of heterogeneous MIMO-capable users. The objective is to maximize the infinite-horizon average reward per UAV subject to mobility and average power constraints. To this end, the problem is formulated as a Dynamic Pickup and Delivery Problem (DPDP) and solved via a Hierarchical Reinforcement Learning (HRL) framework. In the upper tier of the policy hierarchy, a double deep Q-network adaptively partitions the DPDP into a series of static Pickup and Delivery Problems (PDPs) of varying timescales by dynamically caching and releasing batches of user requests. The lower tier then employs a mixed-integer programming construction, modeled as a multiple Traveling Salesman Problem (mTSP) with capacity and resource constraints, which optimizes user association and scheduling of the released batch of user requests via graphical branch-and-bound, 3D UAV service positions via zero-forcing beamforming and two-stage exhaustive grid search, and 3D UAV trajectories via learning-based competitive swarm optimization. Simulations show that the HRL solution outperforms static UAV deployments (63%), adaptive Voronoi decompositions (33%), and state-of-the-art iterative fleet automation algorithms (69%) vis-à-vis user quality-of-service and UAV power efficiency.
UR - https://www.scopus.com/pages/publications/105002692576
UR - https://www.scopus.com/pages/publications/105002692576#tab=citedBy
U2 - 10.1109/IEEECONF60004.2024.10942694
DO - 10.1109/IEEECONF60004.2024.10942694
M3 - Conference contribution
AN - SCOPUS:105002692576
T3 - Conference Record - Asilomar Conference on Signals, Systems and Computers
SP - 7
EP - 11
BT - Conference Record of the 58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024
A2 - Matthews, Michael B.
PB - IEEE Computer Society
T2 - 58th Asilomar Conference on Signals, Systems and Computers, ACSSC 2024
Y2 - 27 October 2024 through 30 October 2024
ER -