Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand

Daniel Garces; Sushmita Bhattacharya; Stephanie Gil; Dimitri Bertsekas

doi:10.1109/ICRA48891.2023.10161067

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

We derive a learning framework to generate routing/pickup policies for a fleet of autonomous vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination among the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic and account a priori for potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in policies that adapt to fluctuations in actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) an online play algorithm that improves the performance of an offline-trained policy, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein ambiguity set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuations in demand distributions. Our numerical results demonstrate that our method outperforms alternative rollout-based reinforcement learning schemes, as well as other classical methods from operations research.
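The switching mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes demand is summarized as a histogram over a one-dimensional, unit-spaced grid (the paper works with requests on a city map), and it takes the validity radius as a given number rather than deriving the q-valid radius. The names `wasserstein_1d`, `select_model`, `models`, and `radius` are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same
    unit-spaced 1-D grid, computed as the summed gap between their CDFs.
    (Simplifying assumption: real demand would be spatial, not 1-D.)"""
    if len(p) != len(q):
        raise ValueError("histograms must share the same support")
    total_p, total_q = sum(p), sum(q)
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(x / total_q for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def select_model(current, trained_name, models, radius):
    """Keep the originally trained model while the current demand lies
    within `radius` of its demand model; otherwise switch to the
    historical demand model closest in Wasserstein distance."""
    if wasserstein_1d(current, models[trained_name]) <= radius:
        return trained_name
    return min(models, key=lambda name: wasserstein_1d(current, models[name]))


# Hypothetical historical demand models (request counts per grid cell).
models = {
    "off_peak": [4, 3, 2, 1],
    "on_peak": [1, 2, 3, 4],
}
current = [1, 2, 3, 5]  # hypothetical demand observed right now
chosen = select_model(current, "off_peak", models, radius=0.5)
```

With these illustrative numbers, the current demand falls outside the assumed radius around the off-peak model, so the sketch selects `on_peak`, the historical model nearest in Wasserstein distance.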

Original language	English (US)
Title of host publication	Proceedings - ICRA 2023
Subtitle of host publication	IEEE International Conference on Robotics and Automation
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	3524-3531
Number of pages	8
ISBN (Electronic)	9798350323658
DOIs	https://doi.org/10.1109/ICRA48891.2023.10161067
State	Published - 2023
Event	2023 IEEE International Conference on Robotics and Automation, ICRA 2023 - London, United Kingdom
Duration	May 29 2023 → Jun 2 2023

Publication series

Name	Proceedings - IEEE International Conference on Robotics and Automation
Volume	2023-May
ISSN (Print)	1050-4729

Conference

Conference	2023 IEEE International Conference on Robotics and Automation, ICRA 2023
Country/Territory	United Kingdom
City	London
Period	5/29/23 → 6/2/23

ASJC Scopus subject areas

Software
Control and Systems Engineering
Electrical and Electronic Engineering
Artificial Intelligence

Access to Document

10.1109/ICRA48891.2023.10161067

Cite this

Garces, D., Bhattacharya, S., Gil, S., & Bertsekas, D. (2023). Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. In Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation (pp. 3524-3531). (Proceedings - IEEE International Conference on Robotics and Automation; Vol. 2023-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICRA48891.2023.10161067

Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. / Garces, Daniel; Bhattacharya, Sushmita; Gil, Stephanie et al.
Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Institute of Electrical and Electronics Engineers Inc., 2023. p. 3524-3531 (Proceedings - IEEE International Conference on Robotics and Automation; Vol. 2023-May).

Garces, D, Bhattacharya, S, Gil, S & Bertsekas, D 2023, Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. in Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Proceedings - IEEE International Conference on Robotics and Automation, vol. 2023-May, Institute of Electrical and Electronics Engineers Inc., pp. 3524-3531, 2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom, 5/29/23. https://doi.org/10.1109/ICRA48891.2023.10161067

Garces D, Bhattacharya S, Gil S, Bertsekas D. Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. In Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Institute of Electrical and Electronics Engineers Inc. 2023. p. 3524-3531. (Proceedings - IEEE International Conference on Robotics and Automation). doi: 10.1109/ICRA48891.2023.10161067

Garces, Daniel ; Bhattacharya, Sushmita ; Gil, Stephanie et al. / Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand. Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 3524-3531 (Proceedings - IEEE International Conference on Robotics and Automation).

@inproceedings{7da7c6df274b47d3abce9aa58668b40b,

title = "Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand",

author = "Daniel Garces and Sushmita Bhattacharya and Stephanie Gil and Dimitri Bertsekas",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE International Conference on Robotics and Automation, ICRA 2023 ; Conference date: 29-05-2023 Through 02-06-2023",

year = "2023",

doi = "10.1109/ICRA48891.2023.10161067",

language = "English (US)",

series = "Proceedings - IEEE International Conference on Robotics and Automation",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "3524--3531",

booktitle = "Proceedings - ICRA 2023",

}

TY - GEN

T1 - Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand

AU - Garces, Daniel

AU - Bhattacharya, Sushmita

AU - Gil, Stephanie

AU - Bertsekas, Dimitri

PY - 2023

Y1 - 2023

UR - http://www.scopus.com/inward/record.url?scp=85165631901&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85165631901&partnerID=8YFLogxK

U2 - 10.1109/ICRA48891.2023.10161067

DO - 10.1109/ICRA48891.2023.10161067

M3 - Conference contribution

AN - SCOPUS:85165631901

T3 - Proceedings - IEEE International Conference on Robotics and Automation

SP - 3524

EP - 3531

BT - Proceedings - ICRA 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 IEEE International Conference on Robotics and Automation, ICRA 2023

Y2 - 29 May 2023 through 2 June 2023

ER -
