Counterfactually-Guided Causal Reinforcement Learning with Reward Machines

Nasim Baharisangari, Yash Paliwal, Zhe Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In causal reinforcement learning (RL), counterfactual reasoning deals with 'what if' situations and allows for investigating the potential consequences of actions or events that did not actually happen. In this paper, we combine counterfactual reasoning and RL and propose Counterfactually-Guided Causal Reinforcement Learning with Reward Machines (CGC-RL). In CGC-RL, using observational data, we first compute the optimal counterfactual sequence, i.e., the sequence with the highest probability of completing a given task. Then, we construct a reward machine (RM) compatible with the counterfactual sequence. We use the constructed RM to apply dynamic potential-based reward shaping that encourages the agent to follow the counterfactual sequence, and we prove policy invariance under dynamic reward shaping with RMs. Finally, we implement CGC-RL in a case study and compare the results with three baselines. Our results show that CGC-RL outperforms the baselines.
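To make the reward-shaping step concrete, the sketch below shows standard potential-based reward shaping applied over reward-machine states, which is the general mechanism the abstract names. This is a minimal illustration, not the paper's construction: the `RewardMachine` class, the three-state "reach a, then b" task, and the distance-based potentials are all hypothetical; the paper derives its RM from the optimal counterfactual sequence.

```python
# Minimal sketch: potential-based reward shaping over reward-machine (RM)
# states. The RM, events, and potentials below are illustrative assumptions.

GAMMA = 0.9  # discount factor used in the shaping term

class RewardMachine:
    """Deterministic finite-state reward machine.

    `delta` maps (rm_state, event) -> next rm_state; `potential` assigns
    each RM state a shaping potential (here, negative distance to the
    accepting state, a common heuristic choice).
    """

    def __init__(self, delta, potential, initial, accepting):
        self.delta = delta
        self.potential = potential
        self.state = initial
        self.accepting = accepting

    def step(self, event):
        """Advance the RM on an observed event and return the
        potential-based shaping term gamma * Phi(u') - Phi(u)."""
        u = self.state
        u_next = self.delta.get((u, event), u)  # stay put on unmodeled events
        self.state = u_next
        return GAMMA * self.potential[u_next] - self.potential[u]

# Hypothetical 3-state task "reach a, then b": u0 --a--> u1 --b--> u2 (accepting).
rm = RewardMachine(
    delta={("u0", "a"): "u1", ("u1", "b"): "u2"},
    potential={"u0": -2.0, "u1": -1.0, "u2": 0.0},
    initial="u0",
    accepting="u2",
)

# In the agent's loop, the shaping term augments the environment reward,
# steering the agent toward the RM's accepting state without changing the
# optimal policy (the policy-invariance property the paper proves).
env_reward = 0.0
shaped = env_reward + rm.step("a")  # event 'a' observed this step
print(shaped)  # gamma * Phi(u1) - Phi(u0) = 0.9 * (-1) - (-2) = 1.1
```

Because the shaping reward is a difference of potentials, its contribution telescopes along any trajectory, which is why shaping of this form can encourage progress through the RM while leaving the set of optimal policies unchanged.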

Original language: English (US)
Title of host publication: 2024 American Control Conference, ACC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 522-527
Number of pages: 6
ISBN (Electronic): 9798350382655
DOIs
State: Published - 2024
Event: 2024 American Control Conference, ACC 2024 - Toronto, Canada
Duration: Jul 10, 2024 - Jul 12, 2024

Publication series

Name: Proceedings of the American Control Conference
ISSN (Print): 0743-1619

Conference

Conference: 2024 American Control Conference, ACC 2024
Country/Territory: Canada
City: Toronto
Period: 7/10/24 - 7/12/24

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
