TY - GEN
T1 - Counterfactually-Guided Causal Reinforcement Learning with Reward Machines
AU - Baharisangari, Nasim
AU - Paliwal, Yash
AU - Xu, Zhe
N1 - Publisher Copyright:
© 2024 AACC.
PY - 2024
Y1 - 2024
N2 - In causal reinforcement learning (RL), counterfactual reasoning deals with 'what if' situations and allows for investigating the potential consequences of actions or events that did not actually happen. In this paper, we combine counterfactual reasoning and RL and propose Counterfactually-Guided Causal Reinforcement Learning with Reward Machines (CGC-RL). In CGC-RL, using observational data, we first compute the optimal counterfactual sequence, i.e., the one with the highest probability of completing a given task. Then, we construct a reward machine (RM) compatible with the counterfactual sequence. We use the constructed RM to apply dynamic potential-based reward shaping that encourages the agent to follow the counterfactual sequence. We prove policy invariance under dynamic reward shaping with RMs. Finally, we implement CGC-RL in a case study and compare the results with three baselines. Our results show that CGC-RL outperforms the baselines.
AB - In causal reinforcement learning (RL), counterfactual reasoning deals with 'what if' situations and allows for investigating the potential consequences of actions or events that did not actually happen. In this paper, we combine counterfactual reasoning and RL and propose Counterfactually-Guided Causal Reinforcement Learning with Reward Machines (CGC-RL). In CGC-RL, using observational data, we first compute the optimal counterfactual sequence, i.e., the one with the highest probability of completing a given task. Then, we construct a reward machine (RM) compatible with the counterfactual sequence. We use the constructed RM to apply dynamic potential-based reward shaping that encourages the agent to follow the counterfactual sequence. We prove policy invariance under dynamic reward shaping with RMs. Finally, we implement CGC-RL in a case study and compare the results with three baselines. Our results show that CGC-RL outperforms the baselines.
UR - http://www.scopus.com/inward/record.url?scp=85204429004&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85204429004&partnerID=8YFLogxK
U2 - 10.23919/ACC60939.2024.10644503
DO - 10.23919/ACC60939.2024.10644503
M3 - Conference contribution
AN - SCOPUS:85204429004
T3 - Proceedings of the American Control Conference
SP - 522
EP - 527
BT - 2024 American Control Conference, ACC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 American Control Conference, ACC 2024
Y2 - 10 July 2024 through 12 July 2024
ER -