TY - GEN
T1 - Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
AU - Karia, Rushang
AU - Verma, Pulkit
AU - Speranzon, Alberto
AU - Srivastava, Siddharth
N1 - Publisher Copyright: Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/5/30
Y1 - 2024/5/30
AB - This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.
UR - http://www.scopus.com/inward/record.url?scp=85195919313&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195919313&partnerID=8YFLogxK
U2 - 10.1609/icaps.v34i1.31489
DO - 10.1609/icaps.v34i1.31489
M3 - Conference contribution
AN - SCOPUS:85195919313
T3 - Proceedings International Conference on Automated Planning and Scheduling, ICAPS
SP - 310
EP - 318
BT - Proceedings of the 34th International Conference on Automated Planning and Scheduling, ICAPS 2024
A2 - Bernardini, Sara
A2 - Muise, Christian
PB - Association for the Advancement of Artificial Intelligence
T2 - 34th International Conference on Automated Planning and Scheduling, ICAPS 2024
Y2 - 1 June 2024 through 6 June 2024
ER -