Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning

Yantian Zha; Lin Guan; Subbarao Kambhampati

doi:10.1609/aaai.v38i9.28907

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Our work aims to leverage ambiguous demonstrations efficiently when training a reinforcement learning (RL) agent. An ambiguous demonstration can usually be interpreted in multiple ways, which severely hinders the RL agent from learning stably and efficiently. Because even an optimal demonstration can be ambiguous, previous works that combine RL and learning from demonstration (RLfD works) may not work well. Inspired by how humans handle such situations, we propose to use self-explanation (an agent generates explanations for itself) to recognize valuable high-level relational features as an interpretation of why a successful trajectory is successful. This way, the agent can leverage the explained important relations as guidance for its learning. Our main contribution is the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of existing RLfD works. Our experimental results show that using the SERLfD framework improves an RLfD model's training stability and performance. To foster further research in self-explanation-guided robot learning, we have made our demonstrations and code publicly accessible at https://github.com/YantianZha/SERLfD. For a deeper understanding of our work, interested readers can refer to our arXiv version, which includes an accompanying appendix, at https://arxiv.org/pdf/2110.05286.pdf.

Original language	English (US)
Title of host publication	Technical Tracks 14
Editors	Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher	Association for the Advancement of Artificial Intelligence
Pages	10395-10403
Number of pages	9
Edition	9
ISBN (Electronic)	1577358872, 9781577358879
DOIs	https://doi.org/10.1609/aaai.v38i9.28907
State	Published - Mar 25 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada Duration: Feb 20 2024 → Feb 27 2024

Publication series

Name	Proceedings of the AAAI Conference on Artificial Intelligence
Number	9
Volume	38
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory	Canada
City	Vancouver
Period	2/20/24 → 2/27/24

ASJC Scopus subject areas

Artificial Intelligence

Access to Document

10.1609/aaai.v38i9.28907

Cite this

Zha, Y., Guan, L., & Kambhampati, S. (2024). Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning. In M. Wooldridge, J. Dy, & S. Natarajan (Eds.), Technical Tracks 14 (9 ed., pp. 10395-10403). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 9). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i9.28907

Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning. / Zha, Yantian; Guan, Lin; Kambhampati, Subbarao.
Technical Tracks 14. ed. / Michael Wooldridge; Jennifer Dy; Sriraam Natarajan. 9. ed. Association for the Advancement of Artificial Intelligence, 2024. p. 10395-10403 (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 9).

Zha, Y, Guan, L & Kambhampati, S 2024, Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning. in M Wooldridge, J Dy & S Natarajan (eds), Technical Tracks 14. 9 edn, Proceedings of the AAAI Conference on Artificial Intelligence, no. 9, vol. 38, Association for the Advancement of Artificial Intelligence, pp. 10395-10403, 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, 2/20/24. https://doi.org/10.1609/aaai.v38i9.28907

Zha Y, Guan L, Kambhampati S. Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning. In Wooldridge M, Dy J, Natarajan S, editors, Technical Tracks 14. 9 ed. Association for the Advancement of Artificial Intelligence. 2024. p. 10395-10403. (Proceedings of the AAAI Conference on Artificial Intelligence; 9). doi: 10.1609/aaai.v38i9.28907

Zha, Yantian ; Guan, Lin ; Kambhampati, Subbarao. / Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning. Technical Tracks 14. editor / Michael Wooldridge ; Jennifer Dy ; Sriraam Natarajan. 9. ed. Association for the Advancement of Artificial Intelligence, 2024. pp. 10395-10403 (Proceedings of the AAAI Conference on Artificial Intelligence; 9).

@inproceedings{44943ec0baff4e2ab02f897c1ec9f683,

title = "Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning",

abstract = "Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent. An ambiguous demonstration can usually be interpreted in multiple ways, which severely hinders the RL agent from learning stably and efficiently. Since an optimal demonstration may also suffer from being ambiguous, previous works that combine RL and learning from demonstration (RLfD works) may not work well. Inspired by how humans handle such situations, we propose to use self-explanation (an agent generates explanations for itself) to recognize valuable high-level relational features as an interpretation of why a successful trajectory is successful. This way, the agent can leverage the explained important relations as guidance for its RL learning. Our main contribution is to propose the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of existing RLfD works. Our experimental results show that an RLfD model can be improved by using our SERLfD framework in terms of training stability and performance. To foster further research in self-explanation-guided robot learning, we have made our demonstrations and code publicly accessible at https://github.com/YantianZha/SERLfD. For a deeper understanding of our work, interested readers can refer to our arXiv version at https://arxiv.org/pdf/2110.05286.pdf, including an accompanying appendix.",

author = "Yantian Zha and Lin Guan and Subbarao Kambhampati",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i9.28907",

language = "English (US)",

series = "Proceedings of the AAAI Conference on Artificial Intelligence",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "9",

pages = "10395--10403",

editor = "Michael Wooldridge and Jennifer Dy and Sriraam Natarajan",

booktitle = "Technical Tracks 14",

edition = "9",

}

TY - GEN

T1 - Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning

AU - Zha, Yantian

AU - Guan, Lin

AU - Kambhampati, Subbarao

PY - 2024/3/25

Y1 - 2024/3/25

AB - Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent. An ambiguous demonstration can usually be interpreted in multiple ways, which severely hinders the RL agent from learning stably and efficiently. Since an optimal demonstration may also suffer from being ambiguous, previous works that combine RL and learning from demonstration (RLfD works) may not work well. Inspired by how humans handle such situations, we propose to use self-explanation (an agent generates explanations for itself) to recognize valuable high-level relational features as an interpretation of why a successful trajectory is successful. This way, the agent can leverage the explained important relations as guidance for its RL learning. Our main contribution is to propose the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of existing RLfD works. Our experimental results show that an RLfD model can be improved by using our SERLfD framework in terms of training stability and performance. To foster further research in self-explanation-guided robot learning, we have made our demonstrations and code publicly accessible at https://github.com/YantianZha/SERLfD. For a deeper understanding of our work, interested readers can refer to our arXiv version at https://arxiv.org/pdf/2110.05286.pdf, including an accompanying appendix.

UR - http://www.scopus.com/inward/record.url?scp=85189298764&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85189298764&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i9.28907

DO - 10.1609/aaai.v38i9.28907

M3 - Conference contribution

AN - SCOPUS:85189298764

T3 - Proceedings of the AAAI Conference on Artificial Intelligence

SP - 10395

EP - 10403

BT - Technical Tracks 14

A2 - Wooldridge, Michael

A2 - Dy, Jennifer

A2 - Natarajan, Sriraam

PB - Association for the Advancement of Artificial Intelligence

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

Y2 - 20 February 2024 through 27 February 2024

ER -
