Language-conditioned imitation learning for robot manipulation tasks

Simon Stepputtis; Joseph Campbell; Mariano Phielipp; Stefan Lee; Chitta Baral; Heni Ben Amor

Language-conditioned imitation learning for robot manipulation tasks

Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee, Chitta Baral, Heni Ben Amor

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Contribution to journal › Conference article › peer-review

55 Scopus citations

Abstract

Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., “go to the large green bowl”). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.

Original language	English (US)
Journal	Advances in Neural Information Processing Systems
Volume	2020-December
State	Published - 2020
Event	34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online Duration: Dec 6 2020 → Dec 12 2020

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Signal Processing

Cite this

@article{a5d70336694c4a46be16315171dc550d,

title = "Language-conditioned imitation learning for robot manipulation tasks",

abstract = "Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., “go to the large green bowl”). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.",

author = "Simon Stepputtis and Joseph Campbell and Mariano Phielipp and Stefan Lee and Chitta Baral and Amor, {Heni Ben}",

note = "Funding Information: This work was supported by a grant from the Interplanetary Initiative at Arizona State University. We would like to thank Lindy Elkins-Tanton, Katsu Yamane, and Benjamin Kuipers for their valuable insights and feedback during the early stages of this project. Publisher Copyright: {\textcopyright} 2020 Neural information processing systems foundation. All rights reserved.; 34th Conference on Neural Information Processing Systems, NeurIPS 2020 ; Conference date: 06-12-2020 Through 12-12-2020",

year = "2020",

language = "English (US)",

volume = "2020-December",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

}

TY - JOUR

T1 - Language-conditioned imitation learning for robot manipulation tasks

AU - Stepputtis, Simon

AU - Campbell, Joseph

AU - Phielipp, Mariano

AU - Lee, Stefan

AU - Baral, Chitta

AU - Amor, Heni Ben

N1 - Funding Information: This work was supported by a grant from the Interplanetary Initiative at Arizona State University. We would like to thank Lindy Elkins-Tanton, Katsu Yamane, and Benjamin Kuipers for their valuable insights and feedback during the early stages of this project. Publisher Copyright: © 2020 Neural information processing systems foundation. All rights reserved.

PY - 2020

Y1 - 2020

N2 - Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., “go to the large green bowl”). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.

AB - Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., “go to the large green bowl”). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.

UR - http://www.scopus.com/inward/record.url?scp=85108454531&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85108454531&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85108454531

SN - 1049-5258

VL - 2020-December

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Y2 - 6 December 2020 through 12 December 2020

ER -

Language-conditioned imitation learning for robot manipulation tasks

Abstract

ASJC Scopus subject areas

Other files and links

Fingerprint

Cite this