Cooking with blocks: A recipe for visual reasoning on image-pairs

Tejas Gokhale; Shailaja Sampat; Zhiyuan Fang; Yezhou Yang; Chitta Baral

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The ability to identify changes or transformations in a scene, and to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence in cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations and the sequence of moves to rearrange one configuration into the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We propose a modular two-step approach, Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach on natural images, we seek to pave the way for future research on event sequencing for real-world scenes.
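To make the event-sequencing half of the task concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes the perception step has already turned each image into a symbolic configuration (a hypothetical encoding of fixed stack positions, each listing block names bottom to top), and it swaps in plain breadth-first search to find a shortest move sequence. The function names and the `(block, from_stack, to_stack)` move format are illustrative assumptions.

```python
from collections import deque


def plan_moves(source, target):
    """Return a shortest list of (block, from_stack, to_stack) moves, or None.

    Assumed encoding (not from the paper): a configuration is a sequence of
    stacks at fixed positions; each stack lists block names bottom to top.
    A move lifts the top block of one stack onto the top of another.
    """
    start = tuple(tuple(s) for s in source)
    goal = tuple(tuple(s) for s in target)
    if len(start) != len(goal):
        raise ValueError("source and target need the same number of stacks")

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        if state == goal:
            return moves
        for i, src in enumerate(state):
            if not src:
                continue  # nothing to pick up from an empty stack
            block = src[-1]
            for j in range(len(state)):
                if i == j:
                    continue
                nxt = list(state)
                nxt[i] = src[:-1]
                nxt[j] = state[j] + (block,)
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, moves + [(block, i, j)]))
    return None  # target cannot be reached from source


def apply_moves(source, moves):
    """Replay a move list on a configuration and return the result."""
    stacks = [list(s) for s in source]
    for block, i, j in moves:
        assert stacks[i] and stacks[i][-1] == block, "block is not on top"
        stacks[j].append(stacks[i].pop())
    return tuple(tuple(s) for s in stacks)


# Swap the order of two blocks using two spare positions.
source = [("A", "B"), (), ()]
target = [("B", "A"), (), ()]
plan = plan_moves(source, target)
```

Breadth-first search is only practical for a handful of blocks, since the number of configurations grows quickly; it is used here because it makes the input and output of the sequencing step easy to inspect.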

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Publisher	IEEE Computer Society
Pages	5-8
Number of pages	4
ISBN (Electronic)	9781728125060
State	Published - Jun 2019
Event	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 - Long Beach, United States; Duration: Jun 16 2019 → Jun 20 2019

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume	2019-June
ISSN (Print)	2160-7508
ISSN (Electronic)	2160-7516

Conference

Conference	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Country/Territory	United States
City	Long Beach
Period	6/16/19 → 6/20/19

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Electrical and Electronic Engineering

Cite this

Gokhale, T., Sampat, S., Fang, Z., Yang, Y., & Baral, C. (2019). Cooking with blocks: A recipe for visual reasoning on image-pairs. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 (pp. 5-8). (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2019-June). IEEE Computer Society.

Cooking with blocks: A recipe for visual reasoning on image-pairs. / Gokhale, Tejas; Sampat, Shailaja; Fang, Zhiyuan et al.
Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019. IEEE Computer Society, 2019. p. 5-8 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2019-June).

Gokhale, T, Sampat, S, Fang, Z, Yang, Y & Baral, C 2019, Cooking with blocks: A recipe for visual reasoning on image-pairs. in Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2019-June, IEEE Computer Society, pp. 5-8, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019, Long Beach, United States, 6/16/19.

@inproceedings{d7ccd1b5aa884753bdd269d01b10b1b3,

title = "Cooking with blocks: A recipe for visual reasoning on image-pairs",

abstract = "The ability of identifying changes or transformations in a scene and to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception, and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated from evidence in cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD) which contains images of wooden blocks in different configurations, and the sequence of moves to rearrange one configuration to the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We propose a modular two-step approach: Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach on natural images, we seek to pave the way for future research on event sequencing for real world scenes.",

author = "Tejas Gokhale and Shailaja Sampat and Zhiyuan Fang and Yezhou Yang and Chitta Baral",

note = "Funding Information: We acknowledge support from NSF Grant 1816039. Publisher Copyright: {\textcopyright} 2019 IEEE Computer Society. All rights reserved.; 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019 ; Conference date: 16-06-2019 Through 20-06-2019",

year = "2019",

month = jun,

language = "English (US)",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops",

publisher = "IEEE Computer Society",

pages = "5--8",

booktitle = "Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019",

}

TY - GEN

T1 - Cooking with blocks

T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019

AU - Gokhale, Tejas

AU - Sampat, Shailaja

AU - Fang, Zhiyuan

AU - Yang, Yezhou

AU - Baral, Chitta

PY - 2019/6

Y1 - 2019/6

N2 - The ability of identifying changes or transformations in a scene and to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception, and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated from evidence in cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD) which contains images of wooden blocks in different configurations, and the sequence of moves to rearrange one configuration to the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We propose a modular two-step approach: Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach on natural images, we seek to pave the way for future research on event sequencing for real world scenes.

UR - http://www.scopus.com/inward/record.url?scp=85113852803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85113852803&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85113852803

T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

SP - 5

EP - 8

BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019

PB - IEEE Computer Society

Y2 - 16 June 2019 through 20 June 2019

ER -
