CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Maitreya Patel; Tejas Gokhale; Chitta Baral; Yezhou Yang

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Contribution to conference › Paper › peer-review

Abstract

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings - videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).

Original language	English (US)
Pages	9856-9870
Number of pages	15
State	Published - 2022
Event	2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration	Dec 7 2022 → Dec 11 2022

Conference

Conference	2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	12/7/22 → 12/11/22

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems

Cite this

CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering. / Patel, Maitreya; Gokhale, Tejas; Baral, Chitta et al.
2022. 9856-9870 Paper presented at 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates.

@conference{b8967e42b31540b4b346d14f7a51f3e1,

title = "CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering",

abstract = "Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings - videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).",

author = "Maitreya Patel and Tejas Gokhale and Chitta Baral and Yezhou Yang",

note = "Funding Information: This work was supported by NSF RI grants #1750082, #1816039 and #2132724, and the DARPA GAILA ADAM project. Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; Conference date: 07-12-2022 Through 11-12-2022",

year = "2022",

language = "English (US)",

pages = "9856--9870",

}

TY - CONF

T1 - CRIPP-VQA

T2 - 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

AU - Patel, Maitreya

AU - Gokhale, Tejas

AU - Baral, Chitta

AU - Yang, Yezhou

N1 - Funding Information: This work was supported by NSF RI grants #1750082, #1816039 and #2132724, and the DARPA GAILA ADAM project. Publisher Copyright: © 2022 Association for Computational Linguistics.

PY - 2022

Y1 - 2022

N2 - Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings - videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).

AB - Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings - videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).

UR - http://www.scopus.com/inward/record.url?scp=85149439309&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85149439309&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85149439309

SP - 9856

EP - 9870

Y2 - 7 December 2022 through 11 December 2022

ER -
