MUTANT: A training paradigm for out-of-distribution generalization in visual question answering

Tejas Gokhale; Pratyay Banerjee; Chitta Baral; Yezhou Yang

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

58 Scopus citations

Abstract

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.

Original language	English (US)
Title of host publication	EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher	Association for Computational Linguistics (ACL)
Pages	878-892
Number of pages	15
ISBN (Electronic)	9781952148606
State	Published - 2020
Event	2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online. Duration: Nov 16 2020 → Nov 20 2020

Publication series

Name	EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference	2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
City	Virtual, Online
Period	11/16/20 → 11/20/20

ASJC Scopus subject areas

Information Systems
Computer Science Applications
Computational Theory and Mathematics

Cite this

Gokhale, T., Banerjee, P., Baral, C., & Yang, Y. (2020). MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 878-892). (EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference). Association for Computational Linguistics (ACL).

MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. / Gokhale, Tejas; Banerjee, Pratyay; Baral, Chitta et al.
EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2020. p. 878-892 (EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference).

Gokhale, T, Banerjee, P, Baral, C & Yang, Y 2020, MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. in EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, Association for Computational Linguistics (ACL), pp. 878-892, 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Virtual, Online, 11/16/20.

Gokhale T, Banerjee P, Baral C, Yang Y. MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. In EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL). 2020. p. 878-892. (EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference).

Gokhale, Tejas; Banerjee, Pratyay; Baral, Chitta et al. / MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. Association for Computational Linguistics (ACL), 2020. pp. 878-892 (EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference).

@inproceedings{eca5e4732a9b4bb9b302d14962091fac,
  title = "MUTANT: A training paradigm for out-of-distribution generalization in visual question answering",
  abstract = "While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57\% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.",
  author = "Tejas Gokhale and Pratyay Banerjee and Chitta Baral and Yezhou Yang",
  note = "Funding Information: The authors acknowledge support from the NSF Robust Intelligence Program project \#1816039, the DARPA KAIROS program (LESTAT project), the DARPA SAIL-ON program, and ONR award N00014-20-1-2332. Publisher Copyright: {\textcopyright} 2020 Association for Computational Linguistics; 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020; Conference date: 16-11-2020 Through 20-11-2020",
  year = "2020",
  language = "English (US)",
  series = "EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",
  publisher = "Association for Computational Linguistics (ACL)",
  pages = "878--892",
  booktitle = "EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",
}

TY  - GEN
T1  - MUTANT: A training paradigm for out-of-distribution generalization in visual question answering
T2  - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
AU  - Gokhale, Tejas
AU  - Banerjee, Pratyay
AU  - Baral, Chitta
AU  - Yang, Yezhou
N1  - Funding Information: The authors acknowledge support from the NSF Robust Intelligence Program project #1816039, the DARPA KAIROS program (LESTAT project), the DARPA SAIL-ON program, and ONR award N00014-20-1-2332. Publisher Copyright: © 2020 Association for Computational Linguistics
PY  - 2020
Y1  - 2020
N2  - While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
AB  - While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
UR  - http://www.scopus.com/inward/record.url?scp=85118469744&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85118469744&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85118469744
T3  - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP  - 878
EP  - 892
BT  - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB  - Association for Computational Linguistics (ACL)
Y2  - 16 November 2020 through 20 November 2020
ER  - 