To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al. (2021) in the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in a caption with a bounding box that points to the person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image. Naturally, models trained on these biased data lead to over-estimation of performance on the benchmark. To enforce models being correct for the correct reasons, we design automated tools to filter and debias the original dataset by ruling out all examples of insufficient context, such as those with no verb or with a long chain of conjunct names in their captions. Our experiments show that our new sub-sampled dataset contains less bias with much lowered heuristic performances and widened gaps between heuristic and supervised methods. We also demonstrate the same benchmark model trained on our debiased training set outperforms that trained on the original biased (and larger) training set on our debiased test set. We argue our debiased dataset offers the PCVG task a more practical baseline for reliable benchmarking and future improvements.

Original languageEnglish (US)
Title of host publicationACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)
EditorsSmaranda Muresan, Preslav Nakov, Aline Villavicencio
PublisherAssociation for Computational Linguistics (ACL)
Pages355-361
Number of pages7
ISBN (Electronic)9781955917223
StatePublished - 2022
Event60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration: May 22 2022May 27 2022

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume2
ISSN (Print)0736-587X

Conference

Conference60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/TerritoryIreland
CityDublin
Period5/22/225/27/22

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo'. Together they form a unique fingerprint.

Cite this