Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings

Neeraj Varshney; Swaroop Mishra; Chitta Baral

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

In order to equip NLP systems with 'selective prediction' capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline MaxProb remains to be explored. To this end, we systematically study selective prediction in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.
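The MaxProb baseline named in the abstract is commonly understood as using the highest softmax probability as the confidence score and abstaining when it falls below a threshold. The following is a minimal sketch under that assumption, not the authors' implementation; the function names, the threshold value, and the coverage/risk helper are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def maxprob_predict(logits, threshold=0.5):
    """Return (label, confidence); label is None when the model abstains.

    Assumption: confidence is the maximum softmax probability, and the
    threshold is a free parameter picked by the user.
    """
    probs = softmax(logits)
    confidence = max(probs)
    label = probs.index(confidence)
    if confidence < threshold:
        return None, confidence  # abstain on low-confidence inputs
    return label, confidence


def coverage_and_risk(predictions, gold):
    """Coverage = fraction answered; risk = error rate on answered items."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(answered) / len(gold)
    risk = sum(p != g for p, g in answered) / len(answered) if answered else 0.0
    return coverage, risk


if __name__ == "__main__":
    label, conf = maxprob_predict([2.0, 0.0, 0.0], threshold=0.5)
    print(label, round(conf, 3))
    print(coverage_and_risk([0, None, 1, 1], [0, 1, 1, 0]))
```

Raising the threshold lowers coverage and, if the confidence scores are informative, should lower risk as well; selective prediction approaches are typically compared on that trade-off.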

Original language	English (US)
Title of host publication	ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
Editors	Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher	Association for Computational Linguistics (ACL)
Pages	1995-2002
Number of pages	8
ISBN (Electronic)	9781955917254
State	Published - 2022
Event	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland
Duration	May 22 2022 → May 27 2022

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)	0736-587X

Conference

Conference	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory	Ireland
City	Dublin
Period	5/22/22 → 5/27/22

ASJC Scopus subject areas

Computer Science Applications
Linguistics and Language
Language and Linguistics

Cite this

Varshney, N., Mishra, S., & Baral, C. (2022). Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022 (pp. 1995-2002). (Proceedings of the Annual Meeting of the Association for Computational Linguistics). Association for Computational Linguistics (ACL).

Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. / Varshney, Neeraj; Mishra, Swaroop; Baral, Chitta.
ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. ed. / Smaranda Muresan; Preslav Nakov; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. p. 1995-2002 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Varshney, N, Mishra, S & Baral, C 2022, Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. in S Muresan, P Nakov & A Villavicencio (eds), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), pp. 1995-2002, 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 5/22/22.

Varshney N, Mishra S, Baral C. Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. In Muresan S, Nakov P, Villavicencio A, editors, ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. Association for Computational Linguistics (ACL). 2022. p. 1995-2002. (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Varshney, Neeraj ; Mishra, Swaroop ; Baral, Chitta. / Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. editor / Smaranda Muresan ; Preslav Nakov ; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. pp. 1995-2002 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{e627944c3b9d4cf68d29f96f99ec2e0f,
  title = "Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings",
  abstract = "In order to equip NLP systems with 'selective prediction' capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline MaxProb remains to be explored. To this end, we systematically study selective prediction in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.",
  author = "Neeraj Varshney and Swaroop Mishra and Chitta Baral",
  note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; Conference date: 22-05-2022 Through 27-05-2022",
  year = "2022",
  language = "English (US)",
  series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",
  publisher = "Association for Computational Linguistics (ACL)",
  pages = "1995--2002",
  editor = "Smaranda Muresan and Preslav Nakov and Aline Villavicencio",
  booktitle = "ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022",
}

TY  - GEN
T1  - Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings
AU  - Varshney, Neeraj
AU  - Mishra, Swaroop
AU  - Baral, Chitta
PY  - 2022
Y1  - 2022
N2  - In order to equip NLP systems with 'selective prediction' capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline MaxProb remains to be explored. To this end, we systematically study selective prediction in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.
AB  - In order to equip NLP systems with 'selective prediction' capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline MaxProb remains to be explored. To this end, we systematically study selective prediction in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.
UR  - http://www.scopus.com/inward/record.url?scp=85135816982&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85135816982&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85135816982
T3  - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP  - 1995
EP  - 2002
BT  - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
A2  - Muresan, Smaranda
A2  - Nakov, Preslav
A2  - Villavicencio, Aline
PB  - Association for Computational Linguistics (ACL)
T2  - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Y2  - 22 May 2022 through 27 May 2022
ER  -
