ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Neeraj Varshney; Swaroop Mishra; Chitta Baral

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Neeraj Varshney, Swaroop Mishra, Chitta Baral

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Knowledge of difficulty level of questions helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in Natural Language Processing? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications lead to several interesting results, such as evaluation using just 5% instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using complete dataset and computing weighted accuracy using difficulty scores leads to 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our work will encourage research in this important yet understudied field of leveraging instance difficulty in evaluations.

Original language	English (US)
Title of host publication	ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Editors	Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher	Association for Computational Linguistics (ACL)
Pages	3412-3425
Number of pages	14
ISBN (Electronic)	9781955917216
State	Published - 2022
Event	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland Duration: May 22 2022 → May 27 2022

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume	1
ISSN (Print)	0736-587X

Conference

Conference	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory	Ireland
City	Dublin
Period	5/22/22 → 5/27/22

ASJC Scopus subject areas

Computer Science Applications
Linguistics and Language
Language and Linguistics

Cite this

Varshney, N., Mishra, S., & Baral, C. (2022). ILDAE: Instance-Level Difficulty Analysis of Evaluation Data. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (pp. 3412-3425). (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1). Association for Computational Linguistics (ACL).

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data. / Varshney, Neeraj; Mishra, Swaroop; Baral, Chitta.
ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). ed. / Smaranda Muresan; Preslav Nakov; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. p. 3412-3425 (Proceedings of the Annual Meeting of the Association for Computational Linguistics; Vol. 1).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Varshney, N, Mishra, S & Baral, C 2022, ILDAE: Instance-Level Difficulty Analysis of Evaluation Data. in S Muresan, P Nakov & A Villavicencio (eds), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 1, Association for Computational Linguistics (ACL), pp. 3412-3425, 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 5/22/22.

Varshney N, Mishra S, Baral C. ILDAE: Instance-Level Difficulty Analysis of Evaluation Data. In Muresan S, Nakov P, Villavicencio A, editors, ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL). 2022. p. 3412-3425. (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Varshney, Neeraj ; Mishra, Swaroop ; Baral, Chitta. / ILDAE : Instance-Level Difficulty Analysis of Evaluation Data. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). editor / Smaranda Muresan ; Preslav Nakov ; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. pp. 3412-3425 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{968250d5db284f3aa9f89a29f28c3804,

title = "ILDAE: Instance-Level Difficulty Analysis of Evaluation Data",

abstract = "Knowledge of difficulty level of questions helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in Natural Language Processing? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications lead to several interesting results, such as evaluation using just 5% instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using complete dataset and computing weighted accuracy using difficulty scores leads to 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our work will encourage research in this important yet understudied field of leveraging instance difficulty in evaluations.",

author = "Neeraj Varshney and Swaroop Mishra and Chitta Baral",

note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

language = "English (US)",

series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics (ACL)",

pages = "3412--3425",

editor = "Smaranda Muresan and Preslav Nakov and Aline Villavicencio",

booktitle = "ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",

}

TY - GEN

T1 - ILDAE

T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022

AU - Varshney, Neeraj

AU - Mishra, Swaroop

AU - Baral, Chitta

PY - 2022

Y1 - 2022

N2 - Knowledge of difficulty level of questions helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in Natural Language Processing? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications lead to several interesting results, such as evaluation using just 5% instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using complete dataset and computing weighted accuracy using difficulty scores leads to 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our work will encourage research in this important yet understudied field of leveraging instance difficulty in evaluations.

AB - Knowledge of difficulty level of questions helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in Natural Language Processing? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these applications lead to several interesting results, such as evaluation using just 5% instances (selected via ILDAE) achieves as high as 0.93 Kendall correlation with evaluation using complete dataset and computing weighted accuracy using difficulty scores leads to 5.2% higher correlation with Out-of-Domain performance. We release the difficulty scores and hope our work will encourage research in this important yet understudied field of leveraging instance difficulty in evaluations.

UR - http://www.scopus.com/inward/record.url?scp=85137568211&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85137568211&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85137568211

T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics

SP - 3412

EP - 3425

BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)

A2 - Muresan, Smaranda

A2 - Nakov, Preslav

A2 - Villavicencio, Aline

PB - Association for Computational Linguistics (ACL)

Y2 - 22 May 2022 through 27 May 2022

ER -

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Abstract

Publication series

Conference

ASJC Scopus subject areas

Other files and links

Fingerprint

Cite this