A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

Neeraj Varshney; Himanshu Gupta; Eric Robertson; Bing Liu; Chitta Baral

A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

Neeraj Varshney, Himanshu Gupta, Eric Robertson, Bing Liu, Chitta Baral

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce NoveltyTask, a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.

Original language	English (US)
Title of host publication	Findings of the Association for Computational Linguistics, ACL 2023
Publisher	Association for Computational Linguistics (ACL)
Pages	1794-1818
Number of pages	25
ISBN (Electronic)	9781959429623
State	Published - 2023
Event	61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada Duration: Jul 9 2023 → Jul 14 2023

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)	0736-587X

Conference

Conference	61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory	Canada
City	Toronto
Period	7/9/23 → 7/14/23

ASJC Scopus subject areas

Computer Science Applications
Linguistics and Language
Language and Linguistics

Cite this

Varshney, N., Gupta, H., Robertson, E., Liu, B., & Baral, C. (2023). A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution. In Findings of the Association for Computational Linguistics, ACL 2023 (pp. 1794-1818). (Proceedings of the Annual Meeting of the Association for Computational Linguistics). Association for Computational Linguistics (ACL).

A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution. / Varshney, Neeraj; Gupta, Himanshu; Robertson, Eric et al.
Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), 2023. p. 1794-1818 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Varshney, N, Gupta, H, Robertson, E, Liu, B & Baral, C 2023, A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution. in Findings of the Association for Computational Linguistics, ACL 2023. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), pp. 1794-1818, 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, Canada, 7/9/23.

Varshney N, Gupta H, Robertson E, Liu B, Baral C. A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution. In Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL). 2023. p. 1794-1818. (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Varshney, Neeraj ; Gupta, Himanshu ; Robertson, Eric et al. / A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution. Findings of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), 2023. pp. 1794-1818 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{be5bcffcf600496b9b83929670ff01d7,

title = "A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution",

abstract = "State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce NoveltyTask, a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.",

author = "Neeraj Varshney and Himanshu Gupta and Eric Robertson and Bing Liu and Chitta Baral",

note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 ; Conference date: 09-07-2023 Through 14-07-2023",

year = "2023",

language = "English (US)",

series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics (ACL)",

pages = "1794--1818",

booktitle = "Findings of the Association for Computational Linguistics, ACL 2023",

}

TY - GEN

T1 - A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

AU - Varshney, Neeraj

AU - Gupta, Himanshu

AU - Robertson, Eric

AU - Liu, Bing

AU - Baral, Chitta

PY - 2023

Y1 - 2023

N2 - State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce NoveltyTask, a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.

AB - State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate a systematic research in this important area of 'dealing with novelties', we introduce NoveltyTask, a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.

UR - http://www.scopus.com/inward/record.url?scp=85175447780&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85175447780&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85175447780

T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics

SP - 1794

EP - 1818

BT - Findings of the Association for Computational Linguistics, ACL 2023

PB - Association for Computational Linguistics (ACL)

T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023

Y2 - 9 July 2023 through 14 July 2023

ER -

A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution

Abstract

Publication series

Conference

ASJC Scopus subject areas

Other files and links

Fingerprint

Cite this