LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity

A. Arunkumar; S. Sharma; R. Agrawal; S. Chandrasekaran; C. Bryan

doi:10.1111/cgf.14840

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing ‘bias’ in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains.

Original language	English (US)
Pages (from-to)	409-421
Number of pages	13
Journal	Computer Graphics Forum
Volume	42
Issue number	3
DOIs	https://doi.org/10.1111/cgf.14840
State	Published - Jun 2023

Keywords

CCS Concepts
Text input
• Computing methodologies → Natural language processing
• Human-centered computing → Visual analytics

ASJC Scopus subject areas

Computer Graphics and Computer-Aided Design

Cite this

@article{212834d9e5ab4a769a4c6c0ce25ddc73,

title = "LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity",

abstract = "Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing {\textquoteleft}bias{\textquoteright} in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains.",

keywords = "CCS Concepts, Text input, • Computing methodologies → Natural language processing, • Human-centered computing → Visual analytics",

author = "A. Arunkumar and S. Sharma and R. Agrawal and S. Chandrasekaran and C. Bryan",

note = "Publisher Copyright: {\textcopyright} 2023 Eurographics - The European Association for Computer Graphics and John Wiley \& Sons Ltd.",

year = "2023",

month = jun,

doi = "10.1111/cgf.14840",

language = "English (US)",

volume = "42",

pages = "409--421",

journal = "Computer Graphics Forum",

issn = "0167-7055",

publisher = "Wiley-Blackwell",

number = "3",

}

TY - JOUR

T1 - LINGO

T2 - Visually Debiasing Natural Language Instructions to Support Task Diversity

AU - Arunkumar, A.

AU - Sharma, S.

AU - Agrawal, R.

AU - Chandrasekaran, S.

AU - Bryan, C.

PY - 2023/6

Y1 - 2023/6

AB - Cross-task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre-trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross-task generalization compared to traditional supervised learning, analyzing ‘bias’ in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task-driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre-trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre-trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains.

KW - CCS Concepts

KW - Text input

KW - • Computing methodologies → Natural language processing

KW - • Human-centered computing → Visual analytics

UR - http://www.scopus.com/inward/record.url?scp=85163792438&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85163792438&partnerID=8YFLogxK

U2 - 10.1111/cgf.14840

DO - 10.1111/cgf.14840

M3 - Article

AN - SCOPUS:85163792438

SN - 0167-7055

VL - 42

SP - 409

EP - 421

JO - Computer Graphics Forum

JF - Computer Graphics Forum

IS - 3

ER -
