TY - GEN
T1 - InstructExcel: A Benchmark for Natural Language Instruction in Excel
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Payan, Justin
AU - Mishra, Swaroop
AU - Singh, Mukul
AU - Negreanu, Carina
AU - Poelitz, Christian
AU - Baral, Chitta
AU - Roy, Subhro
AU - Chakravarthy, Rasika
AU - Van Durme, Benjamin
AU - Nouri, Elnaz
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - With the evolution of Large Language Models (LLMs), we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so we introduce a new large-scale benchmark, INSTRUCTEXCEL, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that INSTRUCTEXCEL is a hard benchmark for state-of-the-art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
UR - http://www.scopus.com/inward/record.url?scp=85183307568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183307568&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183307568
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 4026
EP - 4043
BT - Findings of the Association for Computational Linguistics: EMNLP 2023
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -