LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference

Mutsumi Nakamura; Santosh Mashetty; Mihir Parmar; Neeraj Varshney; Chitta Baral

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress in Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average Attack Success Rate (ASR) of ∼53% across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks.
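
The two inference rules named in the abstract can be illustrated with a small truth-table check. The sketch below is not the authors' pipeline: the helper names (`implies`, `is_valid`, `attack_success_rate`) are hypothetical, the Bidirectional Dilemma is written in its usual textbook form, and the ASR function assumes that every attack sample has the gold label "entailment" (because its hypothesis follows from its premise by a valid rule) and that an attack succeeds when the model predicts any other label.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(rule, n_vars: int) -> bool:
    """Return True if `rule` holds under every truth assignment."""
    return all(rule(*vals) for vals in product([False, True], repeat=n_vars))


def modus_tollens(p: bool, q: bool) -> bool:
    # ((p -> q) and not q) entails not p
    return implies(implies(p, q) and not q, not p)


def bidirectional_dilemma(p: bool, q: bool, r: bool, s: bool) -> bool:
    # ((p -> q) and (r -> s) and (p or not s)) entails (q or not r)
    premises = implies(p, q) and implies(r, s) and (p or not s)
    return implies(premises, q or not r)


def affirming_the_consequent(p: bool, q: bool) -> bool:
    # A well-known fallacy, included only as a negative control.
    return implies(implies(p, q) and q, p)


def attack_success_rate(predictions, gold: str = "entailment") -> float:
    """Fraction of attack samples on which the model misses the gold label.

    Assumption (not from the source): all samples share the same gold label.
    """
    if not predictions:
        raise ValueError("predictions must be non-empty")
    fooled = sum(1 for label in predictions if label != gold)
    return fooled / len(predictions)


if __name__ == "__main__":
    print(is_valid(modus_tollens, 2))
    print(is_valid(bidirectional_dilemma, 4))
    print(is_valid(affirming_the_consequent, 2))
    # Hypothetical model outputs for four attack samples.
    print(attack_success_rate(["entailment", "neutral", "contradiction", "entailment"]))
```

Checking validity by exhaustive enumeration is feasible here because each rule involves at most four propositional variables, so there are at most 16 assignments to test.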

Original language	English (US)
Title of host publication	Findings of the Association for Computational Linguistics
Subtitle of host publication	EMNLP 2023
Publisher	Association for Computational Linguistics (ACL)
Pages	13322-13334
Number of pages	13
ISBN (Electronic)	9798891760615
State	Published - 2023
Event	2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore
Duration	Dec 6 2023 → Dec 10 2023

Publication series

Name	Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference	2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory	Singapore
City	Singapore
Period	12/6/23 → 12/10/23

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems
Language and Linguistics
Linguistics and Language

Cite this

Nakamura, M., Mashetty, S., Parmar, M., Varshney, N., & Baral, C. (2023). LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 13322-13334). (Findings of the Association for Computational Linguistics: EMNLP 2023). Association for Computational Linguistics (ACL).

LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. / Nakamura, Mutsumi; Mashetty, Santosh; Parmar, Mihir et al.
Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics (ACL), 2023. p. 13322-13334 (Findings of the Association for Computational Linguistics: EMNLP 2023).

Nakamura, M, Mashetty, S, Parmar, M, Varshney, N & Baral, C 2023, LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. in Findings of the Association for Computational Linguistics: EMNLP 2023. Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics (ACL), pp. 13322-13334, 2023 Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Singapore, 12/6/23.

Nakamura M, Mashetty S, Parmar M, Varshney N, Baral C. LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics (ACL). 2023. p. 13322-13334. (Findings of the Association for Computational Linguistics: EMNLP 2023).

Nakamura, Mutsumi; Mashetty, Santosh; Parmar, Mihir et al. / LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics (ACL), 2023. pp. 13322-13334 (Findings of the Association for Computational Linguistics: EMNLP 2023).

@inproceedings{85079e9355a24ef7b1abd08b126a78a4,

title = "LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference",

abstract = "Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress in Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average Attack Success Rate (ASR) of ∼53% across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks.",

author = "Mutsumi Nakamura and Santosh Mashetty and Mihir Parmar and Neeraj Varshney and Chitta Baral",

note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 ; Conference date: 06-12-2023 Through 10-12-2023",

year = "2023",

language = "English (US)",

series = "Findings of the Association for Computational Linguistics: EMNLP 2023",

publisher = "Association for Computational Linguistics (ACL)",

pages = "13322--13334",

booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",

}

TY - GEN

T1 - LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference

T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023

AU - Nakamura, Mutsumi

AU - Mashetty, Santosh

AU - Parmar, Mihir

AU - Varshney, Neeraj

AU - Baral, Chitta

PY - 2023

Y1 - 2023

AB - Recently, Large Language Models (LLMs) such as GPT-3, ChatGPT, and FLAN have led to impressive progress in Natural Language Inference (NLI) tasks. However, these models may rely on simple heuristics or artifacts in the evaluation data to achieve their high performance, which suggests that they still suffer from logical inconsistency. To assess the logical consistency of these models, we propose LogicAttack, a method to attack NLI models using diverse logical forms of premise and hypothesis, providing a more robust evaluation of their performance. Our approach leverages a range of inference rules from propositional logic, such as Modus Tollens and Bidirectional Dilemma, to generate effective adversarial attacks and identify common vulnerabilities across multiple NLI models. We achieve an average Attack Success Rate (ASR) of ∼53% across multiple logic-based attacks. Moreover, we demonstrate that incorporating generated attack samples into training enhances the logical reasoning ability of the target model and decreases its vulnerability to logic-based attacks.

UR - http://www.scopus.com/inward/record.url?scp=85183307206&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85183307206&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85183307206

T3 - Findings of the Association for Computational Linguistics: EMNLP 2023

SP - 13322

EP - 13334

BT - Findings of the Association for Computational Linguistics: EMNLP 2023

PB - Association for Computational Linguistics (ACL)

Y2 - 6 December 2023 through 10 December 2023

ER -
