How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts

Tharindu Kumarage; Paras Sheth; Raha Moraffah; Joshua Garland; Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing "human-like" text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.

Original language	English (US)
Title of host publication	Findings of the Association for Computational Linguistics
Subtitle of host publication	EMNLP 2023
Publisher	Association for Computational Linguistics (ACL)
Pages	1337-1349
Number of pages	13
ISBN (Electronic)	9798891760615
State	Published - 2023
Externally published	Yes
Event	2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore; Duration: Dec 6 2023 → Dec 10 2023

Publication series

Name	Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference	2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory	Singapore
City	Singapore
Period	12/6/23 → 12/10/23

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems
Language and Linguistics
Linguistics and Language

Cite this

Kumarage, T., Sheth, P., Moraffah, R., Garland, J., & Liu, H. (2023). How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 1337-1349). (Findings of the Association for Computational Linguistics: EMNLP 2023). Association for Computational Linguistics (ACL).

@inproceedings{64cbf2282d1d46f7bfdaea6054b135c5,
  title = "How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts",
  abstract = "In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing {"}human-like{"} text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.",
  author = "Tharindu Kumarage and Paras Sheth and Raha Moraffah and Joshua Garland and Huan Liu",
  note = "Publisher Copyright: {\textcopyright} 2023 Association for Computational Linguistics.; 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 ; Conference date: 06-12-2023 Through 10-12-2023",
  year = "2023",
  language = "English (US)",
  series = "Findings of the Association for Computational Linguistics: EMNLP 2023",
  publisher = "Association for Computational Linguistics (ACL)",
  pages = "1337--1349",
  booktitle = "Findings of the Association for Computational Linguistics",
}

TY - GEN
T1 - How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts
AU - Kumarage, Tharindu
AU - Sheth, Paras
AU - Moraffah, Raha
AU - Garland, Joshua
AU - Liu, Huan
PY - 2023
Y1 - 2023
AB - In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing "human-like" text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.
UR - http://www.scopus.com/inward/record.url?scp=85183288578&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183288578&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85183288578
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 1337
EP - 1349
BT - Findings of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Y2 - 6 December 2023 through 10 December 2023
ER -
