TY - GEN
T1 - Neural Authorship Attribution
T2 - 15th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
AU - Kumarage, Tharindu
AU - Liu, Huan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text. With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics. Neural authorship attribution is a forensic effort seeking to trace AI-generated text back to its originating LLM. The LLM landscape can be divided into two primary categories: proprietary and open-source. In this work, we delve into these emerging categories of LLMs, focusing on the nuances of neural authorship attribution. To enrich our understanding, we carry out an empirical analysis of LLM writing signatures, highlighting the contrasts between proprietary and open-source models, and scrutinizing variations within each group. By integrating stylometric features across lexical, syntactic, and structural aspects of language, we explore their potential to yield interpretable results and augment pre-trained language model-based classifiers utilized in neural authorship attribution. Our findings, based on a range of state-of-the-art LLMs, provide empirical insights into neural authorship attribution, paving the way for future investigations aimed at mitigating the threats posed by AI-generated misinformation.
AB - Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text. With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics. Neural authorship attribution is a forensic effort seeking to trace AI-generated text back to its originating LLM. The LLM landscape can be divided into two primary categories: proprietary and open-source. In this work, we delve into these emerging categories of LLMs, focusing on the nuances of neural authorship attribution. To enrich our understanding, we carry out an empirical analysis of LLM writing signatures, highlighting the contrasts between proprietary and open-source models, and scrutinizing variations within each group. By integrating stylometric features across lexical, syntactic, and structural aspects of language, we explore their potential to yield interpretable results and augment pre-trained language model-based classifiers utilized in neural authorship attribution. Our findings, based on a range of state-of-the-art LLMs, provide empirical insights into neural authorship attribution, paving the way for future investigations aimed at mitigating the threats posed by AI-generated misinformation.
KW - large language models
KW - neural authorship attribution
KW - stylometric analysis
UR - http://www.scopus.com/inward/record.url?scp=85186768549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186768549&partnerID=8YFLogxK
U2 - 10.1109/CyberC58899.2023.00019
DO - 10.1109/CyberC58899.2023.00019
M3 - Conference contribution
AN - SCOPUS:85186768549
T3 - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
SP - 51
EP - 54
BT - Proceedings - 2023 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 November 2023 through 4 November 2023
ER -