Leveraging Financial Social Media Data for Corporate Fraud Detection

Wei Dong; Shaoyi Liao; Zhongju Zhang

doi:10.1080/07421222.2018.1451954

Information Systems

Research output: Contribution to journal › Article › peer-review

118 Scopus citations

Abstract

Corporate fraud can lead to significant financial losses and cause immeasurable damage to investor confidence and the overall economy. Detection of such frauds is a time-consuming and challenging task. Traditionally, researchers have been relying on financial data and/or textual content from financial statements to detect corporate fraud. Guided by systemic functional linguistics (SFL) theory, we propose an analytic framework that taps into unstructured data from financial social media platforms to assess the risk of corporate fraud. We assemble a unique data set including 64 fraudulent firms and a matched sample of 64 nonfraudulent firms, as well as the social media data prior to the firm’s alleged fraud violation in Accounting and Auditing Enforcement Releases (AAERs). Our framework automatically extracts signals such as sentiment features, emotion features, topic features, lexical features, and social network features, which are then fed into machine learning classifiers for fraud detection. We evaluate and compare the performance of our algorithm against baseline approaches using only financial ratios and language-based features respectively. We further validate the robustness of our algorithm by detecting leaked information and rumors, testing the algorithm on a new data set, and conducting an applicability check. Our results demonstrate the value of financial social media data and serve as a proof of concept of using such data to complement traditional fraud detection methods.

Original language	English (US)
Pages (from-to)	461-487
Number of pages	27
Journal	Journal of Management Information Systems
Volume	35
Issue number	2
DOIs	https://doi.org/10.1080/07421222.2018.1451954
State	Published - Apr 3 2018

ASJC Scopus subject areas

Management Information Systems
Computer Science Applications
Management Science and Operations Research
Information Systems and Management

Access to Document

10.1080/07421222.2018.1451954

Cite this

@article{6a080dd37f3a4c0182fbcd2f41d2d65d,

title = "Leveraging Financial Social Media Data for Corporate Fraud Detection",

abstract = "Corporate fraud can lead to significant financial losses and cause immeasurable damage to investor confidence and the overall economy. Detection of such frauds is a time-consuming and challenging task. Traditionally, researchers have been relying on financial data and/or textual content from financial statements to detect corporate fraud. Guided by systemic functional linguistics (SFL) theory, we propose an analytic framework that taps into unstructured data from financial social media platforms to assess the risk of corporate fraud. We assemble a unique data set including 64 fraudulent firms and a matched sample of 64 nonfraudulent firms, as well as the social media data prior to the firm{\textquoteright}s alleged fraud violation in Accounting and Auditing Enforcement Releases (AAERs). Our framework automatically extracts signals such as sentiment features, emotion features, topic features, lexical features, and social network features, which are then fed into machine learning classifiers for fraud detection. We evaluate and compare the performance of our algorithm against baseline approaches using only financial ratios and language-based features respectively. We further validate the robustness of our algorithm by detecting leaked information and rumors, testing the algorithm on a new data set, and conducting an applicability check. Our results demonstrate the value of financial social media data and serve as a proof of concept of using such data to complement traditional fraud detection methods.",

author = "Wei Dong and Shaoyi Liao and Zhongju Zhang",

note = "Funding Information: This work was supported by the National Science-Technology Support Plan of China (2015BAK18B02), a development grant from Shenzhen Science, Technology and Innovation Commission (JCYJ20160229165300897), and a Hong Kong GRF grant (193213). Publisher Copyright: Copyright {\textcopyright} Taylor \& Francis Group, LLC.",

year = "2018",

month = apr,

day = "3",

doi = "10.1080/07421222.2018.1451954",

language = "English (US)",

volume = "35",

pages = "461--487",

journal = "Journal of Management Information Systems",

issn = "0742-1222",

publisher = "M.E. Sharpe Inc.",

number = "2",

}

TY - JOUR

T1 - Leveraging Financial Social Media Data for Corporate Fraud Detection

AU - Dong, Wei

AU - Liao, Shaoyi

AU - Zhang, Zhongju

N1 - Funding Information: This work was supported by the National Science-Technology Support Plan of China (2015BAK18B02), a development grant from Shenzhen Science, Technology and Innovation Commission (JCYJ20160229165300897), and a Hong Kong GRF grant (193213). Publisher Copyright: Copyright © Taylor & Francis Group, LLC.

PY - 2018/4/3

Y1 - 2018/4/3

AB - Corporate fraud can lead to significant financial losses and cause immeasurable damage to investor confidence and the overall economy. Detection of such frauds is a time-consuming and challenging task. Traditionally, researchers have been relying on financial data and/or textual content from financial statements to detect corporate fraud. Guided by systemic functional linguistics (SFL) theory, we propose an analytic framework that taps into unstructured data from financial social media platforms to assess the risk of corporate fraud. We assemble a unique data set including 64 fraudulent firms and a matched sample of 64 nonfraudulent firms, as well as the social media data prior to the firm’s alleged fraud violation in Accounting and Auditing Enforcement Releases (AAERs). Our framework automatically extracts signals such as sentiment features, emotion features, topic features, lexical features, and social network features, which are then fed into machine learning classifiers for fraud detection. We evaluate and compare the performance of our algorithm against baseline approaches using only financial ratios and language-based features respectively. We further validate the robustness of our algorithm by detecting leaked information and rumors, testing the algorithm on a new data set, and conducting an applicability check. Our results demonstrate the value of financial social media data and serve as a proof of concept of using such data to complement traditional fraud detection methods.

UR - http://www.scopus.com/inward/record.url?scp=85047254879&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047254879&partnerID=8YFLogxK

U2 - 10.1080/07421222.2018.1451954

DO - 10.1080/07421222.2018.1451954

M3 - Article

AN - SCOPUS:85047254879

SN - 0742-1222

VL - 35

SP - 461

EP - 487

JO - Journal of Management Information Systems

JF - Journal of Management Information Systems

IS - 2

ER -
