Discovering, assessing, and mitigating data bias in social media

Fred Morstatter; Huan Liu

doi:10.1016/j.osnem.2017.01.001

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

Social media has generated a wealth of data. Billions of people tweet, share, post, and discuss every day. Due to this increased activity, social media platforms provide new opportunities for research about human behavior, information diffusion, and influence propagation at a scale that is otherwise impossible. Social media data is a new treasure trove for data mining and predictive analytics. Since social media data differs from conventional data, it is imperative to study its unique characteristics. This work investigates data collection bias associated with social media. In particular, we propose computational methods to assess whether there is bias due to the way a social media site makes its data available, to detect bias from data samples without access to the full data, and to mitigate bias by designing data collection strategies that maximize coverage to minimize bias. We also present a new kind of data bias stemming from API attacks, along with algorithms, data, and validation results. This work demonstrates how some characteristics of social media data can be extensively studied and verified and how corresponding intervention mechanisms can be designed to overcome negative effects. The methods and findings of this work could be helpful in studying different characteristics of social media data.

Original language	English (US)
Pages (from-to)	1-13
Number of pages	13
Journal	Online Social Networks and Media
Volume	1
DOIs	https://doi.org/10.1016/j.osnem.2017.01.001
State	Published - Jun 2017

Keywords

Data collection
Data collection bias
Data mining
Machine learning
Social data bias
Social media mining
Twitter

ASJC Scopus subject areas

Information Systems
Communication
Computer Networks and Communications

Access to Document

10.1016/j.osnem.2017.01.001

Cite this

@article{e85fa8cd58cb464588ea9d9e8626e116,

title = "Discovering, assessing, and mitigating data bias in social media",

abstract = "Social media has generated a wealth of data. Billions of people tweet, share, post, and discuss every day. Due to this increased activity, social media platforms provide new opportunities for research about human behavior, information diffusion, and influence propagation at a scale that is otherwise impossible. Social media data is a new treasure trove for data mining and predictive analytics. Since social media data differs from conventional data, it is imperative to study its unique characteristics. This work investigates data collection bias associated with social media. In particular, we propose computational methods to assess whether there is bias due to the way a social media site makes its data available, to detect bias from data samples without access to the full data, and to mitigate bias by designing data collection strategies that maximize coverage to minimize bias. We also present a new kind of data bias stemming from API attacks, along with algorithms, data, and validation results. This work demonstrates how some characteristics of social media data can be extensively studied and verified and how corresponding intervention mechanisms can be designed to overcome negative effects. The methods and findings of this work could be helpful in studying different characteristics of social media data.",

keywords = "Data collection, Data collection bias, Data mining, Machine learning, Social data bias, Social media mining, Twitter",

author = "Fred Morstatter and Huan Liu",

note = "Funding Information: This work is sponsored, in part, by the Office of Naval Research (ONR) grant N000141410095 and by the Department of Defense under the MINERVA initiative through the ONR N00014131083. Publisher Copyright: {\textcopyright} 2017 Elsevier B.V.",

year = "2017",

month = jun,

doi = "10.1016/j.osnem.2017.01.001",

language = "English (US)",

volume = "1",

pages = "1--13",

journal = "Online Social Networks and Media",

issn = "2468-6964",

publisher = "Elsevier BV",

}

TY - JOUR

T1 - Discovering, assessing, and mitigating data bias in social media

AU - Morstatter, Fred

AU - Liu, Huan

N1 - Funding Information: This work is sponsored, in part, by the Office of Naval Research (ONR) grant N000141410095 and by the Department of Defense under the MINERVA initiative through the ONR N00014131083. Publisher Copyright: © 2017 Elsevier B.V.

PY - 2017/6

Y1 - 2017/6

AB - Social media has generated a wealth of data. Billions of people tweet, share, post, and discuss every day. Due to this increased activity, social media platforms provide new opportunities for research about human behavior, information diffusion, and influence propagation at a scale that is otherwise impossible. Social media data is a new treasure trove for data mining and predictive analytics. Since social media data differs from conventional data, it is imperative to study its unique characteristics. This work investigates data collection bias associated with social media. In particular, we propose computational methods to assess whether there is bias due to the way a social media site makes its data available, to detect bias from data samples without access to the full data, and to mitigate bias by designing data collection strategies that maximize coverage to minimize bias. We also present a new kind of data bias stemming from API attacks, along with algorithms, data, and validation results. This work demonstrates how some characteristics of social media data can be extensively studied and verified and how corresponding intervention mechanisms can be designed to overcome negative effects. The methods and findings of this work could be helpful in studying different characteristics of social media data.

KW - Data collection

KW - Data collection bias

KW - Data mining

KW - Machine learning

KW - Social data bias

KW - Social media mining

KW - Twitter

UR - http://www.scopus.com/inward/record.url?scp=85050187929&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050187929&partnerID=8YFLogxK

U2 - 10.1016/j.osnem.2017.01.001

DO - 10.1016/j.osnem.2017.01.001

M3 - Article

AN - SCOPUS:85050187929

SN - 2468-6964

VL - 1

SP - 1

EP - 13

JO - Online Social Networks and Media

JF - Online Social Networks and Media

ER -
