Hype and heavy tails: A closer look at data breaches

Benjamin Edwards, Steven Hofmeyr, Stephanie Forrest

Research output: Contribution to journal › Article › peer-review

130 Scopus citations


Recent widely publicized data breaches have exposed the personal information of hundreds of millions of people. Some reports point to alarming increases in both the size and frequency of data breaches, spurring institutions around the world to address what appears to be a worsening situation. But is the problem actually growing worse? In this article, we study a popular public dataset and develop Bayesian Generalized Linear Models to investigate trends in data breaches. Analysis of the model shows that neither size nor frequency of data breaches has increased over the past decade. We find that the increases that have attracted attention can be explained by the heavy-tailed statistical distributions underlying the dataset. Specifically, we find that the size of data breaches is well modeled by the log-normal family of distributions and that the daily frequency of breaches is described by a negative binomial distribution. These distributions may provide clues to the generative mechanisms that are responsible for the breaches. Additionally, our model predicts the likelihood of breaches of a particular size in the future. For example, we find that between 15 September 2015 and 16 September 2016 there is only a 53.6% chance of a breach of 10 million records or more in the USA. Regardless of any trend, data breaches are costly, and we combine the model with two different cost models to project that in the next 3 years breaches could cost up to $179 billion.
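
The kind of calculation behind a statement such as "the chance of a breach of 10 million records or more within a year" can be sketched as follows. This is a simplified illustration, not the article's fitted Bayesian model: it assumes breach sizes are independent log-normal draws and daily breach counts are independent negative binomial draws with fixed parameters. The function names and the parameter values (`MU`, `SIGMA`, `R`, `P`) are hypothetical placeholders chosen for illustration and are not taken from the article.

```python
import math


def lognormal_tail(size, mu, sigma):
    """P(S >= size) for S ~ LogNormal(mu, sigma), using the normal tail via erfc."""
    z = (math.log(size) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def prob_no_large_breach_per_day(q, r, p):
    """E[(1 - q)^N] for N ~ NegBin(r, p), where N counts failures before the
    r-th success and p is the success probability.

    This is the negative binomial probability generating function
    G(z) = (p / (1 - (1 - p) * z)) ** r evaluated at z = 1 - q, i.e. the
    probability that none of the day's breaches reaches the size threshold.
    """
    return (p / (1.0 - (1.0 - p) * (1.0 - q))) ** r


def prob_large_breach(size, days, mu, sigma, r, p):
    """P(at least one breach of at least `size` records within `days` days),
    assuming days are independent and identically distributed."""
    q = lognormal_tail(size, mu, sigma)
    return 1.0 - prob_no_large_breach_per_day(q, r, p) ** days


# Hypothetical parameters for illustration only (NOT the article's estimates).
MU, SIGMA = 8.0, 3.0   # log-normal parameters of breach size (log scale)
R, P = 2.0, 0.6        # negative binomial parameters of daily breach count

if __name__ == "__main__":
    chance = prob_large_breach(10_000_000, 365, MU, SIGMA, R, P)
    print(f"Chance of a breach of 10 million records or more in a year: {chance:.3f}")
```

Because the log-normal tail decays slowly, even a small per-breach probability of exceeding the threshold compounds over many breaches per year, which is why occasional very large breaches are expected without any underlying trend.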

Original language: English (US)
Pages (from-to): 3-14
Number of pages: 12
Journal: Journal of Cybersecurity
Issue number: 1
State: Published - Dec 1 2016
Externally published: Yes

Keywords


  • Bayesian linear model
  • Data breaches
  • Heavy tails
  • Log-normal
  • Negative binomial

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Safety, Risk, Reliability and Quality
  • Social Psychology
  • Law
  • Political Science and International Relations
  • Computer Networks and Communications

