Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data

David Muchlinski, David Siroky, Jingrui He, Matthew Kocher

Research output: Contribution to journal › Article › peer-review

146 Scopus citations

Abstract

The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.
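The class-imbalance problem the abstract refers to can be illustrated with a minimal sketch. The data below are hypothetical (1,000 country-years with a 1% onset rate, a figure assumed purely for illustration and not taken from the article), and the "model" is a trivial baseline that never predicts an onset. It is not the authors' method. It only shows why raw accuracy says little about how well rare events are predicted.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all observations classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of true positives (onsets) that the model identifies."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 10 onsets among 1,000 country-years (assumed 1% base rate).
y_true = [1] * 10 + [0] * 990

# Trivial baseline that always predicts "no onset".
always_peace = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_peace)
baseline_recall = recall(y_true, always_peace)

print(f"accuracy = {baseline_accuracy:.2f}, recall = {baseline_recall:.2f}")
```

The baseline is right about almost every observation yet identifies none of the onsets, which is why comparisons of classifiers on rare-event data usually rely on measures that focus on the positive class rather than on overall accuracy.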

Original language: English (US)
Pages (from-to): 87-103
Number of pages: 17
Journal: Political Analysis
Volume: 24
Issue number: 1
State: Published - Dec 1 2016

ASJC Scopus subject areas

  • Sociology and Political Science
  • Political Science and International Relations
