Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data

David Muchlinski; David Siroky; Jingrui He; Matthew Kocher

doi:10.1093/pan/mpv024

Research output: Contribution to journal › Article › peer-review

146 Scopus citations

Abstract

The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L₁-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.
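The abstract's point about rare events can be illustrated with a small evaluation sketch. The counts below are invented for illustration and are not taken from the article, and the function name `rare_event_metrics` is hypothetical. The sketch shows why overall accuracy can look strong on class-imbalanced data even when a model predicts no onsets at all, so comparisons of this kind usually need recall and precision on the rare class as well.

```python
def rare_event_metrics(y_true, y_pred):
    """Return accuracy, recall, and precision for the positive (rare) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # onsets correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # peace correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed onsets

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Hypothetical data: 1000 country-years, of which 10 (1%) are onsets.
y_true = [1] * 10 + [0] * 990

# Baseline that never predicts an onset.
always_peace = [0] * 1000

# Hypothetical model: catches 6 of the 10 onsets and raises 20 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 20 + [0] * 970

print(rare_event_metrics(y_true, always_peace))
print(rare_event_metrics(y_true, some_model))
```

In this invented example the always-negative baseline reaches 0.99 accuracy with zero recall, while the second set of predictions has lower accuracy but recovers 6 of the 10 onsets.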

Original language	English (US)
Pages (from-to)	87-103
Number of pages	17
Journal	Political Analysis
Volume	24
Issue number	1
DOIs	https://doi.org/10.1093/pan/mpv024
State	Published - Dec 1 2016

ASJC Scopus subject areas

Sociology and Political Science
Political Science and International Relations

Cite this

@article{6f5aed4e2a144bf191a7224dcd9a25c5,
  title = "Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data",
  abstract = "The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.",
  author = "David Muchlinski and David Siroky and Jingrui He and Matthew Kocher",
  note = "Publisher Copyright: {\textcopyright} The Author 2016.",
  year = "2016",
  month = dec,
  day = "1",
  doi = "10.1093/pan/mpv024",
  language = "English (US)",
  volume = "24",
  pages = "87--103",
  journal = "Political Analysis",
  issn = "1047-1987",
  publisher = "Oxford University Press",
  number = "1",
}

TY - JOUR
T1 - Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data
AU - Muchlinski, David
AU - Siroky, David
AU - He, Jingrui
AU - Kocher, Matthew
N1 - Publisher Copyright: © The Author 2016.
PY - 2016/12/1
Y1 - 2016/12/1
N2 - The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful to more accurately predict rare events in conflict data.
UR - http://www.scopus.com/inward/record.url?scp=84963538255&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963538255&partnerID=8YFLogxK
U2 - 10.1093/pan/mpv024
DO - 10.1093/pan/mpv024
M3 - Article
AN - SCOPUS:84963538255
SN - 1047-1987
VL - 24
SP - 87
EP - 103
JO - Political Analysis
JF - Political Analysis
IS - 1
ER -
