Replication Data for: Disaggregating Repression: Identifying Physical Integrity Rights Allegations in Human Rights Reports

  • Rebecca Cordell (Creator)
  • K. Chad Clay (Creator)
  • Christopher J. Fariss (Creator)
  • Reed M. Wood (Creator)
  • Thorin Wright (Creator)

Dataset

Description

Most cross-national human rights datasets rely on human coding to produce yearly, country-level indicators of state human rights practices. Hand-coding the documents that contain the information on which these scores are based is tedious and time-consuming, but it has been viewed as necessary given the complexity and detail of the information contained in the text. However, advances in automated text analysis have the potential to streamline this process without sacrificing accuracy. In this research note, we take the first step in creating this streamlined process by employing a supervised machine learning automated coding method that extracts specific allegations of physical integrity rights violations from the original text of country reports of human rights. This method produces a dataset including 163,512 unique abuse allegations in 196 countries between 1999 and 2016. This dataset and method will assist researchers of physical integrity rights abuse because they will allow them to produce allegation-level human rights measures that did not previously exist, and they provide a jumping-off point for future projects aimed at using supervised machine learning to create global human rights metrics.
Date made available: 2021
Publisher: Harvard Dataverse
