Analyzing Arizona OSHA Injury Reports Using Unsupervised Machine Learning

Abbas Chokor, Hariharan Naganathan, Oswald Chong, Mounir El Asmar

Research output: Contribution to journal › Conference article › peer-review

47 Scopus citations


Construction continues to be a leading industry in the number of injuries and fatalities each year, and several organizations and agencies are working to minimize that number. The Occupational Safety and Health Administration (OSHA) is one such agency; it works to assure safe and healthful working conditions for working men and women by setting and enforcing standards and by providing training, outreach, education, and assistance. Given the large databases of historical OSHA events and reports, manual analysis of the content of fatality and catastrophe investigations is a time-consuming and expensive process. This paper aims to evaluate the strength of unsupervised machine learning and Natural Language Processing (NLP) in supporting safety inspections and reorganizing accident databases at the state level. After construction accident reports are collected from the OSHA Arizona office, the methodology consists of preprocessing the reports and weighting terms in order to apply a data-driven, unsupervised, K-Means-based clustering approach. The proposed method groups the collected reports into four clusters, each corresponding to a type of accident. The results show that construction accidents in the state of Arizona are caused by falls (42.9%), being struck by objects (34.3%), electrocutions (12.5%), and trench collapses (10.3%). The findings of this research empower state and local agencies with a customized presentation of accidents that fits their regulations and weather conditions. What is applicable to one climate might not be suitable for another; therefore, such a rearrangement of the accident database at the state level is a necessary prerequisite for enhancing local safety applications and standards.
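The pipeline outlined in the abstract (preprocess the reports, weight the terms, cluster with K-Means) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state which weighting scheme, stop-word list, or K-Means settings were used, so TF-IDF weighting, the `STOP_WORDS` set, the helper names, and the eight toy report summaries are all assumptions made for the example.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual preprocessing is not specified.
STOP_WORDS = {"a", "an", "the", "of", "and", "from", "by", "on", "in", "to", "at", "while"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}  # smoothed IDF
    vectors = []
    for d in docs:
        tf = Counter(d)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [-1] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy report summaries invented for illustration; they are NOT OSHA data.
reports = [
    "Employee fell from roof while installing shingles",
    "Worker fell from scaffold and fractured leg",
    "Employee struck by falling steel beam",
    "Worker struck by backing truck at site",
    "Electrician electrocuted by energized overhead line",
    "Employee electrocuted while repairing energized panel",
    "Trench wall collapsed on worker",
    "Worker buried in trench collapse",
]

vocab, vectors = tfidf_vectors([tokenize(r) for r in reports])
labels = kmeans(vectors, k=4)

# Share of reports per cluster, analogous to the percentages in the abstract.
shares = {c: 100.0 * n / len(labels) for c, n in Counter(labels).items()}
for c in sorted(shares):
    print(f"cluster {c}: {shares[c]:.1f}% of reports")
```

K-Means depends on its random initialization, so on a corpus this small the clusters will not necessarily line up with the four accident types. In practice one would typically run several restarts and inspect the highest-weighted terms of each centroid before naming a cluster.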

Original language: English (US)
Pages (from-to): 1588-1593
Number of pages: 6
Journal: Procedia Engineering
State: Published - 2016
Event: International Conference on Sustainable Design, Engineering and Construction, ICSDEC 2016 - Tempe, United States
Duration: May 18 2016 → May 20 2016


Keywords

  • Accident
  • Construction
  • Data Analysis
  • Injury
  • Natural Language Processing
  • OSHA
  • Safety

ASJC Scopus subject areas

  • Engineering(all)

