High-dimensional disease outbreak detection using tree-based ensembles

Saylisse Dávila, George Runger, Eugene Tuv, Paola Pacheco

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


A common goal of most public health surveillance programs is to detect disease outbreaks before they become a threat to the public. In this work, we propose a novel, computationally feasible approach to this problem. By approaching public health surveillance with a supervised learner that handles high-dimensional, mixed-type data and even missing values, we developed a method that accurately detects changes in disease incidence rates. We use probability estimates from random forests to develop an alternative signal criterion that detects when disease cases are concentrated within a particular geographic region and/or subpopulation to a degree that is unlikely to have occurred by chance. A series of simulated experiments suggests that this method detects the presence of disease clusters 88% of the time, on average. The simulated results also suggest a feasible combination of the method's parameters that significantly reduces its computational cost, to an average system time of 1.9 minutes (standard deviation 0.48 minutes) for a data set containing 1,000 cases, running on an Intel Core i5 processor.
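The abstract does not spell out the signal criterion, so the following is only a minimal sketch under stated assumptions, not the authors' method. Suppose baseline cases are labeled 0 and current-window cases are labeled 1, a random forest is trained to tell them apart, and each current-window case receives an out-of-bag estimate of P(current). Without an outbreak the two classes look alike and those estimates show no regional pattern; a localized cluster makes current cases in one region easy to separate, so their estimates rise together. The hypothetical function `concentration_signal` takes such estimates (the forest itself is not shown) and applies a permutation test to the largest per-region mean to judge whether that concentration is unlikely by chance. The names `region_max_mean`, `concentration_signal`, `n_perm`, and `alpha`, and the example probabilities, are illustrative assumptions.

```python
import random
from collections import defaultdict


def region_max_mean(probs, regions):
    """Return the largest per-region mean of the probability estimates."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p, r in zip(probs, regions):
        totals[r] += p
        counts[r] += 1
    return max(totals[r] / counts[r] for r in totals)


def concentration_signal(probs, regions, n_perm=999, alpha=0.05, seed=0):
    """Hypothetical permutation test for a regional concentration of high
    probability estimates (an assumed criterion, not the paper's).

    probs   : assumed P(case belongs to the current window), one value per
              current-window case, e.g. from out-of-bag random forest votes.
    regions : region (or region x subpopulation) label for each case.
    Returns (signal, observed_statistic, p_value).
    """
    rng = random.Random(seed)
    observed = region_max_mean(probs, regions)
    shuffled = list(regions)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling labels keeps every region's size fixed, so the null
        # distribution reflects the extra noise in small regions' means.
        rng.shuffle(shuffled)
        if region_max_mean(probs, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return p_value <= alpha, observed, p_value


if __name__ == "__main__":
    # Hypothetical estimates: cases in region "A" stand out from "B" and "C".
    probs = [0.9] * 20 + [0.5] * 80
    regions = ["A"] * 20 + ["B"] * 40 + ["C"] * 40
    print(concentration_signal(probs, regions))
```

Using the maximum over regions means the test asks whether *any* region stands out, which matches the goal of locating a cluster; a uniform rise in incidence across all regions would instead need a different statistic.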

Original language: English (US)
Title of host publication: IIE Annual Conference and Expo 2013
Publisher: Institute of Industrial Engineers
Number of pages: 10
State: Published - 2013
Event: IIE Annual Conference and Expo 2013 - San Juan, Puerto Rico
Duration: May 18, 2013 to May 22, 2013


Other: IIE Annual Conference and Expo 2013
Country/Territory: Puerto Rico
City: San Juan

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering

