Feature extraction and classification models for high-dimensional profile data

Amit Shinde, George Church, Mani Janakiram, George Runger

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


As manufacturing transitions to real-time sensing, it becomes more important to handle multiple, high-dimensional (non-stationary) time series that generate thousands of measurements for each batch. Predictive models are often challenged by such high-dimensional data and it is important to reduce the dimensionality for better performance. With thousands of measurements, even wavelet coefficients do not reduce the dimensionality sufficiently. We propose a two-stage method that uses energy statistics from a discrete wavelet transform to identify process variables and appropriate resolutions of wavelet coefficients in an initial (screening) model. Variable importance scores from a modern random forest classifier are exploited in this stage. Coefficients that correspond to the identified variables and resolutions are then selected for a second-stage predictive model. The approach is shown to provide good performance, along with interpretable results, in an example where multiple time series are used to indicate the need for preventive maintenance. In general, the two-stage approach can handle high dimensionality and still provide interpretable features linked to the relevant process variables and wavelet resolutions that can be used for further analysis.
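The energy statistics used in the screening stage can be illustrated with a minimal sketch. It assumes a Haar wavelet, a series length that is a power of two, and energy defined as the sum of squared coefficients at each resolution level; the abstract does not specify the wavelet family or the exact statistic, and the function names `haar_dwt` and `level_energies` are hypothetical. The random forest screening and the second-stage model are not shown.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar DWT of a sequence whose length is a power of two.

    Returns (approx, details), where details[0] holds the finest-resolution
    detail coefficients and details[-1] the coarsest.
    Assumption: Haar is used only for illustration.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(v) for v in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / root2 for a, b in pairs])
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def level_energies(signal):
    """Energy (sum of squared coefficients) per resolution level.

    Output order: finest detail level first, coarsest detail level next to
    last, final approximation last. A series of length 2**J yields J + 1
    values, so thousands of measurements collapse to a handful of features.
    """
    approx, details = haar_dwt(signal)
    energies = [sum(c * c for c in level) for level in details]
    energies.append(sum(c * c for c in approx))
    return energies


if __name__ == "__main__":
    # Synthetic series, not data from the paper.
    flat = [1.0] * 8        # energy sits entirely in the approximation
    fast = [1.0, -1.0] * 4  # energy sits entirely in the finest detail level
    print(level_energies(flat))
    print(level_energies(fast))
```

Under these assumptions, one such energy vector per process variable would form the input to the screening classifier, and the variables and levels it ranks highly would determine which individual coefficients enter the second-stage model.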

Original language: English (US)
Pages (from-to): 885-893
Number of pages: 9
Journal: Quality and Reliability Engineering International
Issue number: 7
State: Published - Nov 2011


Keywords

  • discrete wavelet transformation
  • preventive maintenance
  • random forest

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Management Science and Operations Research

