Segmentation, indexing, and retrieval for environmental and natural sounds

Gordon Wichern, Jiachen Xue, Harvey Thornburg, Brandon Mechtley, Andreas Spanias

Research output: Contribution to journal › Article › peer-review

58 Scopus citations


We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. For segmentation, we present a dynamic Bayesian network (DBN) that jointly infers the onset and end times of the most prominent sound events in the space, along with an extension of the algorithm that covers large spaces using distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries a user would employ to retrieve the event (or similar events). To make the retrieval search more efficient, we recursively apply a modified spectral clustering algorithm that groups similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevance judgments needed to evaluate our retrieval algorithm on both automatically and manually segmented sound clips. Our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.
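
As an illustration of the clustering step described in the abstract, the following is a minimal sketch of recursive spectral bipartitioning over a precomputed distance matrix. It is not the authors' modified algorithm: it uses a standard normalized-Laplacian split on the sign of the second eigenvector, and it assumes that pairwise distances between the per-event HMMs are already available. The function names, the Gaussian affinity width `sigma`, the stopping rule `max_size`, and the toy distances are all hypothetical.

```python
import numpy as np


def spectral_bipartition(dist, sigma=1.0):
    """Split items into two groups from a symmetric distance matrix.

    Returns a boolean mask; True and False mark the two groups.
    """
    # Turn distances into similarities (Gaussian kernel, assumed here).
    affinity = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(dist)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the second smallest.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return fiedler >= 0


def recursive_cluster(dist, indices=None, max_size=3, sigma=1.0):
    """Recursively bipartition until every group has at most max_size items."""
    if indices is None:
        indices = np.arange(len(dist))
    if len(indices) <= max_size:
        return [indices.tolist()]
    sub = dist[np.ix_(indices, indices)]
    mask = spectral_bipartition(sub, sigma)
    left, right = indices[mask], indices[~mask]
    if len(left) == 0 or len(right) == 0:
        # Degenerate split: stop rather than recurse forever.
        return [indices.tolist()]
    return (recursive_cluster(dist, left, max_size, sigma)
            + recursive_cluster(dist, right, max_size, sigma))


# Toy stand-in for HMM-to-HMM distances: six events forming two tight groups.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
toy_dist = np.abs(positions[:, None] - positions[None, :])
clusters = recursive_cluster(toy_dist, max_size=3)
print(sorted(clusters))
```

With a hierarchy like this, a query would only need to be compared against the events in the most promising groups rather than against every indexed event, which is the efficiency motivation the abstract gives for clustering.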

Original language: English (US)
Article number: 5410056
Pages (from-to): 688-707
Number of pages: 20
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 3
State: Published - Mar 2010

Keywords

  • Acoustic signal analysis
  • Acoustic signal detection
  • Bayes procedures
  • Clustering methods
  • Database query processing

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

