TY - JOUR
T1 - Segmentation, indexing, and retrieval for environmental and natural sounds
AU - Wichern, Gordon
AU - Xue, Jiachen
AU - Thornburg, Harvey
AU - Mechtley, Brandon
AU - Spanias, Andreas
N1 - Funding Information:
Manuscript received January 06, 2008; revised December 02, 2009. Current version published February 10, 2010. This work was supported by the National Science Foundation under Grants NSF IGERT DGE-05-04647 and NSF CISE Research Infrastructure 04-03428. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Bertrand David.
PY - 2010/3
Y1 - 2010/3
AB - We propose a method for characterizing sound activity in fixed spaces through segmentation, indexing, and retrieval of continuous audio recordings. Regarding segmentation, we present a dynamic Bayesian network (DBN) that jointly infers onsets and end times of the most prominent sound events in the space, along with an extension of the algorithm for covering large spaces with distributed microphone arrays. Each segmented sound event is indexed with a hidden Markov model (HMM) that models the distribution of example-based queries that a user would employ to retrieve the event (or similar events). In order to increase the efficiency of the retrieval search, we recursively apply a modified spectral clustering algorithm to group similar sound events based on the distance between their corresponding HMMs. We then conduct a formal user study to obtain the relevancy decisions necessary for evaluation of our retrieval algorithm on both automatically and manually segmented sound clips. Furthermore, our segmentation and retrieval algorithms are shown to be effective in both quiet indoor and noisy outdoor recording conditions.
KW - Acoustic signal analysis
KW - Acoustic signal detection
KW - Bayes procedures
KW - Clustering methods
KW - Database query processing
UR - http://www.scopus.com/inward/record.url?scp=76949085351&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=76949085351&partnerID=8YFLogxK
U2 - 10.1109/TASL.2010.2041384
DO - 10.1109/TASL.2010.2041384
M3 - Article
AN - SCOPUS:76949085351
SN - 1558-7916
VL - 18
SP - 688
EP - 707
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 3
M1 - 5410056
ER -