Online speaking rate estimation using recurrent neural networks

Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

Research output: Chapter in Book/Report/Conference proceedingConference contribution

10 Scopus citations

Abstract

A reliable online speaking rate estimation tool is useful in many domains, including speech recognition, speech therapy intervention, speaker identification, etc. This paper proposes an online speaking rate estimation model based on recurrent neural networks (RNNs). Speaking rate is a long-term feature of speech, which depends on how many syllables were spoken over an extended time window (seconds). We posit that since RNNs can capture long-term dependencies through the memory of previous hidden states, they are a good match for the speaking rate estimation task. Here we train a long short-term memory (LSTM) RNN on a set of speech features that are known to correlate with speech rhythm. An evaluation on spontaneous speech shows that the method yields a higher correlation between the estimated rate and the ground-truth rate when compared to the state-of-the-art alternatives. The evaluation on longitudinal pathological speech shows that the proposed method can capture long-term and short-term changes in speaking rate.

Original languageEnglish (US)
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5245-5249
Number of pages5
ISBN (Electronic)9781479999880
DOIs
StatePublished - May 18 2016
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: Mar 20 2016Mar 25 2016

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2016-May
ISSN (Print)1520-6149

Other

Other41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/TerritoryChina
CityShanghai
Period3/20/163/25/16

Keywords

  • clinical tool
  • recurrent neural networks
  • speaking rate estimation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Online speaking rate estimation using recurrent neural networks'. Together they form a unique fingerprint.

Cite this