Abstract
Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made the decision that it did. In this paper, we propose an interpretable model for objective assessment of dysarthric speech for speech therapy applications based on DNNs. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of directly generating a severity prediction from a high-dimensional input acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNNs. During inference, the model provides an estimate of severity at the output of the network and a set of explanatory features from the intermediate layer of the network that explain the final decision. We evaluate the performance of the model on a dysarthric speech dataset and show that the proposed model provides an interpretable output that is highly correlated with the subjective evaluation of Speech-Language Pathologists (SLPs).
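The bottleneck idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the ReLU activation, the random untrained weights, and all names (`build_model`, `predict`, `explanatory`) are assumptions chosen only to show how the severity estimate is computed solely from a small intermediate layer whose values are returned alongside it.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases; untrained placeholders."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def relu(v):
    return max(v, 0.0)

def dense(x, layer, activation=None):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out

def build_model(n_acoustic, n_hidden, n_interpretable, seed=0):
    """Hypothetical three-stage network: acoustic -> hidden -> bottleneck -> severity."""
    rng = random.Random(seed)
    return {
        "hidden": init_layer(n_acoustic, n_hidden, rng),
        "bottleneck": init_layer(n_hidden, n_interpretable, rng),
        "severity": init_layer(n_interpretable, 1, rng),
    }

def predict(model, acoustic_features):
    """Return (severity estimate, explanatory features from the bottleneck)."""
    hidden = dense(acoustic_features, model["hidden"], relu)
    # Low-dimensional intermediate layer: the only input to the severity output.
    explanatory = dense(hidden, model["bottleneck"])
    severity = dense(explanatory, model["severity"])[0]
    return severity, explanatory

# Assumed sizes: 40 acoustic features, 16 hidden units, 4 explanatory features.
model = build_model(n_acoustic=40, n_hidden=16, n_interpretable=4)
severity, explanatory = predict(model, [0.5] * 40)
```

Because the final output depends only on `explanatory`, inspecting those few values shows what drove a given severity estimate. In a trained model of this kind the bottleneck units would be tied to clinically meaningful ratings; here they are arbitrary numbers.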
Original language | English (US) |
---|---|
Pages (from-to) | 1849-1853 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2017-August |
State | Published - 2017 |
Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden; Duration: Aug 20 2017 → Aug 24 2017 |
Keywords
- Deep neural networks
- Dysarthric speech
- Model interpretability
- Objective assessment
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation