Interpretable objective assessment of dysarthric speech based on deep neural networks

Ming Tu; Visar Berisha; Julie Liss

doi:10.21437/Interspeech.2017-1222

Research output: Contribution to journal › Conference article › peer-review

28 Scopus citations

Abstract

Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made the decision that it did. In this paper, we propose an interpretable model for objective assessment of dysarthric speech for speech therapy applications based on DNNs. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of directly generating a severity prediction from a high-dimensional input acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNNs. During inference, the model provides an estimate of severity at the output of the network and a set of explanatory features from the intermediate layer of the network that explain the final decision. We evaluate the performance of the model on a dysarthric speech dataset and show that the proposed model provides an interpretable output that is highly correlated with the subjective evaluation of Speech-Language Pathologists (SLPs).
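The architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random initialization, the ReLU activation, and all names (`BottleneckSeverityModel`, `predict`, `dense`) are assumptions made for illustration, and training is not shown. The sketch only demonstrates the structural idea that the severity estimate is computed solely from a small intermediate layer, so that layer's activations can be returned as explanatory features alongside the prediction.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise rectified linear unit (an assumed activation)."""
    return [max(0.0, v) for v in values]


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases; placeholder for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class BottleneckSeverityModel:
    """Hypothetical model: acoustic features -> hidden -> bottleneck -> severity."""

    def __init__(self, n_acoustic, n_hidden, n_interpretable, seed=0):
        rng = random.Random(seed)
        self.hidden = init_layer(n_hidden, n_acoustic, rng)
        # Low-dimensional layer whose units are meant to be interpretable.
        self.bottleneck = init_layer(n_interpretable, n_hidden, rng)
        # The severity output sees only the bottleneck activations.
        self.output = init_layer(1, n_interpretable, rng)

    def predict(self, acoustic_features):
        """Return (severity estimate, explanatory bottleneck features)."""
        h = relu(dense(acoustic_features, *self.hidden))
        explanatory = dense(h, *self.bottleneck)
        severity = dense(explanatory, *self.output)[0]
        return severity, explanatory


# Usage with made-up dimensions: 10 acoustic inputs, 4 explanatory features.
model = BottleneckSeverityModel(n_acoustic=10, n_hidden=8, n_interpretable=4)
severity, explanatory = model.predict([0.1] * 10)
```

Because the output layer takes only the bottleneck activations as input, any change in the severity estimate can be traced to a change in one of the explanatory features, which is the constraint on the solution space that the abstract refers to.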

Original language	English (US)
Pages (from-to)	1849-1853
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2017-August
DOIs	https://doi.org/10.21437/Interspeech.2017-1222
State	Published - 2017
Event	18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration	Aug 20 2017 → Aug 24 2017

Keywords

Deep neural networks
Dysarthric speech
Model interpretability
Objective assessment

ASJC Scopus subject areas

Language and Linguistics
Human-Computer Interaction
Signal Processing
Software
Modeling and Simulation

Access to Document

10.21437/Interspeech.2017-1222

Cite this

@article{bde0b438af6f4aeb988cdb6a9c47c726,

title = "Interpretable objective assessment of dysarthric speech based on deep neural networks",

abstract = "Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made the decision that it did. In this paper, we propose an interpretable model for objective assessment of dysarthric speech for speech therapy applications based on DNNs. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of directly generating a severity prediction from a high-dimensional input acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNNs. During inference, the model provides an estimate of severity at the output of the network and a set of explanatory features from the intermediate layer of the network that explain the final decision. We evaluate the performance of the model on a dysarthric speech dataset and show that the proposed model provides an interpretable output that is highly correlated with the subjective evaluation of Speech-Language Pathologists (SLPs).",

keywords = "Deep neural networks, Dysarthric speech, Model interpretability, Objective assessment",

author = "Ming Tu and Visar Berisha and Julie Liss",

note = "Publisher Copyright: Copyright {\textcopyright} 2017 ISCA.; 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 ; Conference date: 20-08-2017 Through 24-08-2017",

year = "2017",

doi = "10.21437/Interspeech.2017-1222",

language = "English (US)",

volume = "2017-August",

pages = "1849--1853",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Interpretable objective assessment of dysarthric speech based on deep neural networks

AU - Tu, Ming

AU - Berisha, Visar

AU - Liss, Julie

PY - 2017

Y1 - 2017

AB - Improved performance in speech applications using deep neural networks (DNNs) has come at the expense of reduced model interpretability. For consumer applications this is not a problem; however, for health applications, clinicians must be able to interpret why a predictive model made the decision that it did. In this paper, we propose an interpretable model for objective assessment of dysarthric speech for speech therapy applications based on DNNs. Our model aims to predict a general impression of the severity of the speech disorder; however, instead of directly generating a severity prediction from a high-dimensional input acoustic feature space, we add an intermediate interpretable layer that acts as a bottleneck feature extractor and constrains the solution space of the DNNs. During inference, the model provides an estimate of severity at the output of the network and a set of explanatory features from the intermediate layer of the network that explain the final decision. We evaluate the performance of the model on a dysarthric speech dataset and show that the proposed model provides an interpretable output that is highly correlated with the subjective evaluation of Speech-Language Pathologists (SLPs).

KW - Deep neural networks

KW - Dysarthric speech

KW - Model interpretability

KW - Objective assessment

UR - http://www.scopus.com/inward/record.url?scp=85039159944&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039159944&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-1222

DO - 10.21437/Interspeech.2017-1222

M3 - Conference article

AN - SCOPUS:85039159944

SN - 2308-457X

VL - 2017-August

SP - 1849

EP - 1853

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017

Y2 - 20 August 2017 through 24 August 2017

ER -
