TY - GEN
T1 - Deep Learning Based Prediction of Hypernasality for Clinical Applications
AU - Mathad, Vikram C.
AU - Chapman, Kathy
AU - Liss, Julie
AU - Scherer, Nancy
AU - Berisha, Visar
N1 - Funding Information:
This work was funded in part by NIH/NIDCR grant DE026252.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
AB - Hypernasality refers to the perception of excessive nasal resonance during the production of oral sounds. Existing methods for the automatic assessment of hypernasality from speech are based on machine learning models trained on disordered speech databases rated by speech-language pathologists. However, the performance of such systems critically depends on the availability of hypernasal speech samples and the reliability of clinical ratings. In this paper, we propose a new approach that uses speech samples from healthy controls to model the acoustic characteristics of nasalized speech. Using healthy speech samples, we develop a 4-class deep neural network classifier for the classification of nasal consonants, oral consonants, nasalized vowels, and oral vowels. We use the classifier to compute nasalization scores for clinical speech samples and show that the resulting scores correlate with clinical perception of hypernasality. The proposed approach is evaluated on speech samples from speakers with dysarthria and speakers with cleft lip and palate.
KW - cleft lip and palate
KW - deep neural network
KW - dysarthric speech
KW - hypernasality
KW - velopharyngeal dysfunction
UR - http://www.scopus.com/inward/record.url?scp=85089224450&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089224450&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9054041
DO - 10.1109/ICASSP40776.2020.9054041
M3 - Conference contribution
AN - SCOPUS:85089224450
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6554
EP - 6558
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -