TY - JOUR
T1 - An attention model for hypernasality prediction in children with cleft palate
AU - Mathad, Vikram C.
AU - Scherer, Nancy
AU - Chapman, Kathy
AU - Liss, Julie
AU - Berisha, Visar
N1 - Funding Information:
This work was funded in part by NIH-NIDCR grant DE026252 and NIH-NIDCD grant R01DC006859.
Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Hypernasality refers to the perception of abnormal nasal resonances in vowels and voiced consonants. Estimation of hypernasality severity from connected speech samples involves learning a mapping between the frame-level features and utterance-level clinical ratings of hypernasality. However, not all speech frames contribute equally to the perception of hypernasality. In this work, we propose an attention-based bidirectional long short-term memory (BLSTM) model that directly maps the frame-level features to utterance-level ratings by focusing only on specific speech frames carrying hypernasal cues. The model's performance is evaluated on the Americleft database containing speech samples of children with cleft palate and clinical ratings of hypernasality. We analyzed the attention weights over broad phonetic categories and found that the model yields results consistent with what is known in the speech science literature. Further, the correlation between the predicted and perceptual ratings is found to be significant (r = 0.684, p < 0.001) and higher than that obtained with conventional BLSTMs trained using frame-wise and last-frame approaches.
AB - Hypernasality refers to the perception of abnormal nasal resonances in vowels and voiced consonants. Estimation of hypernasality severity from connected speech samples involves learning a mapping between the frame-level features and utterance-level clinical ratings of hypernasality. However, not all speech frames contribute equally to the perception of hypernasality. In this work, we propose an attention-based bidirectional long short-term memory (BLSTM) model that directly maps the frame-level features to utterance-level ratings by focusing only on specific speech frames carrying hypernasal cues. The model's performance is evaluated on the Americleft database containing speech samples of children with cleft palate and clinical ratings of hypernasality. We analyzed the attention weights over broad phonetic categories and found that the model yields results consistent with what is known in the speech science literature. Further, the correlation between the predicted and perceptual ratings is found to be significant (r = 0.684, p < 0.001) and higher than that obtained with conventional BLSTMs trained using frame-wise and last-frame approaches.
KW - Attention
KW - Cleft palate
KW - Hypernasality
KW - Recurrent neural networks
UR - http://www.scopus.com/inward/record.url?scp=85115061564&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115061564&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9414860
DO - 10.1109/ICASSP39728.2021.9414860
M3 - Conference article
AN - SCOPUS:85115061564
SN - 1520-6149
VL - 2021-June
SP - 7248
EP - 7252
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -