An Evaluation of Unsupervised Acoustic Model Training for a Dysarthric Speech Interface

Walter, Oliver; DESPOTOVIC, Vladimir; Haeb-Umbach, Reinhold; Gemmeke, Jort; Van hamme, Hugo; Ons, Bart

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2014 • In Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)

Peer reviewed

Permalink
https://hdl.handle.net/10993/40878

Files

Full Text

INTERSPEECH 2014.pdf

Publisher postprint (143.26 kB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

unsupervised learning; acoustic unit descriptors; dysarthric speech; non-negative matrix factorization

Abstract :

In this paper, we investigate unsupervised acoustic model training approaches for dysarthric speech recognition. Three types of models are considered: first, frame-based Gaussian posteriorgrams obtained from vector quantization (VQ); second, so-called Acoustic Unit Descriptors (AUDs), which are hidden Markov models of phone-like units trained in an unsupervised fashion; and third, posteriorgrams computed on the AUDs. Experiments were carried out on a database collected in a home automation task, containing nine speakers, seven of whom are considered to produce dysarthric speech. All unsupervised modeling approaches delivered significantly better recognition rates than a speaker-independent phoneme recognition baseline, showing the suitability of unsupervised acoustic model training for dysarthric speech. While the AUD models led to the most compact representation of an utterance for the subsequent semantic inference stage, the posteriorgram-based representations yielded higher recognition rates, with the Gaussian posteriorgram achieving the highest slot-filling F-score of 97.02%.
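To illustrate what a frame-based Gaussian posteriorgram from vector quantization looks like, here is a minimal sketch under assumptions that are not taken from the paper: the VQ codebook (the centroids, e.g. learned by k-means) is already given, every codeword is treated as a Gaussian with one shared diagonal variance, and all codewords have equal prior probability. Each frame is then mapped to the vector of posterior probabilities of the codewords instead of a single hard VQ index. The function names `gaussian_log_likelihood` and `gaussian_posteriorgram` are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_log_likelihood(frame: Sequence[float], mean: Sequence[float],
                            var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian N(frame; mean, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )


def gaussian_posteriorgram(frames: Sequence[Sequence[float]],
                           codebook: Sequence[Sequence[float]],
                           var: Sequence[float]) -> List[List[float]]:
    """Return one posterior vector per frame; each row sums to 1.

    Assumption: equal codeword priors and a variance shared by all codewords.
    """
    posteriorgram = []
    for frame in frames:
        log_likes = [gaussian_log_likelihood(frame, mean, var) for mean in codebook]
        peak = max(log_likes)  # subtract the maximum before exp() for numerical stability
        weights = [math.exp(ll - peak) for ll in log_likes]
        total = sum(weights)
        posteriorgram.append([w / total for w in weights])
    return posteriorgram


# Toy example: a 2-word codebook in a 2-dimensional feature space.
codebook = [[0.0, 0.0], [3.0, 3.0]]
frames = [[0.1, -0.2], [2.9, 3.1], [1.5, 1.5]]
pg = gaussian_posteriorgram(frames, codebook, var=[1.0, 1.0])
# The third frame is equidistant from both codewords, so pg[2] is [0.5, 0.5].
```

Compared with a hard VQ label (the index of the nearest codeword), the soft posterior vector keeps information about how ambiguous a frame is, which is one motivation for using posteriorgrams as input to a later inference stage.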

Disciplines :

Computer science

Author, co-author :

Walter, Oliver; University of Paderborn > Department of Communications Engineering

DESPOTOVIC, Vladimir ; University of Paderborn > Department of Communications Engineering

Haeb-Umbach, Reinhold; University of Paderborn > Department of Communications Engineering

Gemmeke, Jort; Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images

Van hamme, Hugo; Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images

Ons, Bart; Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images

External co-authors :

yes

Language :

English

Publication date :

September 2014

Event name :

15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)

Event place :

Singapore, Singapore

Event date :

from 14-09-2014 to 18-09-2014

Audience :

International

Main work title :

Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)

Pages :

1013-1017

Additional URL :

https://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1013.pdf

Funders :

DFG - Deutsche Forschungsgemeinschaft
IWT-SBO
CE - Commission Européenne (European Commission)

Available on ORBilu :

since 05 November 2019

Statistics

Number of views

228 (1 by Unilu)

Number of downloads

160 (0 by Unilu)
