Reference : An Evaluation of Unsupervised Acoustic Model Training for a Dysarthric Speech Interface
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/40878
English
Walter, Oliver [University of Paderborn > Department of Communications Engineering]
Despotovic, Vladimir [University of Paderborn > Department of Communications Engineering]
Haeb-Umbach, Reinhold [University of Paderborn > Department of Communications Engineering]
Gemmeke, Jort [Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images]
Van hamme, Hugo [Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images]
Ons, Bart [Katholieke Universiteit Leuven - KUL > ESAT - PSI, Processing Speech and Images]
Sep-2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
1013-1017
Yes
International
15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014)
from 14-09-2014 to 18-09-2014
Singapore
Singapore
[en] unsupervised learning ; acoustic unit descriptors ; dysarthric speech ; non-negative matrix factorization
[en] In this paper, we investigate unsupervised acoustic model training approaches for dysarthric-speech recognition. The models are, first, frame-based Gaussian posteriorgrams obtained from vector quantization (VQ); second, so-called Acoustic Unit Descriptors (AUDs), which are hidden Markov models of phone-like units trained in an unsupervised fashion; and, third, posteriorgrams computed on the AUDs. Experiments were carried out on a database collected from a home automation task and comprising nine speakers, seven of whom are considered to utter dysarthric speech. All unsupervised modeling approaches delivered significantly better recognition rates than a speaker-independent phoneme recognition baseline, showing the suitability of unsupervised acoustic model training for dysarthric speech. While the AUD models led to the most compact representation of an utterance for the subsequent semantic inference stage, posteriorgram-based representations resulted in higher recognition rates, with the Gaussian posteriorgram achieving the highest slot-filling F-score of 97.02%.
Deutsche Forschungsgemeinschaft ; IWT-SBO ; European Commission - EC
https://www.isca-speech.org/archive/archive_papers/interspeech_2014/i14_1013.pdf

File(s) associated to this reference

Fulltext file(s):

File: INTERSPEECH 2014.pdf
Version: Publisher postprint
Size: 139.9 kB
Access: Open access
