References of "Walter, Oliver"
Full Text
Peer Reviewed
Machine learning techniques for semantic analysis of dysarthric speech: An experimental study
Despotovic, Vladimir; Walter, Oliver; Haeb-Umbach, Reinhold

in Speech Communication (2018), 99


We present an experimental comparison of seven state-of-the-art machine learning algorithms for the task of semantic analysis of spoken input, with a special emphasis on applications for dysarthric speech. Dysarthria is a motor speech disorder, which is characterized by poor articulation of phonemes. In order to cater for these non-canonical phoneme realizations, we employed an unsupervised learning approach to estimate the acoustic models for speech recognition, which does not require a literal transcription of the training data. Even for the subsequent task of semantic analysis, only weak supervision is employed, whereby the training utterance is accompanied by a semantic label only, rather than a literal transcription. Results on two databases, one of them containing dysarthric speech, are presented showing that Markov logic networks and conditional random fields substantially outperform other machine learning approaches. Markov logic networks have proved to be especially robust to recognition errors, which are caused by imprecise articulation in dysarthric speech.
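To make the weak-supervision setup concrete: each training utterance is paired only with a semantic label, never a transcription, so a classifier must map whatever unit sequence the recognizer emits to that label. A minimal sketch of such a baseline (a multinomial naive Bayes model over bags of recognized units; the home-automation commands and labels below are hypothetical illustrations, not the paper's data or code):

```python
import math
from collections import Counter, defaultdict

def train_nb(utterances, labels, alpha=1.0):
    """Multinomial naive Bayes over bags of recognized units.

    utterances: list of token lists (e.g. decoded acoustic-unit sequences)
    labels:     one semantic label per utterance (the only supervision)
    """
    vocab = {t for u in utterances for t in u}
    counts = defaultdict(Counter)          # label -> token counts
    prior = Counter(labels)
    for u, y in zip(utterances, labels):
        counts[y].update(u)
    model = {}
    for y in prior:
        total = sum(counts[y].values()) + alpha * len(vocab)
        model[y] = (
            math.log(prior[y] / len(labels)),                                  # log prior
            {t: math.log((counts[y][t] + alpha) / total) for t in vocab},      # token log-probs
            math.log(alpha / total),                                           # unseen-token log-prob
        )
    return model

def predict_nb(model, utterance):
    def score(y):
        log_prior, log_tok, unk = model[y]
        return log_prior + sum(log_tok.get(t, unk) for t in utterance)
    return max(model, key=score)

# Hypothetical weakly labeled training data: token sequences + semantic labels only.
model = train_nb(
    [["turn", "on", "light"], ["switch", "on", "lamp"],
     ["turn", "off", "light"], ["close", "door"]],
    ["light_on", "light_on", "light_off", "door_close"],
)
```

With recognition errors, the bag-of-units view degrades gracefully: a partially misrecognized utterance can still score highest under the correct semantic label.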

Full Text
Peer Reviewed
Semantic Analysis of Spoken Input Using Markov Logic Networks
Despotovic, Vladimir; Walter, Oliver; Haeb-Umbach, Reinhold

in Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015) (2015, September)


We present a semantic analysis technique for spoken input using Markov Logic Networks (MLNs). MLNs combine graphical models with first-order logic. They are particularly suitable for providing inference in the presence of inconsistent and incomplete data, which are typical of an automatic speech recognizer’s (ASR) output in the presence of degraded speech. The target application is a speech interface to a home automation system to be operated by people with speech impairments, where the ASR output is particularly noisy. In order to cater for dysarthric speech with non-canonical phoneme realizations, acoustic representations of the input speech are learned in an unsupervised fashion. While training data transcripts are not required for the acoustic model training, the MLN training requires supervision, however, at a rather loose and abstract level. Results on two databases, one of them for dysarthric speech, show that MLN-based semantic analysis clearly outperforms baseline approaches employing non-negative matrix factorization, multinomial naive Bayes models, or support vector machines.
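The core idea of an MLN — a set of weighted logical formulas defining a log-linear distribution over possible worlds — can be sketched on a tiny ground model. The predicates and weights below are invented for illustration; real systems ground first-order rules over many constants and use approximate inference rather than exhaustive enumeration:

```python
import itertools, math

# Ground atoms of a toy domain (hypothetical, for illustration only):
#   says_light:     the ASR hypothesis contains the word "light"
#   wants_light_on: the intended semantic frame is "switch the light on"
atoms = ["says_light", "wants_light_on"]

# Weighted ground formulas: (weight, truth function over a world).
# In a real MLN these result from grounding first-order rules.
formulas = [
    (1.5, lambda w: (not w["says_light"]) or w["wants_light_on"]),  # says_light => wants_light_on
    (0.5, lambda w: not w["wants_light_on"]),                       # weak prior against the frame
]

def world_weight(w):
    # A world's weight is exp(sum of weights of the formulas it satisfies).
    return math.exp(sum(wt for wt, f in formulas if f(w)))

def prob(query, evidence):
    # P(query | evidence): normalized sum of weights over consistent worlds.
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, values))
        if any(w[a] != v for a, v in evidence.items()):
            continue
        den += world_weight(w)
        if w[query]:
            num += world_weight(w)
    return num / den

p = prob("wants_light_on", {"says_light": True})
```

Because a violated formula only lowers a world's weight instead of ruling it out, inconsistent or incomplete ASR evidence still yields a well-defined posterior — the robustness property the abstract highlights.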

Full Text
Peer Reviewed
An Evaluation of Unsupervised Acoustic Model Training for a Dysarthric Speech Interface
Walter, Oliver; Despotovic, Vladimir; Haeb-Umbach, Reinhold et al.

in Proceedings of the 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) (2014, September)


In this paper, we investigate unsupervised acoustic model training approaches for dysarthric-speech recognition. These models are, first, frame-based Gaussian posteriorgrams obtained from Vector Quantization (VQ); second, so-called Acoustic Unit Descriptors (AUDs), i.e., hidden Markov models of phone-like units trained in an unsupervised fashion; and, third, posteriorgrams computed on the AUDs. Experiments were carried out on a database collected from a home automation task and containing nine speakers, of which seven are considered to utter dysarthric speech. All unsupervised modeling approaches delivered significantly better recognition rates than a speaker-independent phoneme recognition baseline, showing the suitability of unsupervised acoustic model training for dysarthric speech. While the AUD models led to the most compact representation of an utterance for the subsequent semantic inference stage, posteriorgram-based representations resulted in higher recognition rates, with the Gaussian posteriorgram achieving the highest slot filling F-score of 97.02%.
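A Gaussian posteriorgram of the kind evaluated above replaces each feature frame with its vector of posterior probabilities over an unsupervised Gaussian codebook. A minimal sketch, assuming a diagonal-covariance codebook already learned (e.g. by EM) from untranscribed audio — not the paper's implementation:

```python
import numpy as np

def gaussian_posteriorgram(frames, means, variances, weights):
    """Per-frame posterior probabilities over a Gaussian codebook.

    frames:    (T, D) feature vectors (e.g. MFCCs)
    means:     (K, D) component means
    variances: (K, D) diagonal variances
    weights:   (K,) mixture weights -- all learned without transcriptions
    Returns a (T, K) matrix whose rows sum to one: the posteriorgram.
    """
    # Log-density of each frame under each diagonal Gaussian.
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_dens = -0.5 * np.sum(diff**2 / variances
                             + np.log(2 * np.pi * variances), axis=2)  # (T, K)
    log_post = np.log(weights)[None, :] + log_dens
    log_post -= log_post.max(axis=1, keepdims=True)                    # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

Each row is a soft assignment of the frame to codebook units, which is what makes the representation tolerant of the imprecise articulation in dysarthric speech: a poorly articulated phone spreads its probability mass over several units instead of forcing a single hard VQ label.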
