We present a semantic analysis technique for spoken input using Markov Logic Networks (MLNs). MLNs combine graphical models with first-order logic. They are particularly suitable for inference in the presence of inconsistent and incomplete data, which are typical of the output of an automatic speech recognizer (ASR) when the speech is degraded. The target application is a speech interface to a home automation system to be operated by people with speech impairments, where the ASR output is particularly noisy. To cater for dysarthric speech with non-canonical phoneme realizations, acoustic representations of the input speech are learned in an unsupervised fashion. While acoustic model training does not require transcripts of the training data, MLN training does require supervision, albeit at a rather loose and abstract level. Results on two databases, one of them containing dysarthric speech, show that MLN-based semantic analysis clearly outperforms baseline approaches employing non-negative matrix factorization, multinomial naive Bayes models, or support vector machines.