Human communication in multilingual communities often leads to code-switching, where individuals seamlessly alternate between two or more languages in their daily interactions. While this phenomenon has become increasingly prevalent with linguistic globalization, it presents challenges for Automatic Speech Recognition (ASR) systems, which are typically designed under the assumption that they transcribe a single language at a time. In this work, we propose a simple yet unexplored approach to this challenge: fine-tuning the pre-trained Whisper model jointly on language identification (LID) and transcription by introducing an auxiliary LID loss term. Our results show significant reductions in transcription errors, ranging between 14 and 36 percentage points. Ultimately, our work opens a new direction for research on code-switching speech, offering an opportunity to enhance the current capabilities of conversational agents.
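The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract only states that an auxiliary LID loss term is added to the transcription objective; the additive form, the utterance-level LID target, the `lid_weight` parameter, and the function names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def joint_loss(token_logits, token_targets, lid_logits, lid_target, lid_weight=0.5):
    """Mean token-level transcription loss plus a weighted LID loss.

    `lid_weight` is a hypothetical hyperparameter; the source does not
    say how the two terms are balanced.
    """
    asr = sum(
        cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    lid = cross_entropy(lid_logits, lid_target)
    return asr + lid_weight * lid


# Toy example: two decoded tokens over a 3-word vocabulary, two candidate languages.
loss = joint_loss(
    token_logits=[[2.0, 0.1, -1.0], [0.0, 1.5, 0.3]],
    token_targets=[0, 1],
    lid_logits=[0.2, 1.1],
    lid_target=1,
)
print(f"joint loss: {loss:.4f}")
```

Setting `lid_weight=0` recovers plain transcription fine-tuning, which makes the auxiliary term easy to ablate in this sketch.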
HE - 101071147 - SYMBIOTIK - Context-aware adaptive visualizations for critical decision making
FnR Project:
FNR15722813 - Brainsourcing For Affective Attention Estimation, 2021 (01/02/2022-31/01/2025) - Luis Leiva
Funding body:
European Union
Funding (details):
This work is supported by the Horizon 2020 FET program of the European Union through the ERA-NET Cofund funding (BANANA, grant CHIST-ERA-20-BCI-001) and Horizon Europe's European Innovation Council through the Pathfinder program (SYMBIOTIK, grant 101071147).