Abstract:
Human communication in multilingual communities often involves code-switching, where speakers seamlessly alternate between two or more languages in their daily interactions. While this phenomenon has become increasingly prevalent with linguistic globalization, it poses challenges for Automatic Speech Recognition (ASR) systems, which are typically designed to transcribe a single language at a time. In this work, we propose a simple yet unexplored approach to this challenge: fine-tuning the pre-trained Whisper model jointly on language identification (LID) and transcription by introducing an auxiliary LID loss term. Our results show significant reductions in transcription error, ranging from 14 to 36 percentage points. Ultimately, our work opens a new direction for research on code-switched speech, offering an opportunity to enhance the current capabilities of conversational agents.
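To make the joint objective concrete, the following is a minimal, hypothetical PyTorch sketch of how a transcription loss could be combined with an auxiliary LID loss when fine-tuning Whisper via Hugging Face's WhisperForConditionalGeneration. The `lid_head` classifier, the mean-pooling of encoder states, and the `lambda_lid` weight are illustrative assumptions, not the authors' exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

num_languages = 2   # e.g. the two languages of the code-switched corpus (assumption)
lambda_lid = 0.3    # assumed weight of the auxiliary LID term
lid_head = nn.Linear(model.config.d_model, num_languages)  # hypothetical LID classifier

def joint_loss(input_features, labels, lid_labels):
    """Transcription cross-entropy plus a weighted auxiliary LID loss."""
    outputs = model(input_features=input_features, labels=labels)
    asr_loss = outputs.loss  # standard token-level cross-entropy on transcripts

    # Utterance-level LID from mean-pooled encoder states (one possible design).
    pooled = outputs.encoder_last_hidden_state.mean(dim=1)   # (batch, d_model)
    lid_loss = F.cross_entropy(lid_head(pooled), lid_labels)

    return asr_loss + lambda_lid * lid_loss
```

In such a setup, `joint_loss` would simply replace the standard loss in the fine-tuning loop, with `lambda_lid` controlling how strongly the LID signal shapes the shared encoder representations.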
Publisher:
Association for Computing Machinery, New York, NY, United States
Funding text:
This work is supported by the Horizon 2020 FET program of the European Union through the ERA-NET Cofund funding (BANANA, grant CHIST-ERA-20-BCI-001) and Horizon Europe's European Innovation Council through the Pathfinder program (SYMBIOTIK, grant 101071147).