Paper published in a journal (Scientific congresses, symposiums and conference proceedings)
Text Generation Models for Luxembourgish with Limited Data: A Balanced Multilingual Strategy
PLUM, Alistair; Ranasinghe, Tharindu; PURSCHKE, Christoph
2025, in International Conference on Computational Linguistics (COLING), p. 93–104
Peer reviewed
 

Files


Full Text
2025.vardial-1.7.pdf
Publisher postprint (270.46 kB) Creative Commons License - Attribution




Details



Keywords :
CuCo Lab
Abstract :
[en] This paper addresses the challenges in developing language models for less-represented languages, with a focus on Luxembourgish. Despite its active development, Luxembourgish faces a digital data scarcity, exacerbated by Luxembourg's multilingual context. We propose a novel text generation model based on the T5 architecture, combining limited Luxembourgish data with equal amounts, in terms of size and type, of German and French data. We hypothesise that a model trained on Luxembourgish, German, and French will improve the model's cross-lingual transfer learning capabilities and outperform monolingual and large multilingual models. To verify this, the study at hand explores whether multilingual or monolingual training is more beneficial for Luxembourgish language generation. For the evaluation, we introduce LuxGen, a text generation benchmark that is the first of its kind for Luxembourgish.
Disciplines :
Languages & linguistics
Computer science
Author, co-author :
PLUM, Alistair  ;  University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > Luxembourg Studies
Ranasinghe, Tharindu
PURSCHKE, Christoph  ;  University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM) > Luxembourg Studies
External co-authors :
yes
Language :
English
Title :
Text Generation Models for Luxembourgish with Limited Data: A Balanced Multilingual Strategy
Publication date :
January 2025
Event name :
VarDial @ COLING
Event date :
2025
Audience :
International
Journal title :
International Conference on Computational Linguistics (COLING)
Publisher :
Association for Computational Linguistics, Abu Dhabi, UAE
Pages :
93–104
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 29 January 2025

