Paper published in a book (Scientific congresses, symposiums and conference proceedings)
LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
PHILIPPY, Fred; Guo, Siwen; KLEIN, Jacques et al.
2025. In Proceedings of the 31st International Conference on Computational Linguistics
Peer reviewed; Dataset
 

Files


Full Text
LuxEmbedder__camera_ready_.pdf
Author preprint (1.94 MB) Creative Commons License - Attribution

Details



Keywords :
NLP; Computational Linguistics; Sentence Embeddings; Luxembourgish
Abstract :
Sentence embedding models play a key role in various Natural Language Processing tasks, such as Topic Modeling, Document Clustering and Recommendation Systems. However, these models rely heavily on parallel data, which can be scarce for many low-resource languages, including Luxembourgish. This scarcity results in suboptimal performance of monolingual and cross-lingual sentence embedding models for these languages. To address this issue, we compile a relatively small but high-quality human-generated cross-lingual parallel dataset to train LuxEmbedder, an enhanced sentence embedding model for Luxembourgish with strong cross-lingual capabilities. Additionally, we present evidence suggesting that including low-resource languages in parallel training datasets can be more advantageous for other low-resource languages than relying solely on high-resource language pairs. Furthermore, recognizing the lack of sentence embedding benchmarks for low-resource languages, we create a paraphrase detection benchmark specifically for Luxembourgish, aiming to partially fill this gap and promote further research.
Disciplines :
Computer science
Author, co-author :
PHILIPPY, Fred; University of Luxembourg
Guo, Siwen; Zortify S.A.
KLEIN, Jacques; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
BISSYANDE, Tegawendé François d Assise; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
External co-authors :
no
Language :
English
Title :
LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
Publication date :
January 2025
Event name :
International Conference on Computational Linguistics
Event organizer :
International Committee on Computational Linguistics (ICCL)
Event place :
Abu Dhabi, United Arab Emirates
Event date :
January 19 – 24, 2025
Audience :
International
Main work title :
Proceedings of the 31st International Conference on Computational Linguistics
Publisher :
International Committee on Computational Linguistics (ICCL)
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
Available on ORBilu :
since 18 December 2024

Statistics


Number of views
109 (10 by Unilu)
Number of downloads
43 (3 by Unilu)

Scopus citations®
0
Scopus citations® without self-citations
0
