Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Letz Translate: Low-Resource Machine Translation for Luxembourgish
SONG, Yewei; EZZINI, Saad; KLEIN, Jacques et al.
2023, in Proceedings - 2023 5th International Conference on Natural Language Processing, ICNLP 2023
Peer reviewed
 

Files


Full Text
2303.01347.pdf
Author preprint (201.79 kB)
Details



Keywords :
Knowledge distillation; Low-resource Languages; Low-resource Translation; Luxembourgish; Neural Machine Translation; Language processing; Machine translations; Natural languages; Practical solutions; Real problems; Resources environments; Artificial Intelligence; Computer Science Applications; Computer Vision and Pattern Recognition; Signal Processing
Abstract :
[en] Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data. Therefore, achieving accurate machine translation (MT) in a low-resource environment is a real problem that requires practical solutions. Research on multilingual models has shown that some LRLs can be handled with such models. However, their large size and computational needs make their use in constrained environments (e.g., mobile/IoT devices or limited/old servers) impractical. In this paper, we address this problem by leveraging the power of large multilingual MT models using knowledge distillation. Knowledge distillation can transfer knowledge from a large and complex teacher model to a simpler and smaller student model without losing much in performance. We also make use of high-resource languages that are related to, or share the same linguistic root as, the target LRL. For our evaluation, we consider Luxembourgish as the LRL, which shares some roots and properties with German. We build multiple resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind (NLLB) model, and pseudo-translation. We find that our efficient models are more than 30% faster and perform only 4% worse than the large state-of-the-art NLLB model.
Disciplines :
Computer science
Author, co-author :
SONG, Yewei; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
EZZINI, Saad; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN
KLEIN, Jacques; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
BISSYANDE, Tegawendé François d Assise; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Lefebvre, Clément; Banque BGL BNP Paribas, Luxembourg
Goujon, Anne; Banque BGL BNP Paribas, Luxembourg
External co-authors :
yes
Language :
English
Title :
Letz Translate: Low-Resource Machine Translation for Luxembourgish
Publication date :
24 March 2023
Event name :
2023 5th International Conference on Natural Language Processing (ICNLP)
Event place :
Guangzhou, China
Event date :
24 to 26 March 2023
Main work title :
Proceedings - 2023 5th International Conference on Natural Language Processing, ICNLP 2023
Publisher :
Institute of Electrical and Electronics Engineers Inc.
ISBN/EAN :
9798350302219
Peer reviewed :
Peer reviewed
Funders :
Guangdong University of Technology
Available on ORBilu :
since 25 November 2023
