Paper published in a book (colloquia, congresses, scientific conferences and proceedings)
Comparing Pre-Training Schemes for Luxembourgish BERT Models
LOTHRITZ, Cedric; EZZINI, Saad; PURSCHKE, Christoph et al.
2023, in Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)
Peer reviewed
 

Documents


Full text
Qualitative_Assessment_paper_KONVENS-2.pdf
Author postprint (1.18 MB)



Details



Keywords:
natural language processing; Luxembourgish; NLP; BERT; pre-training; language model; computational linguistics; datasets; low-resource language; LuxemBERT
Abstract:
[en] Despite the widespread use of pre-trained models in NLP, well-performing pre-trained models for low-resource languages are scarce. To address this issue, we propose two novel BERT models for the Luxembourgish language that improve on the state of the art. We also present an empirical study on both the performance and robustness of the investigated BERT models. We compare the models on a set of downstream NLP tasks and evaluate their robustness against different types of data perturbations. Additionally, we provide novel datasets to evaluate the performance of Luxembourgish language models. Our findings reveal that pre-training a pre-loaded model has a positive effect on both the performance and robustness of fine-tuned models and that using the German GottBERT model yields a higher performance while the multilingual mBERT results in a more robust model. This study provides valuable insights for researchers and practitioners working with low-resource languages and highlights the importance of considering pre-training strategies when building language models.
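The "pre-loading" strategy described in the abstract amounts to continuing masked-language-model pre-training from an existing checkpoint (e.g. GottBERT or mBERT) on Luxembourgish text before fine-tuning on downstream tasks. The following is a minimal sketch of such continued pre-training using the Hugging Face transformers and datasets libraries; the checkpoint identifier, corpus path, and hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: continued MLM pre-training from a pre-loaded checkpoint.
# Checkpoint ID, corpus path, and hyperparameters are hypothetical.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

base_checkpoint = "uklfr/gottbert-base"  # assumed ID of the pre-loaded German model
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForMaskedLM.from_pretrained(base_checkpoint)

# Luxembourgish raw-text corpus, one document per line (hypothetical path).
corpus = load_dataset("text", data_files={"train": "luxembourgish_corpus.txt"})

def tokenize(batch):
    # Tokenize and truncate to BERT's maximum sequence length.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard 15% token masking for the masked-language-model objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="luxembert-continued",
        per_device_train_batch_size=16,
        num_train_epochs=3,
        learning_rate=1e-4,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

The resulting checkpoint would then be fine-tuned on the downstream tasks and evaluated under data perturbations, as the abstract describes.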
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX - Trustworthy Software Engineering
Disciplines:
Computer science
Author, co-author:
LOTHRITZ, Cedric; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
EZZINI, Saad; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
PURSCHKE, Christoph; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Humanities (DHUM)
BISSYANDE, Tegawendé François D Assise; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
KLEIN, Jacques; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Olariu, Isabella; Zortify SA
Boytsov, Andrey; BGL BNP Paribas
Lefebvre, Clement; BGL BNP Paribas
Goujon, Anne; BGL BNP Paribas
External co-authors:
yes
Document language:
English
Title:
Comparing Pre-Training Schemes for Luxembourgish BERT Models
Publication date:
September 2023
Event name:
19th Conference on Natural Language Processing (KONVENS 2023)
Event location:
Ingolstadt, Germany
Event dates:
18-09-2023 to 22-09-2023
Title of the main work:
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)
Peer reviewed:
Peer reviewed
Focus Area:
Computational Sciences
Available on ORBilu:
since 13 August 2023

Statistics


Number of views
397 (including 14 from Unilu)
Number of downloads
181 (including 14 from Unilu)

Scopus® citations
2
Scopus® citations excluding self-citations
0
