Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Empirical Evaluation of Pre-trained Language Models for Summarizing Moroccan Darija News Articles
Aftiss, Azzedine; LAMSIYAH, Salima; SCHOMMER, Christoph et al.
2025. In Empirical Evaluation of Pre-trained Language Models for Summarizing Moroccan Darija News Articles
Peer reviewed Dataset
 

Files


Full Text
2025.wacl-1.9.pdf
Author postprint (395.14 kB)


Details



Abstract :
[en] Moroccan Dialect (MD), or “Darija,” is a primary spoken variant of Arabic in Morocco, yet it remains underrepresented in Natural Language Processing (NLP) research, particularly in tasks like summarization. Despite a growing volume of MD textual data online, there is a lack of robust resources and NLP models tailored to handle the unique linguistic challenges posed by MD. In response, we introduce .MA_v2, an expanded version of the GOUD.MA dataset, containing over 50k articles with their titles across 11 categories. This dataset provides a more comprehensive resource for developing summarization models. We evaluate the application of large language models (LLMs) for MD summarization, using fine-tuning with encoder-decoder models and zero-shot prompting with causal LLMs. Our findings demonstrate that an expanded dataset improves summarization performance and highlight the capabilities of recent LLMs in handling MD text. We open-source our dataset, fine-tuned models, and all experimental code, establishing a foundation for future advancements in MD NLP. We release the code at https://github.com/AzzedineAftiss/Moroccan-Dialect-Summarization.
Disciplines :
Computer science
Author, co-author :
Aftiss, Azzedine
LAMSIYAH, Salima  ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
SCHOMMER, Christoph  ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
El Alaoui, Said Ouatik
External co-authors :
yes
Language :
English
Title :
Empirical Evaluation of Pre-trained Language Models for Summarizing Moroccan Darija News Articles
Publication date :
January 2025
Event name :
The 31st International Conference on Computational Linguistics
Event place :
Abu Dhabi, United Arab Emirates
Event date :
January 19 – 24, 2025
By request :
Yes
Audience :
International
Main work title :
Empirical Evaluation of Pre-trained Language Models for Summarizing Moroccan Darija News Articles
Publisher :
ACL Anthology
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 12 March 2025

