Paper published in a book (Scientific congresses, symposiums and conference proceedings)
ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models
LAMSIYAH, Salima; Zeinalipour, Kamyar; El amrany, Samir et al.
2025. In ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models
Peer reviewed Dataset
 

Files


Full Text
2025.wacl-1.1.pdf
Author postprint (443.66 kB)

All documents in ORBilu are protected by a user license.

Details



Abstract :
[en] Recent research on commonsense reasoning in natural language processing (NLP) has produced numerous new datasets and benchmarks. However, these resources have predominantly been limited to English, leaving a gap in evaluating commonsense reasoning in other languages. In this paper, we introduce the ArabicSense Benchmark, which is designed to thoroughly evaluate the world-knowledge commonsense reasoning abilities of large language models (LLMs) in Arabic. The benchmark comprises three tasks: the first tests whether a system can distinguish natural language statements that make sense from those that do not; the second requires a system to identify the most crucial reason why a nonsensical statement fails to make sense; and the third requires generating explanations for why statements do not make sense. We evaluate several Arabic BERT-based models and causal LLMs on these tasks. Experimental results demonstrate improvements after fine-tuning on our dataset. For instance, AraBERT v2 achieved an 87% F1 score on the second task, while Gemma and Mistral-7b achieved F1 scores of 95.5% and 94.8%, respectively. For the generation task, LLaMA-3 achieved the best performance with a BERTScore F1 of 77.3%, closely followed by Mistral-7b at 77.1%. All code and the benchmark will be made publicly available at https://github.com/.
Disciplines :
Computer science
Author, co-author :
LAMSIYAH, Salima ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Zeinalipour, Kamyar
El amrany, Samir ; Unilu - University of Luxembourg > Computer Science
BRUST, Matthias ; University of Luxembourg > Faculty of Science, Technology and Medicine > Department of Computer Science > Team Pascal BOUVRY
Maggini, Marco
BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
SCHOMMER, Christoph ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
yes
Language :
English
Title :
ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models
Publication date :
January 2025
Event name :
The 31st International Conference on Computational Linguistics
Event place :
Abu Dhabi, United Arab Emirates
Event date :
January 19 – 24, 2025
By request :
Yes
Audience :
International
Main work title :
ArabicSense: A Benchmark for Evaluating Commonsense Reasoning in Arabic with Large Language Models
Publisher :
ACL Anthology, Abu Dhabi, United Arab Emirates
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 12 March 2025

Statistics


Number of views
107 (3 by Unilu)
Number of downloads
31 (1 by Unilu)

Scopus citations®
1
Scopus citations® without self-citations
1
