Paper published on a website (Scientific congresses, symposiums and conference proceedings)
Reddit-V: A Virality Prediction Dataset and Zero-Shot Evaluation with Large Language Models
EL-AMRANY, Samir; BRUST, Matthias R.; LAMSIYAH, Salima et al.
2025, Recent Advances in Natural Language Processing
Peer reviewed
 

Files


Full Text
2025.ranlp-1.41.pdf
Publisher postprint (3.25 MB)


Details



Keywords :
Reddit-V; virality prediction; pre-engagement metadata; zero-shot LLM evaluation; multimodal fine-tuning; AI; Social media
Abstract :
We present Reddit-V, a new dataset designed to advance research on social media virality prediction in natural language processing. The dataset consists of over 27,000 Reddit posts, each enriched with images, textual content, and pre-engagement metadata such as post titles, categories, sentiment scores, and posting times. As an initial benchmark, we evaluate several instruction-tuned large language models (LLMs) in a zero-shot setting, prompting them with post titles and metadata to predict post virality. We then fine-tune two multimodal models, CLIP and IDEFICS, to assess whether incorporating visual context enhances predictive performance. Our results show that zero-shot LLMs perform poorly, whereas the fine-tuned multimodal models achieve better performance. Specifically, CLIP outperforms the best-performing zero-shot LLM (CodeLLaMA) by 3%, while IDEFICS achieves a 7% improvement over the same baseline, highlighting the importance of visual features in virality prediction. We release the Reddit-V dataset and our evaluation results to facilitate further research on multimodal and text-based virality prediction. Our dataset and code will be made publicly available on GitHub.
Disciplines :
Computer science
Author, co-author :
EL-AMRANY, Samir; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
BRUST, Matthias R.; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
LAMSIYAH, Salima; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
BOUVRY, Pascal; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
no
Language :
English
Title :
Reddit-V: A Virality Prediction Dataset and Zero-Shot Evaluation with Large Language Models
Publication date :
08 September 2025
Event name :
Recent Advances in Natural Language Processing
Event place :
Varna, Bulgaria
Event date :
2025
Audience :
International
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 26 March 2026
