Reddit-V: A Virality Prediction Dataset and Zero-Shot Evaluation with Large Language Models

EL-AMRANY, Samir; R. Brust, Matthias; LAMSIYAH, Salima; BOUVRY, Pascal

doi:10.26615/978-954-452-098-4-041

Paper published on a website (Scientific congresses, symposiums and conference proceedings)

2025 • Recent Advances in Natural Language Processing

Peer reviewed

Permalink
https://hdl.handle.net/10993/68082

DOI
10.26615/978-954-452-098-4-041

Files

Full Text

2025.ranlp-1.41.pdf

Publisher postprint (3.25 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Reddit-V; virality prediction; pre-engagement metadata; zero-shot LLM evaluation; multimodal fine-tuning; AI; Social media

Abstract :

We present Reddit-V, a new dataset designed to advance research on social media virality prediction in natural language processing. The dataset consists of over 27,000 Reddit posts, each enriched with images, textual content, and pre-engagement metadata such as post titles, categories, sentiment scores, and posting times. As an initial benchmark, we evaluate several instruction-tuned large language models (LLMs) in a zero-shot setting, prompting them with post titles and metadata to predict post virality. We then fine-tune two multimodal models, CLIP and IDEFICS, to assess whether incorporating visual context enhances predictive performance. Our results show that zero-shot LLMs perform poorly, whereas the fine-tuned multimodal models achieve better performance. Specifically, CLIP outperforms the best-performing zero-shot LLM (CodeLLaMA) by 3%, while IDEFICS achieves a 7% improvement over the same baseline, highlighting the importance of visual features in virality prediction. We release the Reddit-V dataset and our evaluation results to facilitate further research on multimodal and text-based virality prediction. Our dataset and code will be made publicly available on GitHub.

Disciplines :

Computer science

Author, co-author :

EL-AMRANY, Samir ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

R. Brust, Matthias; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

LAMSIYAH, Salima ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

Language :

English

Title :

Reddit-V: A Virality Prediction Dataset and Zero-Shot Evaluation with Large Language Models

Publication date :

08 September 2025

Event name :

Recent Advances in Natural Language Processing

Event place :

Varna, Bulgaria

Event date :

2025

Audience :

International

Peer reviewed :

Peer reviewed

Source :

ACL Anthology

Available on ORBilu :

since 26 March 2026

Statistics

Number of views

27 (1 by Unilu)

Number of downloads

8 (0 by Unilu)
