Article (Scientific journals)
Evaluating the effectiveness of LLMs for explainable deep reinforcement learning
BELOUADAH, Ayoub; RUIZ RODRIGUEZ, Marcelo Luis; KUBLER, Sylvain et al.
2025. In Machine Learning with Applications, 22, p. 100795
Peer reviewed
 

Files


Full Text
1-s2.0-S2666827025001781-main.pdf
Author postprint (2.09 MB), Creative Commons License: Attribution

Details



Keywords :
Large language models; Reinforcement learning; Explainable reinforcement learning; Monte Carlo Tree Search; LLM as a judge
Abstract :
[en] Understanding the decision-making of reinforcement learning (RL) agents is essential for real-world deployment. Existing eXplainable RL (XRL) techniques, such as feature attribution and policy visualization, provide insight but remain inaccessible to non-experts. Large Language Models (LLMs) offer a natural-language alternative, yet their explanations often lack logical consistency and alignment with agent goals. This study benchmarks three explanation-generation methods across models of varying sizes: Chain-of-Thought (CoT) prompting, the standard baseline in prior work; Monte Carlo Tree Search (MCTS) augmentation; and supervised fine-tuning (SFT). Evaluations using Soundness and Fidelity show that CoT frequently produces reasoning errors; MCTS improves explanation quality for larger models (avg. +23% Soundness, +17% Fidelity), whereas SFT yields greater and more consistent gains for smaller ones (+58% Soundness, +52% Fidelity), underscoring the need to match the method to model capacity. An LLM-as-a-Judge framework further validates these findings, showing strong agreement with human assessments (weighted Cohen's κ = 0.77, Spearman ρ = 0.88) and supporting scalable, reliable assessment of textual explanations.
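The LLM-as-a-Judge validation rests on two agreement statistics, weighted Cohen's κ and Spearman's ρ. Below is a minimal sketch of how such judge-versus-human agreement is typically computed; the ratings are invented for illustration, and the quadratic weighting scheme is an assumption, since neither the record nor the abstract specifies which weighting the authors used.

```python
# Illustrative sketch (not from the paper) of the agreement statistics
# reported in the abstract: weighted Cohen's kappa and Spearman's rho
# between LLM-judge and human ratings of generated explanations.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 ordinal quality ratings for ten explanations.
human_scores = [5, 4, 4, 2, 5, 3, 1, 4, 5, 2]
judge_scores = [5, 4, 3, 2, 5, 3, 2, 4, 4, 2]

# Weighted kappa penalizes large disagreements more than near-misses;
# quadratic weighting is an assumption here.
kappa = cohen_kappa_score(human_scores, judge_scores, weights="quadratic")

# Spearman's rho measures monotonic (rank) agreement between raters.
rho, p_value = spearmanr(human_scores, judge_scores)

print(f"weighted Cohen's kappa = {kappa:.2f}")
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```

A high κ together with a high ρ indicates that the judge both matches the human labels and ranks explanations similarly, which is what makes an LLM judge a credible stand-in for human evaluation at scale.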
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation
Disciplines :
Computer science
Author, co-author :
BELOUADAH, Ayoub ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
RUIZ RODRIGUEZ, Marcelo Luis ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
KUBLER, Sylvain ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
LE TRAON, Yves ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
External co-authors :
no
Language :
English
Title :
Evaluating the effectiveness of LLMs for explainable deep reinforcement learning
Publication date :
14 November 2025
Journal title :
Machine Learning with Applications
ISSN :
2666-8270
Publisher :
Elsevier BV
Volume :
22
Pages :
100795
Peer reviewed :
Peer reviewed
FnR Project :
FNR16756339 - UPTIME4.0 - Robust Predictive Maintenance For Industry 4.0, 2021 (01/06/2022-31/05/2025) - Yves Le Traon
Funders :
Luxembourg National Research Fund (FNR)
Funding number :
16756339
Funding text :
This research was funded in whole or in part by the Luxembourg National Research Fund (FNR), grant reference 16756339. For the purpose of open access, the author has applied a Creative Commons Attribution 4.0 International (CC BY 4.0) license to any Author Accepted Manuscript version arising from this submission.
Available on ORBilu :
since 04 December 2025

