Keywords :
Large language models; Reinforcement learning; Explainable reinforcement learning; Monte Carlo Tree Search; LLM as a judge
Abstract :
[en] Understanding the decision-making of reinforcement learning (RL) agents is essential for real-world deployment. Existing eXplainable RL (XRL) techniques, such as feature attribution and policy visualization, provide insight but remain inaccessible to non-experts. Large Language Models (LLMs) offer a natural-language alternative, yet often lack logical consistency and alignment with agent goals. This study benchmarks three explanation-generation methods across various models: Chain-of-Thought (CoT) prompting as the standard baseline used in prior work, Monte Carlo Tree Search (MCTS) augmentation, and supervised fine-tuning (SFT). Evaluations using Soundness and Fidelity show that CoT frequently produces reasoning errors, whereas MCTS improves quality for larger models (avg. +23% Soundness, +17% Fidelity) and SFT yields greater and more consistent gains for smaller ones (+58% Soundness, +52% Fidelity), underscoring the need to align methods with model capacity. An LLM-as-a-Judge framework further validates these findings, showing strong agreement with human assessments (weighted Cohen's κ = 0.77, Spearman ρ = 0.88) and supporting scalable, reliable assessment of textual explanations.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation
Disciplines :
Computer science
Author, co-author :
BELOUADAH, Ayoub ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
RUIZ RODRIGUEZ, Marcelo Luis ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
KUBLER, Sylvain ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
LE TRAON, Yves
External co-authors :
no
Language :
English
Title :
Evaluating the effectiveness of LLMs for explainable deep reinforcement learning
FnR Project :
FNR16756339 - UPTIME4.0 - Robust Predictive Maintenance For Industry 4.0, 2021 (01/06/2022-31/05/2025) - Yves Le Traon
Funders :
National Research Fund
Funding number :
16756339
Funding text :
This research was funded in whole or in part by the Luxembourg National Research Fund (FNR), grant reference 16756339. For the purpose of open access, the author has applied a Creative Commons Attribution 4.0 International (CC BY 4.0) license to any Author Accepted Manuscript version arising from this submission.