Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning
English
Felten, Florian[University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
Danoy, Grégoire[University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
Talbi, El-Ghazali[University of Lille, CNRS/CRIStAL, Inria Lille, France]
Bouvry, Pascal[University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
2022
Proceedings of the 14th International Conference on Agents and Artificial Intelligence
SCITEPRESS - Science and Technology Publications
662--673
Yes
No
International
978-989-758-547-0
Online Streaming
14th International Conference on Agents and Artificial Intelligence
[en] The fields of Reinforcement Learning (RL) and Optimization aim at finding an optimal solution to a problem characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in those fields: a substantial body of literature has addressed it and shown that it must be handled carefully to achieve good performance. Yet many real-life problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn various optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in Inner-Loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use a classical benchmark, the Deep Sea Treasure (DST), as well as a harder version of it that we propose. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
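The ε-greedy baseline that the abstract compares against can be sketched as follows. This is an illustrative sketch only, not the paper's metaheuristic-inspired strategies; the scalarized Q-values and the function name are assumptions for the example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one.

    `q_values` holds scalarized action values for the current state (in
    inner-loop MORL, vector rewards are typically scalarized, e.g. by a
    weighted sum, before action selection). Hypothetical helper for
    illustration; the paper's proposed strategies are not reproduced here.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# With epsilon = 0 the choice is purely greedy:
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # → 1
```

The strategies studied in the paper replace this action-selection rule while the rest of the inner-loop MPMORL learning machinery stays unchanged, which is what the modular framework is designed to enable.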
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Parallel Computing & Optimization Group (PCOG)
Fonds National de la Recherche - FnR
Researchers ; Professionals ; Students ; General public