Reference : Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/50373
Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning
English
Felten, Florian mailto [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
Danoy, Grégoire mailto [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
Talbi, El-Ghazali mailto [University of Lille, CNRS/CRIStAL, Inria Lille, France]
Bouvry, Pascal mailto [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG]
2022
Proceedings of the 14th International Conference on Agents and Artificial Intelligence
SCITEPRESS - Science and Technology Publications
662–673
Yes
No
International
978-989-758-547-0
Online Streaming
14th International Conference on Agents and Artificial Intelligence
from 3-02-2022 to 5-02-2022
[en] Reinforcement Learning ; Multi-objective ; Metaheuristics ; Pareto Sets
[en] The fields of Reinforcement Learning (RL) and Optimization aim at finding an optimal solution to a problem, characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in those fields: a substantial body of literature has been devoted to it and has shown that it must be addressed to achieve good performance. Yet many real-world problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn various optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in Inner-Loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use a classical benchmark, the Deep Sea Treasure (DST), and propose a harder version of it. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
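For context, the ε-greedy rule that serves as the baseline the proposed metaheuristic strategies are compared against can be sketched as follows. This is an illustrative sketch only, not code from the paper; the function name and the way Q-values are stored are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated Q-values.

    With probability `epsilon` the agent explores by sampling a
    uniformly random action; otherwise it exploits by taking the
    action with the highest current value estimate. Tuning `epsilon`
    is the classical handle on the exploration-exploitation dilemma.
    """
    if rng.random() < epsilon:
        # explore: uniform random action
        return rng.randrange(len(q_values))
    # exploit: greedy action with respect to the current estimates
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the rule is purely greedy; with `epsilon = 1` it is purely random. Metaheuristics-inspired strategies, as studied in the paper, replace this fixed random/greedy split with richer search behaviour.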
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Parallel Computing & Optimization Group (PCOG)
Fonds National de la Recherche - FnR
Researchers ; Professionals ; Students ; General public
10.5220/0010989100003116
https://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0010989100003116
FnR ; FNR14762457 > Gregoire Danoy > ADARS > Automating The Design Of Autonomous Robot Swarms > 01/05/2021 > 30/04/2024 > 2020

File(s) associated to this reference

Fulltext file(s):

File: 109891.pdf — Publisher postprint — 930.73 kB — Limited access (request a copy)
