[en] The fields of Reinforcement Learning (RL) and Optimization aim at finding an optimal solution to a problem, characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in these fields: a substantial body of literature has been devoted to it and has shown that handling it properly is essential to achieving good performance. Yet, many real-life problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn a variety of optimised behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in Inner-Loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use a classical benchmark, the Deep Sea Treasure (DST), and also propose a harder version of it. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy-based methods on the studied benchmarks.
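For context, the ε-greedy baseline referred to above selects a uniformly random action with probability ε and otherwise acts greedily with respect to the current value estimates. The sketch below is only a minimal illustration of that baseline under a linear scalarisation of the objective vector; it does not reproduce the paper's proposed metaheuristics-based strategies, and the Q-table layout, weights and sizes are hypothetical.

    import numpy as np

    def epsilon_greedy_action(q_values, state, weights, epsilon, rng):
        # q_values: hypothetical multi-objective Q-table of shape
        #           (n_states, n_actions, n_objectives).
        # weights:  linear scalarisation weights over the objectives.
        n_actions = q_values.shape[1]
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))    # explore: random action
        scalarised = q_values[state] @ weights     # exploit: scalarise the vector estimates
        return int(np.argmax(scalarised))

    # Example usage with made-up sizes: 10 states, 4 actions, 2 objectives.
    rng = np.random.default_rng(0)
    q = np.zeros((10, 4, 2))
    action = epsilon_greedy_action(q, state=0, weights=np.array([0.5, 0.5]), epsilon=0.1, rng=rng)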
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Parallel Computing & Optimization Group (PCOG)
Disciplines:
Computer science
Author, co-author:
Felten, Florian; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > PCOG
Danoy, Grégoire; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS); University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > PCOG
Talbi, El-Ghazali; University of Lille, CNRS/CRIStAL, Inria Lille, France
Bouvry, Pascal; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS); University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > PCOG
External co-authors:
Yes
Document language:
English
Title:
Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning
Publication date:
2022
Event name:
14th International Conference on Agents and Artificial Intelligence
Event dates:
from 03-02-2022 to 05-02-2022
Event scope:
International
Title of the main publication:
Proceedings of the 14th International Conference on Agents and Artificial Intelligence
Publisher:
SCITEPRESS - Science and Technology Publications, Online Streaming, unknown/not specified