[en] Multi-objective reinforcement learning (MORL) algorithms extend standard reinforcement learning (RL) to scenarios where agents must optimize multiple, potentially conflicting, objectives, each represented by a distinct reward function. To facilitate and accelerate research and benchmarking in multi-objective RL problems, we introduce a comprehensive collection of software libraries that includes: (i) MO-Gymnasium, an easy-to-use and flexible API enabling the rapid construction of novel MORL environments, together with more than 20 environments implemented under this API, allowing researchers to effortlessly evaluate any algorithm on any existing domain; (ii) MORL-Baselines, a collection of reliable and efficient implementations of state-of-the-art MORL algorithms, designed to provide a solid foundation for advancing research. Notably, all algorithms are inherently compatible with MO-Gymnasium; and (iii) a thorough and robust set of benchmark results and comparisons of MORL-Baselines algorithms, tested across various challenging MO-Gymnasium environments. These benchmarks were constructed to serve as guidelines for the research community, underscoring the properties, advantages, and limitations of each particular state-of-the-art method.
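To illustrate the kind of API described above, the following minimal sketch (ours, not taken from the paper) creates one of the bundled MO-Gymnasium environments and steps through an episode with a random policy; the reward returned at each step is a vector with one component per objective. It assumes MO-Gymnasium is installed (e.g., pip install mo-gymnasium) and that the deep-sea-treasure-v0 environment id is registered; exact names may differ slightly across versions.

    import mo_gymnasium as mo_gym

    # Deep Sea Treasure: a classic two-objective benchmark trading off
    # treasure value against time penalty.
    env = mo_gym.make("deep-sea-treasure-v0")

    obs, info = env.reset(seed=42)
    done = False
    while not done:
        action = env.action_space.sample()  # random policy, for illustration only
        obs, vec_reward, terminated, truncated, info = env.step(action)
        # vec_reward is a NumPy array with one entry per objective,
        # e.g. [treasure_value, time_penalty] for this environment.
        done = terminated or truncated
    env.close()

MO-Gymnasium also ships wrappers (e.g., for linear scalarization of the reward vector) so that standard single-objective agents can be reused; see the documentation links further below.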
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
FELTEN, Florian ✱; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG ; Farama Foundation
Alegre, Lucas N. ✱; Federal University of Rio Grande do Sul > Institute of Informatics ; VUB - Vrije Universiteit Brussel [BE] > Artificial Intelligence Lab ; Farama Foundation
Bazzan, Ana L. C.; Federal University of Rio Grande do Sul > Institute of Informatics
TALBI, El-Ghazali ; University of Luxembourg ; ULille - University of Lille [FR] > CNRS/CRIStAL
DANOY, Grégoire ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; Unilu - Université du Luxembourg [LU] > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG
da Silva, Bruno C.; University of Massachusetts
✱ These authors have contributed equally to this work.
External co-authors :
yes
Language :
English
Title :
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning
Publication date :
2024
Event name :
Thirty-seventh Conference on Neural Information Processing Systems
Event place :
United States
Event date :
10/12/2023
Audience :
International
Main work title :
A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning
MO-Gymnasium and MORL-Baselines are available at https://github.com/Farama-Foundation/mo-gymnasium and https://github.com/LucasAlegre/morl-baselines, respectively.
The benchmark results are available on openrlbenchmark: https://wandb.ai/openrlbenchmark/MORL-Baselines.
The documentation of MO-Gymnasium is available at https://mo-gymnasium.farama.org.
MORL-Baselines documentation is available at: https://lucasalegre.github.io/morl-baselines.
If the true Pareto front (PF) of the MOMDP, F, is known, then typically Z = F.
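For context, when the true Pareto front is not known, the reference set Z is typically an approximation obtained by filtering out dominated value vectors. The sketch below is a generic illustration of such a filter, assuming all objectives are maximized; the helper name pareto_front is ours and does not refer to a specific MORL-Baselines function.

    import numpy as np

    def pareto_front(points: np.ndarray) -> np.ndarray:
        """Return the points that are not Pareto-dominated (all objectives maximized)."""
        points = np.asarray(points, dtype=float)
        keep = []
        for i, p in enumerate(points):
            # p is dominated if another point is >= on every objective and > on at least one.
            dominated = any(
                np.all(q >= p) and np.any(q > p)
                for j, q in enumerate(points) if j != i
            )
            if not dominated:
                keep.append(i)
        return points[keep]

    # Example: [1, 2] is dominated by [2, 3] and is filtered out.
    print(pareto_front(np.array([[2.0, 3.0], [1.0, 2.0], [3.0, 1.0]])))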
These results can be viewed and analyzed at https://wandb.ai/openrlbenchmark/MORL-Baselines.
We also include a description of the high-performance computers at the University of Luxembourg (Varrette et al., 2014) and Vrije Universiteit Brussel on which experiments were conducted. Training all algorithms on all environments, using various random seeds, required approximately 3 months of computation time.
An overview of the command lines used to conduct each experiment can be found at https://github.com/LucasAlegre/morl-baselines/issues/43.
https://farama.org/team.
See https://gymnasium.farama.org/environments/box2d/lunar_lander for details on the shaping reward.
We also support generating weight vectors based on the OLS and GPI-LS methods.
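As a rough, illustrative counterpart to such weight-generation schemes (not the adaptive OLS/GPI-LS selection itself, which picks weight vectors based on the solutions found so far), the sketch below enumerates equally spaced weight vectors on the probability simplex; the function name is hypothetical and not part of the MORL-Baselines API.

    import itertools
    import numpy as np

    def uniform_simplex_weights(num_objectives: int, granularity: int) -> np.ndarray:
        """Enumerate weight vectors whose components are multiples of 1/granularity
        and sum to 1 (a simple Das-Dennis-style grid on the simplex)."""
        weights = []
        for combo in itertools.combinations_with_replacement(range(num_objectives), granularity):
            w = np.zeros(num_objectives)
            for idx in combo:
                w[idx] += 1.0 / granularity
            weights.append(w)
        return np.array(weights)

    # Example: 6 evenly spaced weight vectors for a 2-objective problem.
    print(uniform_simplex_weights(num_objectives=2, granularity=5))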
Our implementation of PGMORL only supports environments with two objectives, as the original authors' implementation relies on different methods for two and three objectives. Hence, PGMORL results are only displayed for these two environments.
Abels, A., Roijers, D. M., Lenaerts, T., Nowé, A., and Steckelmacher, D. (2019). Dynamic weights in multiobjective deep reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 11-20. International Machine Learning Society (IMLS).
Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A., and Bellemare, M. G. (2021). Deep reinforcement learning at the edge of the statistical precipice. In Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
Alegre, L. N., Bazzan, A. L. C., and da Silva, B. C. (2022a). Optimistic linear support and successor features as a basis for optimal policy transfer. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S., editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 394-413. PMLR.
Alegre, L. N., Bazzan, A. L. C., Roijers, D. M., Nowé, A., and da Silva, B. C. (2023). Sample-efficient multiobjective learning via generalized policy improvement prioritization. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, AAMAS'23, pages 2003-2012, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.
Alegre, L. N., Felten, F., Talbi, E.-G., Danoy, G., Nowé, A., Bazzan, A. L., and da Silva, B. C. (2022b). MO-Gym: A Library of Multi-Objective Reinforcement Learning Environments. In Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn.
Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., van Hasselt, H. P., and Silver, D. (2017). Successor features for transfer in reinforcement learning. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Barrett, L. and Narayanan, S. (2008). Learning all optimal policies with multiple criteria. In Proceedings of the 25th International Conference on Machine Learning, ICML'08, pages 41-47, New York, NY, USA. Association for Computing Machinery.
Bellemare, M. G., Candido, S., Castro, P. S., Gong, J., Machado, M. C., Moitra, S., Ponda, S. S., and Wang, Z. (2020). Autonomous navigation of stratospheric balloons using reinforcement learning. Nature, 588(7836):77-82.
Biewald, L. (2020). Experiment Tracking with Weights and Biases.
Blank, J., Deb, K., Dhebar, Y., Bandaru, S., and Seada, H. (2021). Generating Well-Spaced Points on a Unit Simplex for Evolutionary Many-Objective Optimization. IEEE Transactions on Evolutionary Computation, 25(1):48-60.
Cassimon, T., Eyckerman, R., Mercelis, S., Latré, S., and Hellinckx, P. (2022). A survey on discrete multiobjective reinforcement learning benchmarks. In Proceedings of the Adaptive and Learning Agents Workshop (ALA 2022).
Castelletti, A., Pianosi, F., and Restelli, M. (2012). Tree-based fitted Q-iteration for multi-objective Markov decision problems. In The 2012 International Joint Conference on Neural Networks (IJCNN), pages 1-8.
Coello Coello, C. A. and Reyes Sierra, M. (2004). A Study of the Parallelization of a Coevolutionary Multi-objective Evolutionary Algorithm. In Monroy, R., Arroyo-Figueroa, G., Sucar, L. E., and Sossa, H., editors, MICAI 2004: Advances in Artificial Intelligence, Lecture Notes in Computer Science, pages 688-697, Berlin, Heidelberg. Springer.
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., and Madry, A. (2020). Implementation Matters in Deep RL: A Case Study on PPO and TRPO. In Proceedings of the Eighth International Conference on Learning Representations.
Fujimoto, S., Hoof, H., and Meger, D. (2018). Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, pages 1587-1596. PMLR.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Dy, J. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning (ICML), volume 80 of Proceedings of Machine Learning Research, pages 1861-1870, Stockholmsmässan, Stockholm Sweden. PMLR.
Harris, C. R., Millman, K. J., Walt, S. J. v. d., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., Kerkwijk, M. H. v., Brett, M., Haldane, A., Río, J. F. d., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825):357-362.
Hayes, C. F., Rădulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L. M., Dazeley, R., Heintz, F., Howley, E., Irissappane, A. A., Mannion, P., Nowé, A., Ramos, G., Restelli, M., Vamplew, P., and Roijers, D. M. (2022). A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems, 36(1):26.
Huang, S., Dossa, R. F. J., Raffin, A., Kanervisto, A., and Wang, W. (2022a). The 37 implementation details of proximal policy optimization. The ICLR Blog Track 2023.
Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., and Araújo, J. G. M. (2022b). CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research, 23(274):1-18.
Huang, S., Gallouédec, Q., Felten, F., Raffin, A., Dossa, R. F. J., Zhao, Y., Sullivan, R., Makoviychuk, V., Makoviichuk, D., Roumégous, C., Weng, J., Chen, C., Rahman, M., M. Araújo, J. G., Quan, G., Tan, D., Klein, T., Charakorn, R., Towers, M., Berthelot, Y., Mehta, K., Chakraborty, D., KG, A., Charraut, V., Ye, C., Liu, Z., Alegre, L. N., Choi, J., and Yi, B. (2023). openrlbenchmark.
Kauten, C. (2018). Super Mario Bros for OpenAI Gym. GitHub.
Leurent, E. (2018). An Environment for Autonomous Driving Decision-Making.
Lu, H., Herman, D., and Yu, Y. (2023). Multi-objective reinforcement learning: Convexity, stationarity and Pareto optimality. In Proceedings of The Eleventh International Conference on Learning Representations.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529-533.
Moore, A. (1990). Efficient memory-based learning for robot control. Technical report, Carnegie Mellon University, Pittsburgh, PA.
Papoudakis, G., Christianos, F., Schäfer, L., and Albrecht, S. V. (2021). Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks. arXiv:2006.07869 [cs, stat].
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
Patterson, A., Neumann, S., White, M., and White, A. (2023). Empirical Design in Reinforcement Learning. arXiv:2304.01315 [cs].
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d'Alché Buc, F., Fox, E., and Larochelle, H. (2020). Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program). arXiv:2003.12206 [cs, stat].
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., and Dormann, N. (2021). Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1-8.
Reymond, M., Bargiacchi, E., and Nowé, A. (2022). Pareto conditioned networks. In Proc. of the 21st International Conference on Autonomous Agents and Multiagent Systems, Virtual. IFAAMAS.
Roijers, D. (2016). Multi-Objective Decision-Theoretic Planning. PhD thesis, University of Amsterdam.
Roijers, D. M., Röpke, W., Nowé, A., and Rădulescu, R. (2021). On Following Pareto-Optimal Policies in Multi-Objective Planning and Reinforcement Learning. In Proceedings of the 1st Multi-Objective Decision Making Workshop (MODeM), page 8.
Roijers, D. M., Steckelmacher, D., and Nowé, A. (2018). Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the ALA Workshop at ICML/AAMAS/IJCAI 2018.
Roijers, D. M., Vamplew, P., Whiteson, S., and Dazeley, R. (2013). A survey of multi-objective sequential decision-making. J. Artificial Intelligence Research, 48(1):67-113.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347 [cs].
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550(7676):354-359.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. The MIT Press, second edition.
Terry, J. K., Black, B., Grammel, N., Jayakumar, M., Hari, A., Sullivan, R., Santos, L., Perez, R., Horsch, C., Dieffendahl, C., Williams, N. L., Lokesh, Y., and Ravi, P. (2021). PettingZoo: Gym for Multi-Agent Reinforcement Learning. arXiv:2009.14471 [cs, stat].
Todorov, E., Erez, T., and Tassa, Y. (2012). Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033, Vilamoura, Portugal.
Towers, M., Terry, J. K., Kwiatkowski, A., Balis, J. U., Cola, G., Deleu, T., Goulão, M., Kallinteris, A., KG, A., Krimmel, M., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J. J., Tan, A. J. S., and Younis, O. G. (2023). Gymnasium.
Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., and Dekker, E. (2011). Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn., 84(1-2):51-80.
Vamplew, P., Foale, C., Dazeley, R., and Bignold, A. (2021). Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Engineering Applications of Artificial Intelligence, 100:104186.
Vamplew, P., Webb, D., Zintgraf, L. M., Roijers, D. M., Dazeley, R., Issabekov, R., and Dekker, E. (2017). MORL-Glue: a benchmark suite for multi-objective reinforcement learning. In Proceedings of the 29th Benelux Conference on Artificial Intelligence, pages 389-390.
Van Moffaert, K., Drugan, M. M., and Nowé, A. (2013). Scalarized multi-objective reinforcement learning: Novel design techniques. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 191-199, Singapore. IEEE.
Van Moffaert, K. and Nowé, A. (2014). Multi-objective reinforcement learning using sets of Pareto dominating policies. Journal of Machine Learning Research, 15(1):3483-3512.
Varrette, S., Bouvry, P., Cartiaux, H., and Georgatos, F. (2014). Management of an academic HPC cluster: The UL experience. In 2014 International Conference on High Performance Computing & Simulation (HPCS), pages 959-967. IEEE.
Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge.
Xu, J., Tian, Y., Ma, P., Rus, D., Sueda, S., and Matusik, W. (2020). Prediction-guided multi-objective reinforcement learning for continuous robot control. In Proceedings of the 37th International Conference on Machine Learning (ICML).
Yang, R., Sun, X., and Narasimhan, K. (2019). A generalized algorithm for multi-objective reinforcement learning and policy adaptation. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems 32, pages 14610-14621.
Yoo, A. B., Jette, M. A., and Grondona, M. (2003). SLURM: Simple Linux Utility for Resource Management. In Feitelson, D., Rudolph, L., and Schwiegelshohn, U., editors, Job Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, pages 44-60, Berlin, Heidelberg. Springer.
Zhu, B., Dang, M., and Grover, A. (2023). Scaling Pareto-efficient decision making via offline multi-objective RL. In Proceedings of The Eleventh International Conference on Learning Representations.
Zintgraf, L. M., Kanters, T. V., Roijers, D. M., Oliehoek, F. A., and Beau, P. (2015). Quality assessment of MORL algorithms: A utility-based approach. In Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands.
Zitzler, E. (1999). Evolutionary algorithms for multiobjective optimization: Methods and applications. PhD thesis, ETH Zurich, Switzerland.