Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies that make different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To address this issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate that MORL/D instantiations achieve performance comparable to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL.
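The decomposition idea at the heart of MOO/D and MORL/D can be illustrated with a minimal, self-contained sketch (an illustration of the general technique, not the paper's implementation): a vector-valued reward is turned into several scalar subproblems using weight vectors and scalarization functions such as the weighted sum or Chebyshev function. The utopia point used below is an assumed reference value for the example.

```python
import numpy as np

def weighted_sum(reward: np.ndarray, weight: np.ndarray) -> float:
    """Linear scalarization: project the vector reward onto a weight vector."""
    return float(np.dot(reward, weight))

def chebyshev(reward: np.ndarray, weight: np.ndarray, utopia: np.ndarray) -> float:
    """Chebyshev scalarization: negated weighted max deviation from a utopia
    point, so that larger values are better (maximization convention)."""
    return -float(np.max(weight * np.abs(utopia - reward)))

# One 2-objective reward decomposed into three scalar subproblems,
# one per evenly spaced weight vector on the simplex.
reward = np.array([1.0, 3.0])
weights = [np.array([w, 1.0 - w]) for w in (0.0, 0.5, 1.0)]
scalarized = [weighted_sum(reward, w) for w in weights]
# scalarized == [3.0, 2.0, 1.0]
```

Each scalarized subproblem can then be handled by a standard single-objective RL learner; the choice of scalarization function and weight distribution is one of the design axes the taxonomy covers.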
Disciplines:
Computer science
Author, co-author:
FELTEN, Florian ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG
Bibliography
Abels, A., Roijers, D., Lenaerts, T., Nowé, A., & Steckelmacher, D. (2019). Dynamic Weights in Multi-Objective Deep Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, pp. 11-20. PMLR.
Alaya, I., Solnon, C., & Ghedira, K. (2007). Ant Colony Optimization for Multi-objective Optimization Problems. In 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 450-457, Patras, Greece. IEEE Computer Society.
Alegre, L. N., Bazzan, A. L. C., Roijers, D. M., & Nowé, A. (2023). Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization. In Proc. of the 22nd International Conference on Autonomous Agents and Multiagent Systems.
Alegre, L. N., Felten, F., Talbi, E.-G., Danoy, G., Nowé, A., Bazzan, A. L., & da Silva, B. C. (2022). MO-Gym: A Library of Multi-Objective Reinforcement Learning Environments. In Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn.
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pp. 5055-5065, Red Hook, NY, USA. Curran Associates Inc.
Blank, J., & Deb, K. (2020). pymoo: Multi-Objective Optimization in Python. IEEE Access, 8, 89497-89509.
Blank, J., Deb, K., Dhebar, Y., Bandaru, S., & Seada, H. (2021). Generating Well-Spaced Points on a Unit Simplex for Evolutionary Many-Objective Optimization. IEEE Transactions on Evolutionary Computation, 25 (1), 48-60.
Burke, E. K., Gendreau, M., Hyde, M., Kendall, G., Ochoa, G., Özcan, E., & Qu, R. (2013). Hyper-heuristics: a survey of the state of the art. Journal of the Operational Research Society, 64 (12), 1695-1724.
Castelletti, A., Pianosi, F., & Restelli, M. (2013). A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run. Water Resources Research, 49 (6), 3476-3486.
Chen, D., Wang, Y., & Gao, W. (2020). Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning. Applied Intelligence, 50.
Coello Coello, C. A., & Reyes Sierra, M. (2004). A Study of the Parallelization of a Coevolutionary Multi-objective Evolutionary Algorithm. In Monroy, R., Arroyo-Figueroa, G., Sucar, L. E., & Sossa, H. (Eds.), MICAI 2004: Advances in Artificial Intelligence, Lecture Notes in Computer Science, pp. 688-697, Berlin, Heidelberg. Springer.
Czyżak, P., & Jaszkiewicz, A. (1998). Pareto simulated annealing: a metaheuristic technique for multiple-objective combinatorial optimization. Journal of Multi-Criteria Decision Analysis, 7 (1), 34-47.
Das, I., & Dennis, J. (2000). Normal-Boundary Intersection: A New Method for Generating the Pareto Surface in Nonlinear Multicriteria Optimization Problems. SIAM Journal on Optimization, 8.
de Bruin, T., Kober, J., Tuyls, K., & Babuska, R. (2016). Improved deep reinforcement learning for robotics through distribution-based experience retention. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3947-3952, Daejeon, South Korea. IEEE.
Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6 (2), 182-197.
Dubois-Lacoste, J., López-Ibáñez, M., & Stützle, T. (2011). Improving the anytime behavior of two-phase local search. Annals of Mathematics and Artificial Intelligence, 61 (2), 125-154.
Eimer, T., Lindauer, M., & Raileanu, R. (2023). Hyperparameters in Reinforcement Learning and How To Tune Them. In Proceedings of the 40th International Conference on Machine Learning (ICML 2023).
Emmerich, M. T. M., & Deutz, A. H. (2018). A tutorial on multiobjective optimization: fundamentals and evolutionary methods. Natural Computing, 17 (3), 585-609.
Felten, F., Alegre, L. N., Nowe, A., Bazzan, A. L. C., Talbi, E. G., Danoy, G., & Silva, B. C. d. (2023). A Toolkit for Reliable Benchmarking and Research in Multi-Objective Reinforcement Learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Felten, F., Danoy, G., Talbi, E.-G., & Bouvry, P. (2022). Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence, pp. 662-673. SCITEPRESS - Science and Technology Publications.
Felten, F., Gareev, D., Talbi, E.-G., & Danoy, G. (2023). Hyperparameter Optimization for Multi-Objective Reinforcement Learning. arXiv:2310.16487 [cs].
Felten, F., Talbi, E.-G., & Danoy, G. (2022). MORL/D: Multi-Objective Reinforcement Learning based on Decomposition. In International Conference in Optimization and Learning (OLA2022).
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, pp. 1861-1870. PMLR.
Hansen, M. (2000). Tabu search for multiobjective combinatorial optimization: TAMOCO. Control and Cybernetics, 29.
Hayes, C., Radulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L., Dazeley, R., Heintz, F., Howley, E., Irissappane, A., Mannion, P., Nowe, A., Ramos, G., Restelli, M., Vamplew, P., & Roijers, D. (2022). A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems, 36.
Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., & Araújo, J. G. M. (2022). CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research, 23 (274), 1-18.
Huang, S., Gallouédec, Q., Felten, F., Raffin, A., Dossa, R. F. J., Zhao, Y., Sullivan, R., Makoviychuk, V., Makoviichuk, D., Danesh, M. H., Roumégous, C., Weng, J., Chen, C., Rahman, M. M., Araújo, J. G. M., Quan, G., Tan, D., Klein, T., Charakorn, R., Towers, M., Berthelot, Y., Mehta, K., Chakraborty, D., KG, A., Charraut, V., Ye, C., Liu, Z., Alegre, L. N., Nikulin, A., Hu, X., Liu, T., Choi, J., & Yi, B. (2024). Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning. arXiv:2402.03046 [cs].
Ishibuchi, H., & Kaige, S. (2004). Implementation of Simple Multiobjective Memetic Algorithms and Its Application to Knapsack Problems. International Journal of Hybrid Intelligent Systems, 1.
Ishibuchi, H., Sakane, Y., Tsukamoto, N., & Nojima, Y. (2010). Simultaneous Use of Different Scalarizing Functions in MOEA/D. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO '10, pp. 519-526, New York, NY, USA. Association for Computing Machinery.
Ke, L., Zhang, Q., & Battiti, R. (2013). MOEA/D-ACO: A Multiobjective Evolutionary Algorithm Using Decomposition and Ant Colony. IEEE Transactions on Cybernetics, 43 (6), 1845-1859.
Liu, Y., Ishibuchi, H., Masuyama, N., & Nojima, Y. (2020). Adapting Reference Vectors and Scalarizing Functions by Growing Neural Gas to Handle Irregular Pareto Fronts. IEEE Transactions on Evolutionary Computation, 24 (3), 439-453.
Lu, H., Herman, D., & Yu, Y. (2023). Multi-Objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality. In The Eleventh International Conference on Learning Representations.
Marler, R., & Arora, J. (2010). The weighted sum method for multi-objective optimization: New insights. Structural and Multidisciplinary Optimization, 41, 853-862.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518 (7540), 529-533.
Moerland, T. M., Broekens, J., Plaat, A., & Jonker, C. M. (2023). Model-based Reinforcement Learning: A Survey. Foundations and Trends® in Machine Learning, 16 (1), 1-118.
Mossalam, H., Assael, Y. M., Roijers, D. M., & Whiteson, S. (2016). Multi-Objective Deep Reinforcement Learning. CoRR, abs/1610.02707. arXiv: 1610.02707.
Murata, T., Ishibuchi, H., & Gen, M. (2001). Specification of Genetic Search Directions in Cellular Multi-objective Genetic Algorithms. In Zitzler, E., Thiele, L., Deb, K., Coello Coello, C. A., & Corne, D. (Eds.), Evolutionary Multi-Criterion Optimization, Lecture Notes in Computer Science, pp. 82-95, Berlin, Heidelberg. Springer.
Natarajan, S., & Tadepalli, P. (2005). Dynamic preferences in multi-criteria reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp. 601-608. Association for Computing Machinery.
Parker-Holder, J., Rajan, R., Song, X., Biedenkapp, A., Miao, Y., Eimer, T., Zhang, B., Nguyen, V., Calandra, R., Faust, A., Hutter, F., & Lindauer, M. (2022). Automated Reinforcement Learning (AutoRL): A Survey and Open Problems. Journal of Artificial Intelligence Research, 74.
Reymond, M., & Nowe, A. (2019). Pareto-DQN: Approximating the Pareto front in complex multi-objective decision problems. In Proceedings of the Adaptive and Learning Agents Workshop 2019 (ALA-19) at AAMAS.
Roijers, D., Steckelmacher, D., & Nowe, A. (2018). Multi-objective Reinforcement Learning for the Expected Utility of the Return. In Proceedings of the ALA workshop at ICML/AAMAS/IJCAI 2018.
Roijers, D., Whiteson, S., & Oliehoek, F. (2015). Computing Convex Coverage Sets for Faster Multi-objective Coordination. Journal of Artificial Intelligence Research, 52, 399-443.
Roijers, D. M., & Whiteson, S. (2017). Multi-Objective Decision Making. Synthesis Lectures on Artificial Intelligence and Machine Learning, 11 (1), 1-129. Morgan & Claypool Publishers.
Roijers, D. M., Whiteson, S., & Oliehoek, F. A. (2015). Point-Based Planning for Multi-Objective POMDPs. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pp. 1666-1672, Buenos Aires, Argentina. AAAI Press.
Roijers, D. M., Vamplew, P., Whiteson, S., & Dazeley, R. (2013). A Survey of Multi-Objective Sequential Decision-Making. Journal of Artificial Intelligence Research, 48, 67-113.
Ruiz-Montiel, M., Mandow, L., & Pérez-de-la Cruz, J.-L. (2017). A temporal difference method for multi-objective reinforcement learning. Neurocomputing, 263, 15-25.
Régin, J.-C., Rezgui, M., & Malapert, A. (2013). Embarrassingly Parallel Search. In Schulte, C. (Ed.), Principles and Practice of Constraint Programming, Lecture Notes in Computer Science, pp. 596-610, Berlin, Heidelberg. Springer.
Röpke, W., Hayes, C. F., Mannion, P., Howley, E., Nowé, A., & Roijers, D. M. (2023). Distributional Multi-Objective Decision Making. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 5711-5719, Macau, SAR China. International Joint Conferences on Artificial Intelligence Organization.
Santiago, A., Huacuja, H. J. F., Dorronsoro, B., Pecero, J. E., Santillan, C. G., Barbosa, J. J. G., & Monterrubio, J. C. S. (2014). A Survey of Decomposition Methods for Multiobjective Optimization. In Castillo, O., Melin, P., Pedrycz, W., & Kacprzyk, J. (Eds.), Recent Advances on Hybrid Approaches for Designing Intelligent Systems, Studies in Computational Intelligence, pp. 453-465. Springer International Publishing, Cham.
Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized Experience Replay. In Bengio, Y., & LeCun, Y. (Eds.), Proceedings of the 4th International Conference on Learning Representations, (ICLR 2016).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. CoRR, abs/1707.06347.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529 (7587), 484-489.
Stanley, K. O., Clune, J., Lehman, J., & Miikkulainen, R. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence, 1 (1), 24-35.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd edition). Adaptive Computation and Machine Learning series. A Bradford Book, Cambridge, MA, USA.
Talbi, E.-G. (2009). Metaheuristics: From Design to Implementation. Wiley Publishing.
Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033.
Vamplew, P., Dazeley, R., Barker, E., & Kelarev, A. (2009). Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks. In Nicholson, A., & Li, X. (Eds.), AI 2009: Advances in Artificial Intelligence, Lecture Notes in Computer Science, pp. 340-349, Berlin, Heidelberg. Springer.
Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., & Dekker, E. (2011). Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning, 84 (1), 51-80.
Vamplew, P., Smith, B. J., Källström, J., Ramos, G., Radulescu, R., Roijers, D. M., Hayes, C. F., Heintz, F., Mannion, P., Libin, P. J. K., Dazeley, R., & Foale, C. (2022). Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021). Autonomous Agents and Multi-Agent Systems, 36 (2), 41.
Van Moffaert, K., Drugan, M. M., & Nowe, A. (2013). Scalarized multi-objective reinforcement learning: Novel design techniques. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191-199, Singapore, Singapore. IEEE.
Van Moffaert, K., & Nowé, A. (2014). Multi-objective reinforcement learning using sets of Pareto dominating policies. The Journal of Machine Learning Research, 15 (1), 3483-3512.
Varrette, S., Bouvry, P., Cartiaux, H., & Georgatos, F. (2014). Management of an academic HPC cluster: The UL experience. In 2014 International Conference on High Performance Computing & Simulation (HPCS), pp. 959-967. IEEE.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8 (3), 279-292.
Wiering, M. A., Withagen, M., & Drugan, M. M. (2014). Model-based multi-objective reinforcement learning. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 1-6, Orlando, FL, USA. IEEE.
Wolpert, D., & Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1 (1), 67-82.
Wortsman, M., Ilharco, G., Gadre, S. Y., Roelofs, R., Gontijo-Lopes, R., Morcos, A. S., Namkoong, H., Farhadi, A., Carmon, Y., Kornblith, S., & Schmidt, L. (2022). Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In Proceedings of the 39th International Conference on Machine Learning, pp. 23965-23998. PMLR.
Wurman, P. R., Barrett, S., Kawamoto, K., MacGlashan, J., Subramanian, K., Walsh, T. J., Capobianco, R., Devlic, A., Eckert, F., Fuchs, F., Gilpin, L., Khandelwal, P., Kompella, V., Lin, H., MacAlpine, P., Oller, D., Seno, T., Sherstan, C., Thomure, M. D., Aghabozorgi, H., Barrett, L., Douglas, R., Whitehead, D., Dürr, P., Stone, P., Spranger, M., & Kitano, H. (2022). Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature, 602 (7896), 223-228.
Xu, J., Tian, Y., Ma, P., Rus, D., Sueda, S., & Matusik, W. (2020a). Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control. In Proceedings of the 37th International Conference on Machine Learning, pp. 10607-10616. PMLR.
Xu, Q., Xu, Z., & Ma, T. (2020b). A Survey of Multiobjective Evolutionary Algorithms Based on Decomposition: Variants, Challenges and Future Directions. IEEE Access, 8, 41588-41614.
Yang, R., Sun, X., & Narasimhan, K. (2019). A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
Yu, C., Velu, A., Vinitsky, E., Gao, J., Wang, Y., Bayen, A., & Wu, Y. (2022). The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
Zhang, Q., & Li, H. (2007). MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Transactions on Evolutionary Computation, 11 (6), 712-731.
Zintgraf, L. M., Kanters, T. V., Roijers, D. M., Oliehoek, F. A., & Beau, P. (2015). Quality Assessment of MORL Algorithms: A Utility-Based Approach. In Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands.
Zitzler, E. (1999). Evolutionary algorithms for multiobjective optimization: methods and applications. Ph.D. thesis, ETH Zurich, Switzerland.