To integrate into human-centered environments, autonomous agents must learn from and adapt to humans in their native settings. Preference-based reinforcement learning (PbRL) can enable this by learning reward functions from human preferences. However, humans live in a world full of diverse information, most of which is irrelevant to completing any particular task. It therefore becomes essential that agents learn to focus on the subset of task-relevant state features. To that end, this work proposes R2N (Robust-to-Noise), the first PbRL algorithm that leverages principles of dynamic sparse training to learn robust reward models that can focus on task-relevant features. In experiments with a simulated teacher, we demonstrate that R2N can adapt the sparse connectivity of its neural networks to focus on task-relevant features, enabling R2N to significantly outperform several sparse training and PbRL algorithms across simulated robotic environments. We open-source our code at https://github.com/cmuslima/R2N
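As a rough illustration of the two ingredients the abstract combines, the sketch below (Python/PyTorch; not the authors' R2N implementation, and all layer sizes, function names, and hyperparameters are illustrative assumptions) shows (1) a Bradley-Terry style preference loss of the kind commonly used to train reward models from pairwise trajectory preferences in PbRL, and (2) a SET-style prune-and-regrow step of the kind dynamic sparse training methods use to adapt a network's sparse connectivity.

import torch
import torch.nn as nn

# Illustrative reward model over 24-dimensional observations (size assumed).
reward_net = nn.Sequential(nn.Linear(24, 256), nn.ReLU(), nn.Linear(256, 1))

def preference_loss(seg_a, seg_b, prefs):
    # seg_a, seg_b: (batch, horizon, obs_dim) trajectory segments;
    # prefs[i] = 1.0 if the teacher preferred segment A, else 0.0.
    ret_a = reward_net(seg_a).sum(dim=1)   # predicted return of segment A
    ret_b = reward_net(seg_b).sum(dim=1)   # predicted return of segment B
    logits = (ret_a - ret_b).squeeze(-1)   # Bradley-Terry: P(A > B) = sigmoid(ret_a - ret_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

@torch.no_grad()
def prune_and_regrow(layer, mask, drop_frac=0.1):
    # One dynamic-sparse-training update on a linear layer: drop the
    # smallest-magnitude active weights, then regrow the same number of
    # connections at randomly chosen inactive locations (SET-style).
    layer.weight.mul_(mask)                # enforce the current sparse connectivity
    active = mask.bool()
    n_drop = int(drop_frac * active.sum().item())
    if n_drop == 0:
        return mask
    magnitudes = layer.weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitudes.flatten(), n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0          # prune weakest active connections
    inactive_idx = (mask.view(-1) == 0).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(len(inactive_idx))[:n_drop]]
    mask.view(-1)[grow_idx] = 1.0          # regrow elsewhere
    layer.weight.view(-1)[grow_idx] = 0.0  # new connections start from zero
    return mask

In the setting the abstract describes, such prune-and-regrow updates are what would allow a network to drop connections to irrelevant or noisy input features and concentrate capacity on task-relevant ones; the specific networks and update schedule R2N uses are detailed in the paper, not here.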
Disciplines :
Computer science
Author, co-author :
Muslimani, Calarina; UAlberta - University of Alberta
Grooten, Bram; Eindhoven University of Technology
Ranganatha Sastry Mamillapalli, Deepak; UAlberta - University of Alberta
Pechenizkiy, Mykola; Eindhoven University of Technology
Mocanu, Decebal Constantin; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Taylor, Matthew E.; UAlberta - University of Alberta
External co-authors :
yes
Language :
English
Title :
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
Publication date :
19 May 2025
Event name :
AAMAS 2025: 24th International Conference on Autonomous Agents and Multiagent Systems
Event place :
Detroit, United States
Event date :
19 - 23 May 2025
Audience :
International
Main work title :
AAMAS 2025: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems
Publisher :
International Foundation for Autonomous Agents and Multiagent Systems, Detroit, United States