The visual world provides an abundance of information, but many input pixels
received by agents often contain distracting stimuli. Autonomous agents need
the ability to distinguish useful information from task-irrelevant perceptions,
enabling them to generalize to unseen environments with new distractions.
Existing works approach this problem using data augmentation or large auxiliary
networks with additional loss functions. We introduce MaDi, a novel algorithm
that learns to mask distractions by the reward signal only. In MaDi, the
conventional actor-critic structure of deep reinforcement learning agents is
complemented by a small third sibling, the Masker. This lightweight neural
network generates a mask to determine what the actor and critic will receive,
such that they can focus on learning the task. The masks are created
dynamically, depending on the current input. We run experiments on the DeepMind
Control Generalization Benchmark, the Distracting Control Suite, and a real UR5
Robotic Arm. Our algorithm improves the agent's focus with useful masks, while
its efficient Masker network only adds 0.2% more parameters to the original
structure, in contrast to previous work. MaDi consistently achieves
generalization results that are better than, or competitive with, state-of-the-art methods.
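
The abstract describes the architecture only at a high level. As a rough illustration, the following is a minimal sketch, assuming a PyTorch-style implementation, of how a lightweight Masker could gate the observation before it reaches the actor and critic. All layer sizes, channel counts, and names here are illustrative assumptions, not the authors' exact design; the real implementation is in the linked repository.

# Illustrative sketch only: a tiny convolutional "Masker" produces a soft
# per-pixel mask in [0, 1], which is multiplied with the observation so the
# actor and critic only see the unmasked (task-relevant) pixels.
import torch
import torch.nn as nn

class Masker(nn.Module):
    """Hypothetical lightweight network outputting a soft mask per pixel."""
    def __init__(self, in_channels: int = 9):  # e.g. 3 stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        mask = self.net(obs)   # shape: (batch, 1, H, W)
        return obs * mask      # broadcast the mask over all channels

# Usage: the masked observation is what the actor and critic receive.
masker = Masker()
obs = torch.rand(1, 9, 84, 84)   # a batch with one stack of frames
masked_obs = masker(obs)
print(masked_obs.shape)          # torch.Size([1, 9, 84, 84])

Because the Masker is a few small convolutions, it adds only a tiny fraction of parameters on top of the actor-critic networks, which is consistent with the 0.2% overhead stated above.
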
Disciplines :
Computer science
Author, co-author :
Grooten, Bram; Eindhoven University of Technology [NL]
Tomilin, Tristan; Eindhoven University of Technology [NL]
Vasan, Gautham; UAlberta - University of Alberta [CA]
Taylor, Matthew E.; UAlberta - University of Alberta [CA] ; Alberta Machine Intelligence Institute (Amii)
Mahmood, Rupam A.; UAlberta - University of Alberta [CA] ; Alberta Machine Intelligence Institute (Amii)
Fang, Meng; University of Liverpool [GB]
Pechenizkiy, Mykola; Eindhoven University of Technology [NL]
Mocanu, Decebal Constantin; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS) ; Eindhoven University of Technology [NL]
External co-authors :
yes
Language :
English
Title :
MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning
Publication date :
06 May 2024
Event name :
AAMAS '24: 2024 International Conference on Autonomous Agents and Multiagent Systems
Event date :
from 6 to 10 May 2024
Audience :
International
Main work title :
AAMAS '24: Proceedings of the 2024 International Conference on Autonomous Agents and Multiagent Systems
Publisher :
International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC
Accepted as a full paper (oral) at AAMAS 2024. Code is available at
https://github.com/bramgrooten/mask-distractions, and a 40-second video is
available at https://youtu.be/2oImF0h1k48