Vision-based grasping of unknown objects in unstructured environments is a key challenge for autonomous robotic manipulation. A practical grasp synthesis system must generate a diverse set of 6-DoF grasps from which a task-relevant grasp can be executed. Although generative models are well suited to learning such complex data distributions, existing models suffer from limited grasp quality, long training times, and a lack of flexibility for task-specific generation. In this work, we present GraspLDM, a modular generative framework for 6-DoF grasp synthesis that uses diffusion models as priors in the latent space of a VAE. GraspLDM learns a generative model of object-centric SE(3) grasp poses conditioned on point clouds. Its architecture enables task-specific models to be trained efficiently by re-training only a small denoising network in the low-dimensional latent space, in contrast to existing models that require expensive full re-training. Our framework provides robust and scalable models on both full and partial point clouds. GraspLDM models trained with simulation data transfer well to the real world without any further fine-tuning, achieving an 80% success rate over 80 grasp attempts on diverse test objects across two real-world robotic setups.
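The abstract describes sampling grasps by running a small denoising network in a VAE's low-dimensional latent space and decoding the result to an SE(3) pose. The toy sketch below illustrates that latent-diffusion sampling loop with standard DDPM ancestral sampling; all names and sizes (`LATENT_DIM`, `denoiser`, `decode_grasp`, the beta schedule) are illustrative assumptions, not the paper's actual architecture or API.

```python
import numpy as np

LATENT_DIM = 16   # low-dimensional grasp latent (assumed size)
T = 50            # number of diffusion steps (assumed)

# Linear beta schedule, as in standard DDPM formulations.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(z_t, t, cond):
    """Stand-in for the small conditional denoising network.
    In the real model this is a learned network conditioned on
    point-cloud features; here it simply predicts zero noise."""
    return np.zeros_like(z_t)

def sample_latent(cond, rng):
    """DDPM ancestral sampling in the VAE latent space."""
    z = rng.standard_normal(LATENT_DIM)  # start from Gaussian noise
    for t in reversed(range(T)):
        eps_hat = denoiser(z, t, cond)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        z = (z - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add noise at every step except the last
            z += np.sqrt(betas[t]) * rng.standard_normal(LATENT_DIM)
    return z

def decode_grasp(z):
    """Stand-in for the VAE decoder mapping a latent to a grasp pose.
    The paper decodes to SE(3); here the latent is split arbitrarily
    into a 3D translation and a 3D rotation parameterization."""
    return z[:3], z[3:6]

rng = np.random.default_rng(0)
cond = None  # stands in for point-cloud conditioning features
t_vec, r_vec = decode_grasp(sample_latent(cond, rng))
```

Because the denoiser is the only learned component in the loop, re-targeting generation to a new task would only require swapping or fine-tuning that small network, which is the efficiency argument made in the abstract.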
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SpaceR – Space Robotics
Disciplines:
Computer science
Author, co-author:
BARAD, Kuldeep Rambhai ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Space Robotics ; Redwire Space Europe, Luxembourg City, Luxembourg
ORSULA, Andrej ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Space Robotics
RICHARD, Antoine ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Space Robotics
DENTLER, Jan Eric ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > Automation ; Redwire Space Europe, Luxembourg City, Luxembourg
OLIVARES MENDEZ, Miguel Angel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Space Robotics
MARTINEZ LUNA, Carol ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Space Robotics
External co-authors:
yes
Document language:
English
Title:
GraspLDM: Generative 6-DoF Grasp Synthesis Using Latent Diffusion Models
Publication date:
2024
Journal title:
IEEE Access
ISSN:
2169-3536
Publisher:
Institute of Electrical and Electronics Engineers Inc.
FNR project:
FNR15799985 - Modular Vision For Dynamic Grasping Of Unknown Resident Space Objects, 2021 (01/04/2021-15/01/2025) - Kuldeep Rambhai Barad
Research project title:
Modular Vision For Dynamic Grasping Of Unknown Resident Space Objects
Funding body:
Fonds National de la Recherche (FNR) Industrial Fellowship under Redwire Space Europe
Grant number:
15799985
Funding (details):
This work was supported in part by the Fonds National de la Recherche (FNR) Industrial Fellowship under Grant 15799985, and in part by Redwire Space Europe. Code and resources are available at: https://github.com/kuldeepbrd1/graspLDM.