[en] In computer vision, camera pose estimation from correspondences between 3D geometric entities and their projections into the image is a widely investigated problem. Although most state-of-the-art methods exploit low-level primitives such as points or lines, the emergence of very effective CNN-based object detectors in recent years has paved the way for the use of higher-level features carrying semantically meaningful information. Pioneering works in that direction have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient way to link 2D and 3D data. However, the mathematical formalism most often used in the related literature does not make it easy to distinguish ellipsoids and ellipses from other quadrics and conics, leading to a loss of specificity that is potentially detrimental in some developments. Moreover, the linearization of the projection equation creates an over-representation of the camera parameters, which may also cause a loss of efficiency. In this paper, we therefore introduce an ellipsoid-specific theoretical framework and demonstrate its beneficial properties in the context of pose estimation. More precisely, we first show that the proposed formalism makes it possible to reduce the pose estimation problem to a position-only or orientation-only estimation problem in which the remaining unknowns can be derived in closed form. Then, we demonstrate that it can be further reduced to a 1 Degree-of-Freedom (1DoF) problem and provide the analytical derivations of the pose as a function of that unique scalar unknown. We illustrate our theoretical considerations with visual examples and include a discussion of the practical aspects. Finally, we release this paper along with the corresponding source code in order to contribute to more efficient solutions of ellipsoid-related pose estimation problems. The source code is available here: https://gitlab.inria.fr/vgaudill/p1e
Disciplines:
Computer science
Author, co-author:
GAUDILLIERE, Vincent; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2; Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
SIMON, Gilles; Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
BERGER, Marie-Odile; Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
External co-authors:
yes
Document language:
English
Title:
Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence
Publication date:
September 2023
Journal title:
International Journal of Computer Vision
ISSN:
0920-5691
Publisher:
Springer
Special issue title:
Special Issue on Traditional Computer Vision in the Age of Deep Learning
FNR14755859 - Multi-modal Fusion Of Electro-optical Sensors For Spacecraft Pose Estimation Towards Autonomous In-orbit Operations, 2020 (01/01/2021-31/12/2023) - Djamila Aouada
Funding (details):
The work presented in this paper was carried out at Université de Lorraine, CNRS, Inria, LORIA. The writing effort was partly funded by the Luxembourg National Research Fund (FNR) under the project reference BRIDGES2020/IS/14755859/MEET-A/Aouada.