Identifying traversable space is one of the most important problems in autonomous robot navigation and is primarily tackled using learning-based methods. To alleviate the prohibitively high annotation cost associated with labeling large and diverse datasets, research has recently shifted from traditional supervised methods toward unsupervised and semi-supervised approaches. This work focuses on monocular road segmentation and proposes a practical, generic, and minimally-supervised approach based on task-specific feature extraction and pseudo-labeling. Building on recent advances in monocular depth estimation models, we process approximate dense depth maps to estimate pixel-wise road-plane distance maps. These maps are then used in both unsupervised and semi-supervised road segmentation scenarios. In the unsupervised case, we propose a pseudo-labeling pipeline that reaches state-of-the-art Intersection-over-Union (IoU), while reducing complexity and computations compared to existing approaches. We also investigate a semi-supervised extension to our method and find that even minimal labeling efforts can greatly improve results. Our semi-supervised experiments, using as little as 1% and 10% of the ground-truth data, yield models scoring 0.9063 and 0.9332 on the IoU metric, respectively. These results correspond to 95.9% and 98.7% of a fully-supervised model's IoU score, which motivates a pragmatic approach to labeling.
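The road-plane distance maps mentioned in the abstract can be sketched roughly as follows: back-project a dense depth map into camera-frame 3D points, fit a dominant ground plane (here with a basic RANSAC, one classical choice), and threshold each pixel's distance to that plane to obtain road pseudo-labels. This is a minimal illustrative sketch, not the paper's exact pipeline; all function names, the RANSAC variant, and the thresholds are assumptions.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H, W) into camera-frame 3D points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.05, seed=0):
    """Fit a plane n.p + d = 0 by RANSAC; returns (unit normal n, offset d)."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = -1, None
    for _ in range(n_iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue  # degenerate (collinear) sample, skip
        n = n / norm
        d = -n.dot(p1)
        inliers = (np.abs(points @ n + d) < inlier_thresh).sum()
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, d)
    return best_model

def road_pseudo_labels(depth, fx, fy, cx, cy, dist_thresh=0.05):
    """Pixel-wise road mask: pixels within dist_thresh of the fitted dominant plane."""
    pts = backproject(depth, fx, fy, cx, cy)
    n, d = fit_plane_ransac(pts, inlier_thresh=dist_thresh)
    dist_map = np.abs(pts @ n + d).reshape(depth.shape)  # road-plane distance map
    return dist_map < dist_thresh
```

On a toy depth map that is planar except for an obstacle patch, the obstacle pixels fall outside the inlier band and are excluded from the pseudo-label mask; in the paper's setting the depth would instead come from a monocular depth estimation model.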
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Services and Data management research group (SEDAN)
Disciplines :
Computer science
Author, co-author :
Robinet, François ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SEDAN
Akl, Yussef ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Ullah, Kaleem ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SEDAN
Nozarian, Farzad
Müller, Christian
Frank, Raphaël ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SEDAN
External co-authors :
yes
Language :
English
Title :
Striving for Less: Minimally-Supervised Pseudo-Label Generation for Monocular Road Segmentation
Publication date :
October 2022
Journal title :
IEEE Robotics and Automation Letters
ISSN :
2377-3766
Publisher :
Institute of Electrical and Electronics Engineers, New York, United States