Weakly-Supervised Free Space Estimation through Stochastic Co-Teaching

Free space estimation is an important problem for autonomous robot navigation. Traditional camera-based approaches train a segmentation model using an annotated dataset. The training data needs to capture the wide variety of environments and weather conditions encountered at runtime, making the annotation cost prohibitively high. In this work, we propose a novel approach for obtaining free space estimates from images taken with a single road-facing camera. We rely on a technique that generates weak free space labels without any supervision, which are then used as ground truth to train a segmentation model for free space estimation. Our work differs from prior attempts by explicitly taking label noise into account through the use of Co-Teaching. Since Co-Teaching has traditionally been investigated in classification tasks, we adapt it for segmentation and examine how its parameters affect performance in our experiments. In addition, we propose Stochastic Co-Teaching, a novel method for selecting clean samples that leads to improved results. We achieve an IoU of 82.6%, a Precision of 90.9%, and a Recall of 90.3%. Our best model reaches 87% of the IoU, 93% of the Precision, and 93% of the Recall of the equivalent fully-supervised baseline while using no human annotations. To the best of our knowledge, this work is the first to use Co-Teaching to train a free space segmentation model under explicit label noise. Our implementation and trained models are freely available online.
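As an illustration of the idea behind Co-Teaching, the following is a minimal sketch of the small-loss cross-selection rule as it is commonly described for classification: two networks each rank the items of a batch by their own loss, keep the lowest-loss fraction as presumably clean, and hand that selection to their peer for its update. This is not the authors' implementation. The abstract does not specify how selection is adapted to segmentation or how Stochastic Co-Teaching chooses samples, so neither is reproduced here. The function names, the `keep_ratio` parameter, and the choice of what counts as an "item" (an image or a pixel) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def select_small_loss(losses: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the keep_ratio fraction of items with the smallest loss.

    Hypothetical helper: items with a small loss are treated as likely clean.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(keep_ratio * len(losses)))
    # Rank item indices from smallest to largest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(ranked[:n_keep])


def co_teaching_exchange(
    losses_a: Sequence[float],
    losses_b: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[int], List[int]]:
    """Cross-selection step: each network is updated on the items its peer kept.

    Returns (indices used to update network A, indices used to update network B).
    """
    chosen_by_a = select_small_loss(losses_a, keep_ratio)
    chosen_by_b = select_small_loss(losses_b, keep_ratio)
    # Network A trains on B's selection and network B trains on A's selection.
    return chosen_by_b, chosen_by_a


# Usage with made-up per-item losses for a batch of four items.
update_a, update_b = co_teaching_exchange(
    losses_a=[0.1, 2.0, 0.3, 0.2],
    losses_b=[0.2, 0.1, 1.5, 0.4],
    keep_ratio=0.5,
)
print(update_a, update_b)
```

Exchanging selections, rather than letting each network filter its own batch, is intended to keep one network's selection errors from reinforcing themselves. In practice the kept fraction is usually scheduled over training rather than held fixed.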

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Services and Data management research group (SEDAN)

Disciplines :

Computer science

Author, co-author :

ROBINET, François ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN

PARERA, Claudia ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN

HUNDT, Christian ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

FRANK, Raphaël ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN

Language :

English

Publication date :

04 January 2022

Event name :

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops 2022

Event place :

United States - Hawaii

Event date :

04 January 2022 to 08 January 2022

Audience :

International

Main work title :

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2022

Pages :

618-627

Peer reviewed :

Peer reviewed

Focus Area :

Computational Sciences

Additional URL :

https://openaccess.thecvf.com/content/WACV2022W/HPIV/html/Robinet_Weakly-Supervised_Free_Space_Estimation_Through_Stochastic_Co-Teaching_WACVW_2022_paper.html

FnR Project :

FNR13301060 - Machine Learning For Risk Assessment In Semi-autonomous Vehicles, 2018 (01/10/2018-31/08/2022) - François Robinet

Available on ORBilu :

since 04 February 2022

Statistics

Number of views

236 (55 by Unilu)

Number of downloads

92 (16 by Unilu)
