Computer Vision for Transportation; Deep Learning for Visual Perception; Human Detection and Tracking; Multi-Modal Perception for HRI; Sensor Fusion; Artificial Intelligence
Abstract :
[en] Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adverse illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs are fully overlapping. This assumption often does not hold in real-world applications, where the two images may only partially overlap due to the sensor configuration, and where sensor failure can cause the loss of information in one modality. In this paper, we propose a novel module, the Hybrid Attention (HA) mechanism, as our main contribution to mitigate the performance degradation caused by partial overlap and sensor failure, i.e., when at least part of the scene is acquired by only one sensor. Building on this module, we propose an improved RGB-T fusion algorithm that is robust to the partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with the resource constraints of embedded systems. We evaluate the proposed method by simulating various partial overlap and sensor failure scenarios. The results demonstrate that our approach outperforms state-of-the-art methods, showcasing its superiority in handling real-world challenges.
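To illustrate the kind of fusion the abstract describes, below is a minimal PyTorch sketch of a hybrid self-/cross-attention block that fuses RGB and thermal feature maps while tolerating regions covered by only one sensor. The module name HybridAttentionFusionSketch, the tensor shapes, the per-modality validity mask, and the fallback to an RGB-only context are illustrative assumptions; this is not the paper's exact Hybrid Attention design.

```python
# Hedged sketch: hybrid self-/cross-attention fusion of RGB and thermal features
# that tolerates partial overlap and thermal sensor failure (illustrative only).
import torch
import torch.nn as nn


class HybridAttentionFusionSketch(nn.Module):
    """Fuse RGB and thermal feature maps with self- and cross-attention.

    A boolean validity mask marks which thermal tokens were actually sensed,
    so regions outside the overlap (or lost to sensor failure) attend only
    to the RGB modality.
    """

    def __init__(self, channels: int = 128, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * channels, channels)

    @staticmethod
    def _to_tokens(feat: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H*W, C) token sequence.
        return feat.flatten(2).transpose(1, 2)

    def forward(self, rgb_feat, thermal_feat, thermal_valid):
        # rgb_feat, thermal_feat: (B, C, H, W); thermal_valid: (B, H*W) bool,
        # True where the thermal sensor actually covers the corresponding token.
        q = self._to_tokens(rgb_feat)
        kv = self._to_tokens(thermal_feat)

        # Self-attention within the RGB stream (assumed always available here).
        rgb_ctx, _ = self.self_attn(q, q, q)

        # Cross-attention from RGB queries to thermal keys/values, ignoring
        # thermal tokens outside the overlap. Keep one key valid per sample so
        # the softmax stays defined even under total thermal failure.
        all_missing = (~thermal_valid).all(dim=1)            # (B,)
        safe_valid = thermal_valid.clone()
        safe_valid[:, 0] |= all_missing
        cross_ctx, _ = self.cross_attn(q, kv, kv, key_padding_mask=~safe_valid)

        # Under total thermal failure, fall back to the RGB-only context.
        cross_ctx = torch.where(all_missing.view(-1, 1, 1), rgb_ctx, cross_ctx)

        fused = self.proj(torch.cat([rgb_ctx, cross_ctx], dim=-1))
        b, c, h, w = rgb_feat.shape
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Toy example: 20x16 feature maps with the right half of the thermal view missing.
    b, c, h, w = 2, 128, 20, 16
    rgb, thermal = torch.randn(b, c, h, w), torch.randn(b, c, h, w)
    valid = torch.ones(b, h, w, dtype=torch.bool)
    valid[:, :, w // 2:] = False                              # simulate partial overlap
    fused = HybridAttentionFusionSketch(c)(rgb, thermal, valid.flatten(1))
    print(fused.shape)                                        # torch.Size([2, 128, 20, 16])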
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > CVI² - Computer Vision, Imaging & Machine Intelligence
Disciplines :
Computer science
Author, co-author :
RATHINAM, Arunkumar ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
PAULY, Leo ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA
SHABAYEK, Abd El Rahman ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
RHARBAOUI, Wassim ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA ; University of Poitiers, XLIM Institute, Limoges, France
KACEM, Anis ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
GAUDILLIERE, Vincent ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA ; Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
External co-authors :
no
Language :
English
Title :
Hybrid Attention for Robust RGB-T Pedestrian Detection in Real-World Conditions
Publication date :
January 2025
Journal title :
IEEE Robotics and Automation Letters
eISSN :
2377-3766
Publisher :
Institute of Electrical and Electronics Engineers Inc.