Paper published in a journal (Scientific congresses, symposiums and conference proceedings)
Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction
ANASTASAKIS, Zacharias; MALLIS, Dimitrios; DIOMATARIS, Markos et al.
2024. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Proceedings
Peer reviewed
 

Files


Full Text
Self_Supervised_Learning_for_Visual_Relationship_Detection_through_Masked_Bounding_Box_Reconstruction (1).pdf
Author preprint (1.68 MB) Creative Commons License - Attribution

Details



Keywords :
PredDet; VRD; SelfSupervision; ComputerVision; Transformers
Abstract :
We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD). Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR), a variation of MIM in which a percentage of the entities/objects within a scene are masked and subsequently reconstructed based on the unmasked objects. The core idea is that, through object-level masked modeling, the network learns context-aware representations that capture the interaction of objects within a scene and are thus highly predictive of visual object relationships. We extensively evaluate the learned representations, both qualitatively and quantitatively, in a few-shot setting and demonstrate the efficacy of MBBR for learning robust visual representations tailored to VRD. The proposed method surpasses state-of-the-art VRD methods in the Predicate Detection (PredDet) evaluation setting using only a few annotated samples.
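The object-level masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 50% default masking ratio, the all-zeros mask token, and the mean-squared-error reconstruction loss are all assumptions made for illustration. The abstract states only that a percentage of objects is masked and reconstructed from the unmasked ones. The network that would produce the predictions is omitted.

```python
import random


def mask_objects(object_features, mask_ratio=0.5, mask_token=None, seed=None):
    """Replace a random subset of per-object feature vectors with a mask token.

    Hypothetical helper: returns (masked_features, masked_indices).
    """
    rng = random.Random(seed)
    n = len(object_features)
    if n == 0:
        return [], []
    dim = len(object_features[0])
    if mask_token is None:
        # Assumption: an all-zeros placeholder stands in for a mask token.
        mask_token = [0.0] * dim
    # Mask at least one object; keep at least one visible when n > 1.
    k = min(max(1, round(n * mask_ratio)), max(1, n - 1))
    masked_indices = sorted(rng.sample(range(n), k))
    masked_set = set(masked_indices)
    masked = [
        list(mask_token) if i in masked_set else list(feat)
        for i, feat in enumerate(object_features)
    ]
    return masked, masked_indices


def reconstruction_loss(predicted, target, masked_indices):
    """Mean squared error computed over the masked objects only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked_indices:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Usage: four objects in a scene, each with a toy 2-dimensional feature vector.
scene = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked_scene, hidden = mask_objects(scene, mask_ratio=0.5, seed=0)
# A model would predict features for the hidden objects from the visible ones;
# here the masked input itself is scored as a trivial baseline prediction.
baseline = reconstruction_loss(masked_scene, scene, hidden)
```

Restricting the loss to the masked positions forces any prediction to rely on the visible objects, which is the context-dependence the abstract describes.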
Disciplines :
Computer science
Author, co-author :
ANASTASAKIS, Zacharias; Deeplab Athens
MALLIS, Dimitrios; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
DIOMATARIS, Markos; ETH Zurich
ALEXANDRIDIS, George; NTUA - National Technical University of Athens [GR]
KOLLIAS, Stefanos; NTUA - National Technical University of Athens [GR]
PITSIKALIS, Vassilis; Deeplab Athens
External co-authors :
yes
Language :
English
Title :
Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction
Publication date :
04 January 2024
Event name :
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Event organizer :
IEEE Computer Society
Event place :
Waikoloa, United States
Event date :
from 04 to 08 January 2024
Audience :
International
Journal title :
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Proceedings
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
Available on ORBilu :
since 24 November 2023

Statistics


Number of views
134 (3 by Unilu)
Number of downloads
78 (0 by Unilu)

Scopus citations®
3
Scopus citations® without self-citations
3
OpenAlex citations
2
