Multi-label Image Classification (MLIC) is an active research topic in the Computer Vision community. Its objective is to simultaneously identify the presence or absence of multiple objects within an image. Owing to its practical usefulness, MLIC is applied in numerous fields, such as human action recognition, multi-attribute prediction, and semantic segmentation. With the widespread availability of large-scale labeled datasets and recent advances in deep learning, numerous methods have achieved remarkable performance on MLIC. Despite their proven success, these methods usually employ very deep neural networks, leading to cumbersome architectures that limit their applicability in memory-constrained scenarios. Additionally, existing MLIC methods usually assume that the test data comes from the same domain as the training data. This assumption does not always hold, and when it fails, these methods generalize poorly to images from unseen domains. To overcome this challenge, commonly known as domain shift, only a few methods have been proposed; they extend the concept of Unsupervised Domain Adaptation (UDA) from single-label classification to multi-label classification. However, due to the inherent differences between the two problems, this extension might yield sub-optimal results. These methods also rely on cumbersome network architectures.
In this thesis, we tackle the aforementioned challenges in MLIC, moving from the simpler scenario where a single domain is considered to a more challenging cross-domain setting. In the first part of the thesis, we aim to model the relationships between multiple objects efficiently and effectively while keeping a moderate-size architecture for the general task of MLIC. Specifically, we use Graph Convolutional Networks (GCN) to model the label relationships through an adaptive graph learning strategy. In the second part, we focus on the domain shift that degrades the performance of MLIC methods. For that purpose, we propose novel UDA approaches specifically tailored to MLIC. The proposed solutions are evaluated on several benchmarks to demonstrate their effectiveness with respect to state-of-the-art methods.
Disciplines:
Computer science
Author, co-author:
SINGH, Inder Pal ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
Document language:
English
Title:
TOWARDS COMPACT YET EFFECTIVE MULTI-LABEL IMAGE CLASSIFICATION: FROM A SINGLE DOMAIN TO MULTIPLE DOMAINS
Defense date:
12 July 2024
Institution:
Unilu - University of Luxembourg [The Faculty of Science, Technology and Medicine], Kirchberg, Luxembourg
Degree title:
Docteur en Informatique (DIP_DOC_0006_B)
Jury members:
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
HEIN, Andreas ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SPASYS
GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA ; University of Manouba