This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been widely used in multi-label classification because they can model label correlations, and they have proven effective both within a single domain and across multiple domains. However, the graph topology in these methods is typically pre-defined heuristically and is therefore not optimal. In addition, consecutive Graph Convolutional Network (GCN) aggregations tend to destroy feature similarity. To overcome these issues, an architecture that learns the graph connectivity end to end is introduced by integrating an attention-based mechanism and a similarity-preserving strategy. The proposed framework is then extended to multiple domains using an adversarial training scheme. Experiments are reported on well-known single-domain and multi-domain benchmarks. The results show that the approach is competitive with the state of the art in terms of mean Average Precision (mAP) and model size. The code will be made publicly available.
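The two issues named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the GAT-style scoring, the cosine-similarity graph, the equal 0.5/0.5 mixing weight, and all function names are assumptions chosen for illustration. The sketch builds a dense attention-based adjacency over label nodes, mixes it with a similarity-based adjacency, and measures how repeated aggregation shrinks the spread of node features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_adjacency(H, W, a, slope=0.2):
    # GAT-style scores e_ij = LeakyReLU(a_src . z_i + a_dst . z_j),
    # normalized per row so each node's edge weights sum to 1.
    Z = H @ W
    d = Z.shape[1]
    scores = (Z @ a[:d])[:, None] + (Z @ a[d:])[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    return softmax(scores, axis=1)

def similarity_adjacency(H):
    # Row-normalized cosine similarity between the input node features
    # (an assumed stand-in for a similarity-preserving graph).
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    return softmax(unit @ unit.T, axis=1)

def feature_spread(H):
    # Largest per-dimension range across nodes; 0 means all nodes are identical.
    return float((H.max(axis=0) - H.min(axis=0)).max())

rng = np.random.default_rng(0)
n_labels, d_in, d_out = 5, 8, 4          # hypothetical sizes
H = rng.normal(size=(n_labels, d_in))    # stand-in label embeddings
W = rng.normal(size=(d_in, d_out))       # stand-in for a learned projection
a = rng.normal(size=2 * d_out)           # stand-in for a learned attention vector

A_att = attention_adjacency(H, W, a)
A_sim = similarity_adjacency(H)
A = 0.5 * A_att + 0.5 * A_sim            # assumed equal mixing weight

# Three consecutive aggregations without weights or nonlinearities.
smoothed = A @ A @ A @ H
print(feature_spread(H), feature_spread(smoothed))
```

Because every row of `A` is a set of positive weights summing to 1, each aggregation replaces a node's features with a weighted average of all nodes, so the spread can only shrink. In a trained model `W` and `a` would be learned end to end rather than sampled at random.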
Disciplines:
Computer science
Author, co-author:
SINGH, Inder Pal ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA ; Cristal Laboratory, National School of Computer Sciences, University of Manouba, Tunisia
OYEDOTUN, Oyebade ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
External co-authors:
yes
Document language:
English
Title:
Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains