Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains

Computer vision; Deep learning; Domain shift; Graph Convolutional Networks; Machine learning; Multi-label image classification; Unsupervised domain adaptation; Convolutional networks; Domain adaptation; Graph convolutional network; Images classification; Label images; Machine-learning; Multi-labels; Software; Signal Processing; Computer Vision and Pattern Recognition

Abstract :

[en] This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been widely used in multi-label classification because they can model label correlations, and their effectiveness has been shown both in a single domain and across multiple domains. However, the topology of the graph they use is not optimal, as it is pre-defined heuristically. In addition, consecutive Graph Convolutional Network (GCN) aggregations tend to destroy feature similarity. To overcome these issues, an architecture that learns the graph connectivity in an end-to-end fashion is introduced by integrating an attention-based mechanism and a similarity-preserving strategy. The proposed framework is then extended to multiple domains using an adversarial training scheme. Numerous experiments are reported on well-known single-domain and multi-domain benchmarks. The results show that the approach is competitive with the state of the art in terms of mean Average Precision (mAP) and model size. The code will be made publicly available.
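
The abstract reports results in terms of mean Average Precision (mAP). As a rough illustration of that metric only, the sketch below computes a per-label average precision and averages it over labels. It is not the authors' evaluation code: the function names, the input layout (one row per image, one column per label) and the handling of labels without positives are assumptions made for this example, and benchmark toolkits may use slightly different conventions.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision for one label (hypothetical helper).

    Computed as the mean of precision@k taken at each rank k
    where a positive image appears.
    """
    # Rank images by decreasing score; sorted() is stable, so ties
    # keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # Assumption: a label with no positive image contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    scores: List[List[float]], labels: List[List[int]]
) -> float:
    """mAP: mean of the per-label AP values.

    Both inputs are indexed as [image][label].
    """
    n_labels = len(labels[0])
    aps = []
    for c in range(n_labels):
        col_scores = [row[c] for row in scores]
        col_labels = [row[c] for row in labels]
        aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / n_labels


# Toy data: 4 images, 2 labels (values invented for illustration).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_map = mean_average_precision(toy_scores, toy_labels)
```

On the toy data, the first label has positives at ranks 1 and 3 (AP = (1 + 2/3) / 2 = 5/6) and the second has positives at ranks 1 and 2 (AP = 1), so `toy_map` is 11/12.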

Disciplines :

Computer science

Author, co-author :

SINGH, Inder Pal ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2

GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA ; Cristal Laboratory, National School of Computer Sciences, University of Manouba, Tunisia

OYEDOTUN, Oyebade ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > CVI2 > Team Djamila AOUADA

AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2

External co-authors :

yes

Language :

English

Title :

Multi-label image classification using adaptive graph convolutional networks: From a single domain to multiple domains

Publication date :

13 July 2024

Journal title :

Computer Vision and Image Understanding

ISSN :

1077-3142

eISSN :

1090-235X

Publisher :

Academic Press Inc.

Volume :

247

Pages :

104062

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://api.elsevier.com/content/article/PII:S1077314224001437?httpAccept=text/xml

FnR Project :

FNR14755859 - Multi-modal Fusion Of Electro-optical Sensors For Spacecraft Pose Estimation Towards Autonomous In-orbit Operations, 2020 (01/01/2021-31/12/2023) - Djamila Aouada

Available on ORBilu :

since 22 July 2024

Statistics

Number of views

225 (18 by Unilu)

Number of downloads

116 (7 by Unilu)
