Recommendation systems; Personalization; Machine Learning; Multimodal Representation Learning; Visual Arts
Abstract :
[en] With the advent of digital media, the availability of art content has greatly expanded, making it increasingly challenging for individuals to discover and curate works that align with their personal preferences and taste. Providing accurate and personalised Visual Art (VA) recommendations is thus a complex task that requires a deep understanding of the interplay of multiple modalities such as images, textual descriptions, and other metadata. In this paper, we study the nuances of the modalities involved in the VA domain (image and text) and how they can be effectively harnessed to provide a truly personalised art experience to users. In particular, we develop four fusion-based multimodal VA recommendation pipelines and conduct a large-scale user-centric evaluation. Our results indicate that early fusion (i.e., joint multimodal learning of visual and textual features) is preferred over a late fusion of ranked paintings from unimodal models (state-of-the-art baselines), but only if the latent representation space of the multimodal painting embeddings is entangled. Our findings open a new perspective for better representation learning in the VA RecSys domain.
Disciplines :
Computer science
Author, co-author :
YILMA, Bereket Abera ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
LEIVA, Luis A. ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
no
Language :
English
Title :
Together Yet Apart: Multimodal Representation Learning for Personalised Visual Art Recommendation
Publication date :
26 June 2023
Event name :
ACM Conference on User Modeling, Adaptation and Personalization (UMAP 2023)
Event organizer :
ACM
Event place :
Limassol, Cyprus
Event date :
26-06-2023
Main work title :
Proceedings of the ACM Conference on User Modeling, Adaptation and Personalization (UMAP 2023)