VLMDiff: Leveraging Vision-Language Models for Multi-Class Anomaly Detection with Diffusion

HICSONMEZ, Samet; SHABAYEK, Abd El Rahman; AOUADA, Djamila

Paper published in a journal (Scientific congresses, symposiums and conference proceedings)

2026 • In 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Peer reviewed

Permalink
https://hdl.handle.net/10993/67845

Files

Full Text

_WACV_2026__VLMDIFF_paper_supp.pdf

Author postprint (27.09 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Image Anomaly Detection, VLMs, Diffusion Models

Abstract :

[en] Detecting visual anomalies in diverse, multi-class real-world images is a significant challenge. We introduce VLMDiff, a novel unsupervised multi-class visual anomaly detection framework. It integrates a Latent Diffusion Model (LDM) with a Vision-Language Model (VLM) for enhanced anomaly localization and detection. Specifically, a pre-trained VLM with a simple prompt extracts detailed image descriptions, serving as additional conditioning for LDM training. Current diffusion-based methods rely on synthetic noise generation, limiting their generalization and requiring per-class model training, which hinders scalability. VLMDiff, however, leverages VLMs to obtain normal captions without manual annotations or additional training. These descriptions condition the diffusion model, learning a robust normal image feature representation for multi-class anomaly detection. Our method achieves competitive performance, improving the pixel-level Per-Region Overlap (PRO) metric by up to 25 points on the Real-IAD dataset and 8 points on the COCO-AD dataset, outperforming state-of-the-art diffusion-based approaches. Code is available at https://github.com/giddyyupp/VLMDiff.
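
For readers unfamiliar with the pixel-level PRO metric mentioned in the abstract, the following is a minimal sketch of per-region overlap at a single threshold. It is not taken from the VLMDiff code. The function names `label_regions` and `pro_at_threshold`, the 4-connectivity choice, and the fixed threshold are assumptions made for illustration. PRO is commonly reported as an area under the curve over many thresholds up to a false-positive-rate limit, and the exact protocol used in the paper is not stated in this record.

```python
def label_regions(mask):
    """Label 4-connected regions of a binary mask (list of lists) by flood fill.

    Returns (labels, count), where labels holds 0 for background and
    1..count for each connected ground-truth region.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


def pro_at_threshold(anomaly_map, gt_mask, threshold):
    """Mean, over ground-truth regions, of the fraction of each region's
    pixels whose anomaly score is at or above the threshold.

    Returns None when the ground truth contains no anomalous region.
    """
    labels, count = label_regions(gt_mask)
    if count == 0:
        return None
    hits = [0] * (count + 1)
    sizes = [0] * (count + 1)
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                sizes[k] += 1
                if anomaly_map[y][x] >= threshold:
                    hits[k] += 1
    return sum(hits[k] / sizes[k] for k in range(1, count + 1)) / count


# Toy example: two ground-truth regions, one half covered, one fully covered.
gt = [[1, 1, 0, 0, 0],
      [1, 1, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 1, 1]]
scores = [[0.9, 0.9, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.8, 0.8]]
print(pro_at_threshold(scores, gt, 0.5))  # (0.5 + 1.0) / 2 = 0.75
```

Because each region contributes equally regardless of its size, small defects weigh as much as large ones, which is why this metric is often preferred over plain pixel-wise scores for localization.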

Disciplines :

Computer science

Author, co-author :

HICSONMEZ, Samet ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2

SHABAYEK, Abd El Rahman ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2

AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2

External co-authors :

Language :

English

Title :

VLMDiff: Leveraging Vision-Language Models for Multi-Class Anomaly Detection with Diffusion

Publication date :

2026

Event name :

IEEE/CVF Winter Conference on Applications of Computer Vision 2026

Event organizer :

IEEE/CVF

Event place :

Tucson, United States

Event date :

06 - 10 March 2026

Audience :

International

Journal title :

2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Peer reviewed :

Peer reviewed

FnR Project :

DEFENCE22/17813724/AUREA

Available on ORBilu :

since 21 February 2026

Statistics

Number of views

91 (1 by Unilu)

Number of downloads

42 (1 by Unilu)
