[en] Existing methods for audio-visual deepfake detection mainly focus on high-level features to model inconsistencies between the audio and visual data. As a result, these approaches usually overlook the finer audio-visual artifacts that are inherent to deepfakes. Herein, we propose fine-grained mechanisms for detecting subtle artifacts in both the spatial and temporal domains. First, we introduce a local audio-visual model capable of capturing small spatial regions that are prone to audio-visual inconsistencies. For that purpose, we adopt a fine-grained mechanism based on a spatially-local distance coupled with an attention module. Second, we introduce a temporally-local pseudo-fake augmentation that injects samples with subtle temporal inconsistencies into the training set. Experiments on the DFDC and FakeAVCeleb datasets demonstrate that the proposed method generalizes better than the state of the art under both in-dataset and cross-dataset settings.
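To make the spatially-local distance with attention concrete, the following is a minimal PyTorch sketch: it computes a per-patch distance between an audio embedding and each spatial location of a visual feature map, then pools those local distances with a learned attention map. All shapes, module names, and the cosine-distance choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAudioVisualDistance(nn.Module):
    """Sketch of a spatially-local audio-visual distance with attention.

    One distance value per visual patch, aggregated by a learned
    attention map so that small, inconsistency-prone regions can
    dominate the final score. Hypothetical, for illustration only.
    """

    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.Conv2d(dim, 1, kernel_size=1)  # per-patch attention logits

    def forward(self, visual_feat, audio_feat):
        # visual_feat: (B, C, H, W) spatial feature map from a visual encoder
        # audio_feat:  (B, C)       clip-level embedding from an audio encoder
        B, C, H, W = visual_feat.shape
        audio = audio_feat.view(B, C, 1, 1).expand(-1, -1, H, W)

        # Spatially-local distance: one value per patch (cosine distance here)
        local_dist = 1.0 - F.cosine_similarity(visual_feat, audio, dim=1)  # (B, H, W)

        # Attention weights over patches
        weights = torch.softmax(self.attn(visual_feat).view(B, -1), dim=1)  # (B, H*W)

        # Attention-weighted aggregation of local distances into one score
        return (weights * local_dist.view(B, -1)).sum(dim=1)  # (B,)
```

The temporally-local pseudo-fake augmentation could, under assumed details, look like the sketch below: only a short temporal window of the clip is perturbed (here, by freezing a few frames) while the audio is left untouched, yielding a training sample with a subtle temporal inconsistency. The window size and the perturbation type are assumptions, not the authors' exact procedure.

```python
import random
import torch

def temporally_local_pseudofake(frames, window=3):
    """Create a pseudo-fake by perturbing a short window of a video clip.

    frames: (T, C, H, W) tensor of video frames. Returns a clip identical
    to the input except for a frozen window of `window` frames; the
    resulting sample would be labeled fake. Hypothetical sketch.
    """
    T = frames.size(0)
    start = random.randint(0, max(0, T - window - 1))
    out = frames.clone()
    # Repeat one frame over a small window, leaving the rest of the
    # clip (and the accompanying audio) unchanged.
    out[start:start + window] = frames[start].unsqueeze(0)
    return out
```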
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
ASTRID, Marcella ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
GHORBEL, Enjie ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2 > Team Djamila AOUADA
AOUADA, Djamila ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > CVI2
External co-authors :
no
Language :
English
Title :
Detecting Audio-Visual Deepfakes with Fine-Grained Inconsistencies