Paper published in a book (Scientific congresses, symposiums, conferences and proceedings)
How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
DYRMISHI, Salijona; GHAMIZI, Salah; CORDY, Maxime
2023, in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Peer reviewed
 

Documents


Full text
ACL paper.pdf
Publisher postprint (1.19 MB)





Details



Abstract:
[en] Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However, evaluations of these attacks ignore the property of imperceptibility or study it under limited settings. This entails that adversarial perturbations would not pass any human quality gate and do not represent real threats to human-checked NLP systems. To bypass this limitation and enable proper assessment (and later, improvement) of NLP model robustness, we have surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods. Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions regarding attack success. Through our work, we hope to position human perceptibility as a first-class success criterion for text attacks, and provide guidance for research to build effective attack algorithms and, in turn, design appropriate defence mechanisms.
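A minimal sketch of the kind of word-substitution attack studied here, assuming the open-source TextAttack library and an illustrative pretrained IMDB sentiment model (the model name, input text, and label are assumptions, not taken from the paper):

    # Word-substitution adversarial attack via the TextFooler recipe in TextAttack.
    import transformers
    from textattack.models.wrappers import HuggingFaceModelWrapper
    from textattack.attack_recipes import TextFoolerJin2019

    # Illustrative victim model: a BERT sentiment classifier fine-tuned on IMDB.
    model_name = "textattack/bert-base-uncased-imdb"
    model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    wrapper = HuggingFaceModelWrapper(model, tokenizer)

    # TextFooler ranks words by importance and swaps them with counter-fitted synonyms.
    attack = TextFoolerJin2019.build(wrapper)

    text = "The film was a touching and well-acted story."  # illustrative input
    result = attack.attack(text, 1)                         # 1 = positive ground-truth label

    print(result.original_text())   # original sentence
    print(result.perturbed_text())  # word-substituted adversarial example, if one was found

Whether such substitutions actually go unnoticed by human readers is the question the survey addresses.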
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Other
Disciplines:
Computer science
Author, co-author:
DYRMISHI, Salijona; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
GHAMIZI, Salah; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
CORDY, Maxime; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
External co-authors:
No
Document language:
English
Title:
How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks
Publication date:
2023
Event name:
ACL 2023: The 61st Annual Meeting of the Association for Computational Linguistics
Event organizer:
ACL
Event location:
Toronto, Canada
Event dates:
from 09-07-2023 to 14-07-2023
Main work title:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Publisher:
Association for Computational Linguistics
ISBN/EAN:
978-1-959429-72-2
Collection and collection number:
Volume 1: Long Papers
Peer reviewed:
Peer reviewed
Focus Area:
Security, Reliability and Trust
FNR project:
FNR14585105 - Search-based Adversarial Testing Under Domain-specific Constraints, 2020 (01/10/2020-30/09/2024) - Salijona Dyrmishi
Available on ORBilu:
since 13 August 2023

Statistics


Number of views
173 (including 5 from Unilu)
Number of downloads
50 (including 0 from Unilu)

Scopus® citations
12
Scopus® citations without self-citations
11
