Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Evaluating the Impact of Text De-Identification on Downstream NLP Tasks
LOTHRITZ, Cedric; LEBICHOT, Bertrand; ALLIX, Kevin et al.
2023. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Peer reviewed
 

Files


Full Text
Anonymisation_NoDaLiDa-5.pdf
Author postprint (167.34 kB) Creative Commons License - Attribution

Details



Keywords :
Natural Language Processing, NLP, BERT, ERNIE, Anonymisation, De-identification, Transfer learning, fine-tuning
Abstract :
[en] Data anonymisation is often required to comply with regulations when transferring information across departments or entities. However, the risk is that this procedure can distort the data and jeopardise the models built on it. Intuitively, the process of training an NLP model on anonymised data may lower the performance of the resulting model when compared to a model trained on non-anonymised data. In this paper, we investigate the impact of de-identification on the performance of nine downstream NLP tasks. We focus on the anonymisation and pseudonymisation of personal names and compare six different anonymisation strategies for two state-of-the-art pre-trained models. Based on these experiments, we formulate recommendations on how the de-identification should be performed to guarantee accurate NLP models. Our results reveal that de-identification does have a negative impact on the performance of NLP models, but this impact is relatively low. We also find that using pseudonymisation techniques involving random names leads to better performance across most tasks.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX - Trustworthy Software Engineering
Disciplines :
Computer science
Author, co-author :
LOTHRITZ, Cedric ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX
LEBICHOT, Bertrand ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN
ALLIX, Kevin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN
BISSYANDE, Tegawendé François d Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX
KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX
Boytsov, Andrey ; BGL BNP Paribas
Lefebvre, Clément ; BGL BNP Paribas
Goujon, Anne ; BGL BNP Paribas
External co-authors :
yes
Language :
English
Title :
Evaluating the Impact of Text De-Identification on Downstream NLP Tasks
Publication date :
May 2023
Event name :
24th Nordic Conference on Computational Linguistics
Event date :
May 2023
Main work title :
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Publisher :
University of Tartu Library, Tartu, Estonia
Peer reviewed :
Peer reviewed
Focus Area :
Computational Sciences
FnR Project :
FNR16229163 - Multilingual NLP Coping With Luxembourg Specificities For The Financial Industry, 2021 (01/01/2022-31/12/2024) - Jacques Klein
Name of the research project :
Multilingual NLP Coping With Luxembourg Specificities For The Financial Industry
Available on ORBilu :
since 23 November 2023

Statistics


Number of views
120 (7 by Unilu)
Number of downloads
24 (0 by Unilu)

Scopus citations®
7
Scopus citations® without self-citations
7
