A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs

Requirements Engineering (RE); The General Data Protection Regulation (GDPR); Regulatory Compliance; Data Processing Agreements (DPAs); Artificial Intelligence (AI); Natural Language Processing (NLP); Classification; Large Language Models (LLMs); Few-shot Learning (FSL); Data Augmentation

Abstract :

[en] Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern of requirements engineering. Personal data collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA), which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching billions of euros. Software systems involving personal data processing must adhere both to the legal obligations stipulated at a general level in GDPR and to the obligations outlined in DPAs, which are specific to the business at hand. In other words, a DPA is yet another source from which requirements engineers can elicit legal requirements. However, the DPA must be complete according to GDPR to ensure that the elicited requirements cover the complete set of obligations. Therefore, checking the completeness of DPAs is a prerequisite step towards developing a compliant system. Analyzing DPAs with respect to GDPR entirely manually is time-consuming and requires adequate legal expertise. In this paper, we propose an automation strategy that addresses the completeness checking of DPAs against GDPR provisions as a text classification problem. Specifically, we pursue ten alternative solutions enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed the F2 score on a set of 30 real DPAs. Our evaluation shows that the best-performing solutions yield F2 scores of 86.7% and 89.7% and are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
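For readers unfamiliar with the metric named in the abstract, the F2 score is the F-beta measure with beta = 2, which weights recall more heavily than precision. The following is a minimal sketch of the standard formula only; the function name `f_beta` and the counts used below are hypothetical and are not taken from the paper's evaluation.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Compute the F-beta score from raw counts.

    tp, fp, fn are true positives, false positives, and false negatives.
    beta > 1 weights recall above precision; beta = 2 gives the F2 score.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
f2 = f_beta(tp=8, fp=4, fn=2)            # beta = 2: recall-weighted
f1 = f_beta(tp=8, fp=4, fn=2, beta=1.0)  # beta = 1: balanced F1
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

With these illustrative counts recall exceeds precision, so the F2 value comes out higher than the F1 value, showing how the metric rewards a classifier that misses few relevant items.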

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
NCER-FT - FinTech National Centre of Excellence in Research

Disciplines :

Computer science

Author, co-author :

Ilyas Azeem, Muhammad; Unilu - University of Luxembourg [LU] > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV

ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV

External co-authors :

yes

Language :

English

Title :

A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs

Publication date :

14 June 2024

Journal title :

Empirical Software Engineering

ISSN :

1382-3256

eISSN :

1573-7616

Publisher :

Kluwer Academic Publishers, Netherlands

Volume :

Issue :

Peer reviewed :

Peer Reviewed verified by ORBi

FnR Project :

FNR16570468 - 2021 (01/07/2022-30/06/2030) - Yves Le Traon

Name of the research project :

U-AGR-7501 - NCER22/IS/16570468/NCER-FT_AFRICA_UL - BIANCULLI Domenico

Funders :

FNR - Fonds National de la Recherche

Funding number :

NCER22/IS/16570468/NCERFT; BRIDGES/19/IS/13759068/ARTAGO

Available on ORBilu :

since 27 November 2023
