Legal Compliance; Privacy Policies; The General Data Protection Regulation (GDPR); Natural Language Processing (NLP); Machine Learning (ML); Case Study Research
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
TORRE, Damiano ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
SABETZADEH, Mehrdad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Baetens, Katrien; Linklaters LLP
Goes, Peter; Linklaters LLP
Forastier, Sylvie; Linklaters LLP
External co-authors :
yes
Language :
English
Title :
An AI-assisted Approach for Checking the Completeness of Privacy Policies Against GDPR
Publication date :
September 2020
Event name :
The 28th IEEE International Requirements Engineering Conference (RE’20)
Event date :
from 31-08-2020 to 04-09-2020
Audience :
International
Main work title :
Proceedings of the 28th IEEE International Requirements Engineering Conference (RE’20), Zurich, Switzerland, August 31 - September 04, 2020
Privacy policies are critical for helping individuals make informed decisions about their personal data.
In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR).
If done entirely manually, checking whether a given privacy policy complies with GDPR is both time-consuming and error-prone. Automated support for this task is thus advantageous. At the moment, there is an evident lack of such support on the market.
In this paper, we tackle an important dimension of GDPR compliance checking for privacy policies. Specifically, we provide automated support for checking whether the content of a given privacy policy is complete according to the provisions stipulated by GDPR. To do so, we present: (1) a conceptual model that characterizes the information content envisaged by GDPR for privacy policies; (2) an AI-assisted approach for classifying the information content in GDPR privacy policies and subsequently checking how well the classified content meets the completeness criteria of interest; and (3) an evaluation of our approach through a case study over 24 unseen privacy policies. For classification, we leverage a combination of Natural Language Processing and supervised Machine Learning. Our experimental material comprises 234 real privacy policies from the fund industry.
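As a rough illustration of the classify-then-check idea described above, the following minimal sketch labels each sentence of a policy and reports which required content types are never covered. It is not the authors' implementation: the content-type names, the keyword cues (a stand-in for the NLP and supervised ML classifier), and the sample policy are all hypothetical assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical completeness criteria: content types a policy must cover.
# The real criteria come from the paper's conceptual model, not from this list.
REQUIRED_CONTENT: Set[str] = {
    "CONTROLLER_IDENTITY",
    "PROCESSING_PURPOSE",
    "DATA_SUBJECT_RIGHTS",
}

# Hypothetical keyword cues standing in for a trained sentence classifier.
KEYWORD_CUES: Dict[str, List[str]] = {
    "CONTROLLER_IDENTITY": ["controller"],
    "PROCESSING_PURPOSE": ["purpose"],
    "DATA_SUBJECT_RIGHTS": ["right to access", "right to erasure", "your rights"],
}


def classify_sentence(sentence: str) -> Set[str]:
    """Return every content type whose cue appears in the sentence."""
    text = sentence.lower()
    return {
        label
        for label, cues in KEYWORD_CUES.items()
        if any(cue in text for cue in cues)
    }


def find_missing_content(sentences: Iterable[str]) -> Set[str]:
    """Return the required content types that no sentence was classified as."""
    found: Set[str] = set()
    for sentence in sentences:
        found |= classify_sentence(sentence)
    return REQUIRED_CONTENT - found


# Hypothetical two-sentence policy that never mentions data subject rights.
policy = [
    "The controller of your personal data is Example Fund S.A.",
    "We process your data for the purpose of managing your investment.",
]
missing = find_missing_content(policy)
print(sorted(missing))
```

Each missing content type corresponds to one potential incompleteness issue that a human reviewer would then confirm or dismiss.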
Our empirical results indicate that our approach detected 45 of the total of 47 incompleteness issues in the 24 privacy policies it was applied to. Over these policies, the approach had eight false positives. The approach thus has a precision of 85% and recall of 96% over our case study.
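The reported precision and recall follow directly from the counts above. The short sketch below reproduces the arithmetic, assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function and variable names are illustrative only.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Compute precision and recall from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


detected_issues = 45   # incompleteness issues correctly detected
total_issues = 47      # all incompleteness issues in the 24 policies
false_positives = 8    # issues reported that were not real

precision, recall = precision_recall(
    detected_issues,
    false_positives,
    total_issues - detected_issues,  # issues the approach missed
)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

With these counts, precision is 45/53 and recall is 45/47, which round to the 85% and 96% stated in the abstract.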
European Union, "General data protection regulation," Official Journal of the European Union, 2018. [Online]. Available: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679
EU-GDPR. (2019) EU GDPR portal. [Online]. Available: https://eugdpr.org
C. Tankard, "What the GDPR means for businesses," Network Security, vol. 6, pp. 5-8, 2016.
C. Perera, M. Barhamgi, A. K. Bandara, M. Ajmal, B. A. Price, and B. Nuseibeh, "Designing privacy-aware internet of things applications," Inf. Sci., vol. 512, pp. 238-257, 2020.
D. Torre, G. Soltana, M. Sabetzadeh, L. C. Briand, Y. Auffinger, and P. Goes, "Using models to enable compliance checking against the GDPR: An experience report," in 22nd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MODELS 2019, Munich, Germany, September 15-20, 2019, 2019, pp. 1-11.
V. Ayala-Rivera and L. Pasquale, "The grace period has ended: An approach to operationalize GDPR requirements," in Proceedings of 26th IEEE International Conference on Requirements Engineering (RE'18), 2018, pp. 136-146.
J. Caramujo, A. Rodrigues da Silva, S. Monfared, A. Ribeiro, P. Calado, and T. Breaux, "RSL-IL4Privacy: A domain-specific language for the rigorous specification of privacy policies," Requirements Engineering, vol. 24, no. 1, pp. 1-26, 2019.
J. Bhatia and T. D. Breaux, "Semantic incompleteness in privacy policy goals," in 26th IEEE International Requirements Engineering Conference, RE 2018, Banff, AB, Canada, August 20-24, 2018, 2018, pp. 159-169.
R. Eckart de Castilho and I. Gurevych, "A broad-coverage collection of portable NLP components for building shareable analysis pipelines," in Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT (OIAF4HLT'14), 2014, pp. 1-11.
S. Abualhaija, C. Arora, M. Sabetzadeh, L. Briand, and E. Vaz, "A machine learning-based approach for demarcating requirements in textual specifications," in Proceedings of the 27th IEEE International Requirements Engineering Conference (RE'19), 2019.
T. Mikolov, W.-T. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013, pp. 746-751.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111-3119.
T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
J. Pennington, R. Socher, and C. D. Manning, "Glove: Global vectors for word representation," in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543. [Online]. Available: http://www.aclweb.org/anthology/D14-1162
E. D. D. Team, "Deeplearning4j: Open-source distributed deep learning for the JVM, Apache Software Foundation License 2.0," 2020, last accessed: January 2020. [Online]. Available: http://deeplearning4j.org
C. Manning, P. Raghavan, and H. Schutze, Introduction to Information Retrieval. Cambridge, 2008.
J. H. Hayes, W. Li, and M. Rahimi, "Weka meets tracelab: Toward convenient classification: Machine learning for requirements engineering problems: A position paper," in Artificial Intelligence for Requirements Engineering (AIRE), 2014 IEEE 1st International Workshop on. IEEE, 2014, pp. 9-12.
F. Eibe, M. Hall, and I. Witten, "The WEKA workbench. Online appendix for 'Data Mining: Practical Machine Learning Tools and Techniques'," Morgan Kaufmann, 2016.
I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Morgan Kaufmann, 2016.
D. Torre, S. Abualhaija, M. Sabetzadeh, and L. C. Briand, Glossary and completeness criteria, available at http://tiny.cc/x3gkqz, http://tiny.cc/5il9jz, June 2020.
J. Saldana, The Coding Manual for Qualitative Researchers. SAGE Publishing, 2016.
European Union, "Article 29 working party-guidelines on data protection officers (DPOs)," Justice and Consumers, 2018.
G. Soltana, N. Sannier, M. Sabetzadeh, and L. C. Briand, "Model-based simulation of legal policies: Framework, tool support, and validation," Software & Systems Modeling, vol. 17, no. 3, pp. 851-883, 2018.
L. Michaelis, "Word meaning, sentence meaning, and syntactic meaning," Cognitive approaches to lexical semantics, vol. 23, pp. 163-209, 2003.
D. Torre, S. Abualhaija, M. Sabetzadeh, and L. C. Briand, Dataset of privacy policies annotated, available at http://tiny.cc/n33xqz, June 2020.
S. Arora, Y. Liang, and T. Ma, "A simple but tough-to-beat baseline for sentence embeddings," in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
S. Wang and C. D. Manning, "Baselines and bigrams: Simple, good sentiment and topic classification," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2. Association for Computational Linguistics, 2012, pp. 90-94.
P. Pullonen, J. Tom, R. Matulevicius, and A. Toots, "Privacy-enhanced BPMN: Enabling data privacy analysis in business processes models," Software & Systems Modeling, pp. 1-30, 2019.
N. V. N. Kumar and R. K. Shyamasundar, "Realizing purpose-based privacy policies succinctly via information-flow labels," in 2014 IEEE Fourth International Conference on Big Data and Cloud Computing, BDCloud 2014, Sydney, Australia, December 3-5, 2014, 2014, pp. 753-760.
W. B. Tesfay, P. Hofmann, T. Nakamura, S. Kiyomoto, and J. Serna, "Privacyguide: Towards an implementation of the EU GDPR on internet privacy policy evaluation," in Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, IWSPA@CODASPY 2018, Tempe, AZ, USA, March 19-21, 2018, 2018, pp. 15-21.
J. Bhatia, T. D. Breaux, and F. Schaub, "Mining privacy goals from privacy policies using hybridized task recomposition," ACM Trans. Softw. Eng. Methodol., vol. 25, no. 3, pp. 22:1-22:24, 2016.
F. Liu, R. Ramanath, N. M. Sadeh, and N. A. Smith, "A step towards usable privacy policy: Automatic alignment of privacy statements," in COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, August 23-29, 2014, Dublin, Ireland, 2014, pp. 884-894.
S. Wilson, F. Schaub, R. Ramanath, N. M. Sadeh, F. Liu, N. A. Smith, and F. Liu, "Crowdsourcing annotations for websites' privacy policies: Can it really work?" in Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11-15, 2016, 2016, pp. 133-143.
M. Guerriero, D. A. Tamburri, and E. D. Nitto, "Defining, enforcing and checking privacy policies in data-intensive applications," in Proceedings of the 13th International Conference on Software Engineering for Adaptive and Self-Managing Systems, SEAMS@ICSE 2018, Gothenburg, Sweden, May 28-29, 2018, 2018, pp. 172-182.
J. Bhatia, M. C. Evans, and T. D. Breaux, "Identifying incompleteness in privacy policy goals using semantic frames," Requir. Eng., vol. 24, no. 3, pp. 291-313, 2019.
M. Lippi, P. Palka, G. Contissa, F. Lagioia, H.-W. Micklitz, G. Sartor, and P. Torroni, "Claudette: An automated detector of potentially unfair clauses in online terms of service," Artificial Intelligence and Law, vol. 27, no. 2, pp. 117-139, 2019.