Requirements Engineering (RE); Regulatory Compliance; Natural Language Processing (NLP); Question Answering; Language Models (LMs); BERT
Abstract:
[en] We introduce COREQQA, a tool that helps requirements engineers better understand compliance requirements by means of automated Question Answering. Extracting compliance-related requirements by manually navigating through a legal document is both time-consuming and error-prone. Given a legal document, COREQQA lets requirements engineers pose natural-language questions about a compliance-related topic, e.g., data breaches. The tool then automatically navigates through the legal document and returns a list of text passages containing the possible answers to the input question. For better readability, the tool also highlights the likely answers in these passages. The engineer can then use this output to specify compliance requirements. COREQQA is built on large-scale language models from the BERT family. COREQQA has been evaluated on four legal documents, and the results of this evaluation are briefly presented in the paper. The tool is publicly available on Zenodo (https://doi.org/10.5281/zenodo.6653514).
Research centre:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines:
Computer science
Author, co-author:
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Arora, Chetan
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
External co-authors:
yes
Document language:
English
Title:
COREQQA: A COmpliance REQuirements Understanding using Question Answering Tool
Publication date:
2022
Event name:
ACM SIGSOFT CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING
Event date:
14-11-2022 to 18-11-2022
Main work title:
ACM SIGSOFT CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING
Publisher:
Association for Computing Machinery
Pages:
The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022 - Tool Demonstrations)
2020. Law of 25 March 2020 (coordinated version) establishing a central electronic data retrieval system related to IBAN accounts and safe-deposit boxes. https://www.cssf.lu/en/Document/law-of-25-march-2020-data-retrieval/
Sallam Abualhaija, Chetan Arora, Amin Sleimi, and Lionel Briand. 2022. Automated Question Answering for Improved Understanding of Compliance Requirements: A Multi-Document Study. In 30th IEEE International Requirements Engineering Conference (RE'22).
Akiko Aizawa. 2003. An information-theoretic perspective of tf-idf measures. Information Processing & Management 39, 1 (2003), 45-65. https://doi.org/10.1016/S0306-4573(02)00021-3
Muneera Bano, Chetan Arora, Didar Zowghi, and Alessio Ferrari. 2021. The Rise and Fall of COVID-19 Contact-Tracing Apps: When NFRs Collide with Pandemic. In 2021 IEEE 29th International Requirements Engineering Conference (RE). IEEE, 106-116. https://doi.org/10.1109/RE51729.2021.00017
Brian Berenbach, Daniel J Paulish, Juergen Kazmeier, and Arnold Rudorfer. 2009. Software & systems requirements engineering: In practice. McGraw-Hill Education.
Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python. O'Reilly.
Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. CoRR abs/2003.10555 (2020). https://doi.org/10.48550/arXiv.2003.10555 arXiv:2003.10555
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR abs/1810.04805 (2018). https://doi.org/10.48550/arXiv.1810.04805 arXiv:1810.04805
EU (2019/770). 2019. Directive (EU) 2019/770 of the European Parliament and of the Council of 20 May 2019 on certain aspects concerning contracts for the supply of digital content and digital services, OJ L 136, 22.5.2019, p. 1-27. http://data.europa.eu/eli/dir/2019/770/oj
EU (2019/771). 2019. Directive (EU) 2019/771 of the European Parliament and of the Council of 20 May 2019 on certain aspects concerning contracts for the sale of goods, amending Regulation (EU) 2017/2394 and Directive 2009/22/EC, and repealing Directive 1999/44/EC, OJ L 136, 22.5.2019, p. 28-50. http://data.europa.eu/eli/dir/2019/771/oj
EU (GDPR). 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), OJ L 119, 4.5.2016, p. 1-88. http://data.europa.eu/eli/reg/2016/679/oj
Mattia Fazzini, Hourieh Khalajzadeh, Omar Haggag, Zhaoqing Li, Humphrey Obie, Chetan Arora, Waqar Hussain, and John Grundy. 2022. Characterizing Human Aspects in Reviews of COVID-19 Apps. In 9th IEEE/ACM International Conference on Mobile Software Engineering and Systems. https://doi.org/10.1145/3524613.3527814
Marijn Janssen, Paul Brous, Elsa Estevez, Luis S Barbosa, and Tomasz Janowski. 2020. Data governance: Organizing data for trustworthy Artificial Intelligence. Government Information Quarterly 37, 3 (2020), 101493. https://doi.org/10.1016/j.giq.2020.101493
Dan Jurafsky and James H. Martin. 2020. Speech and Language Processing (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/ (visited on 2022-01-04).
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. CoRR abs/1909.11942 (2019). https://doi.org/10.48550/arXiv.1909.11942 arXiv:1909.11942
Dorothy E Leidner and Olgerta Tona. 2021. The CARE Theory of Dignity Amid Personal Data Digitalization. MIS Quarterly 45, 1 (2021).
Edward Loper and Steven Bird. 2002. NLTK: The Natural Language Toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics.
Paul N Otto and Annie I Antón. 2007. Addressing legal requirements in requirements engineering. In 15th IEEE International Requirements Engineering Conference (RE 2007). IEEE, 5-14. https://doi.org/10.1109/RE.2007.65
Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. CoRR abs/1908.10084 (2019). https://doi.org/10.48550/arXiv.1908.10084 arXiv:1908.10084
Amin Sleimi, Marcello Ceci, Nicolas Sannier, Mehrdad Sabetzadeh, Lionel Briand, and John Dann. 2019. A Query System for Extracting Requirements-Related Information from Legal Texts. In 27th IEEE International Requirements Engineering Conference. IEEE. https://doi.org/10.1109/RE.2019.00041
Amin Sleimi, Nicolas Sannier, Mehrdad Sabetzadeh, Lionel Briand, and John Dann. 2018. Automated Extraction of Semantic Legal Metadata using Natural Language Processing. In Proceedings of the 26th IEEE International Requirements Engineering Conference. https://doi.org/10.1109/re.2018.00022