Article (Scientific journals)
Classifier or prompt: A case study on legal requirements traceability
Etezadi, Romina; ABUALHAIJA, Sallam; Arora, Chetan et al.
2026, in Empirical Software Engineering
Peer Reviewed verified by ORBi
 

Files


Full Text
2026-EMSE-EAAB.pdf
Author postprint (1.11 MB)

Details



Keywords :
Requirements Traceability; Sentence Transformers (ST); Natural Language Processing (NLP); Machine Learning (ML); The General Data Protection Regulation (GDPR); Regulatory Compliance; Large Language Models (LLMs); RICE; Prompting Framework
Abstract :
New regulations are continually introduced to ensure that software development complies with ethical concerns and prioritizes public safety. A prerequisite for demonstrating compliance involves tracing software requirements to legal provisions. Requirements traceability is a fundamental task where requirements engineers are supposed to analyze technical requirements against target artifacts, often under a limited time budget. Doing this analysis manually for complex systems with hundreds of requirements is infeasible. The legal dimension introduces additional challenges that increase manual effort. In this paper, we investigate two automated solutions based on language models, including large ones (LLMs). The first solution, Kashif, is a classifier that leverages sentence transformers and semantic similarity. The second solution, Rice_LRT, prompts a recent LLM based on Rice, a prompt engineering framework. Using a publicly available benchmark dataset, we empirically evaluate Kashif and compare it against seven baseline classifiers from the literature (LSI, LDA, GloVe, TraceBERT, RoBERTa, and LLaMa). Kashif can identify trace links with an F2 score of ~63%, outperforming the best baseline by a substantial margin of 21 percentage points (pp) in F2 score. On a newly created and more complex requirements document traced to the European General Data Protection Regulation (GDPR), Rice_LRT outperforms Kashif and baseline prompts in the literature by achieving an average recall of 84% and an F2 score of 61%, improving the F2 score by 34 pp compared to the best baseline prompt. Our results indicate that requirements traceability in legal contexts cannot be adequately addressed by techniques proposed in the literature that are not specifically designed for legal artifacts. Furthermore, we demonstrate that our engineered prompt outperforms both classifier-based approaches and baseline prompts.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
Etezadi, Romina
ABUALHAIJA, Sallam; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
Arora, Chetan
Briand, Lionel
External co-authors :
yes
Language :
English
Title :
Classifier or prompt: A case study on legal requirements traceability
Publication date :
2026
Journal title :
Empirical Software Engineering
ISSN :
1382-3256
eISSN :
1573-7616
Publisher :
Kluwer Academic Publishers, Netherlands
Peer reviewed :
Peer Reviewed verified by ORBi
FnR Project :
FNR17958091 - PLAITO - Automated Completeness Enhancement Of Requirements Towards Improved Trustworthiness, 2023 (01/09/2024-31/08/2027) - Sallam Abualhaija
FNR16570468 - NCER-FT - 2021 (01/03/2023-28/02/2025) - Gilbert Fridgen
Available on ORBilu :
since 15 December 2025
