Classification or Prompting: A Case Study on Legal Requirements Traceability

Etezadi, Romina; ABUALHAIJA, Sallam; Arora, Chetan; Briand, Lionel

doi:10.48550/arXiv.2502.04916

No full text

Eprint already available on another site (E-prints, Working papers and Research blog)

2025

Permalink
https://hdl.handle.net/10993/66854

DOI
10.48550/arXiv.2502.04916

arXiv
2502.04916v5

Details

Keywords :

Computer Science - Software Engineering

Abstract :

New regulations are introduced to ensure software development aligns with ethical concerns and protects public safety. Showing compliance requires tracing requirements to legal provisions. Requirements traceability is a key task where engineers must analyze technical requirements against target artifacts, often within limited time. Manually analyzing complex systems with hundreds of requirements is infeasible. The legal dimension adds challenges that increase effort. In this paper, we investigate two automated solutions based on language models, including large ones (LLMs). The first solution, Kashif, is a classifier that leverages sentence transformers and semantic similarity. The second solution, RICE_LRT, prompts a recent LLM based on RICE, a prompt engineering framework. Using a publicly available benchmark dataset, we empirically evaluate Kashif and compare it against seven baseline classifiers from the literature (including LSI, LDA, GloVe, TraceBERT, RoBERTa, and LLaMa). Kashif can identify trace links with an F2 score of 63%, outperforming the best baseline by a substantial margin of 21 percentage points (pp). On a newly created and more complex requirements document traced to the European General Data Protection Regulation (GDPR), RICE_LRT outperforms Kashif and baseline prompts in the literature by achieving an average recall of 84% and an F2 score of 61%, improving the F2 score by 34 pp compared to the best baseline prompt. Our results indicate that requirements traceability in legal contexts cannot be adequately addressed by techniques proposed in the literature that are not specifically designed for legal artifacts. Furthermore, we demonstrate that our engineered prompt outperforms both classifier-based approaches and baseline prompts.
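
The abstract reports results as recall and F2 score. As a reading aid, the sketch below shows how an F-beta score is conventionally computed from a set of predicted trace links and a set of ground-truth links; with beta = 2, recall is weighted more heavily than precision. The function names, requirement IDs, and GDPR article pairings are hypothetical illustrations, not data or code from the paper, and the paper's own evaluation procedure may differ in detail.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Weighted harmonic mean of precision and recall (beta > 1 favors recall)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_links(predicted: set, actual: set, beta: float = 2.0) -> tuple:
    """Return (precision, recall, F-beta) for predicted vs. ground-truth links."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall, f_beta(precision, recall, beta)


# Hypothetical trace links as (requirement id, legal provision) pairs.
predicted_links = {("R1", "Art. 5"), ("R1", "Art. 6"), ("R2", "Art. 32"), ("R3", "Art. 17")}
actual_links = {("R1", "Art. 5"), ("R2", "Art. 32"), ("R2", "Art. 25")}

precision, recall, f2 = score_links(predicted_links, actual_links)
print(f"precision={precision:.3f} recall={recall:.3f} F2={f2:.3f}")
```

In this toy case two of four predicted links are correct and two of three true links are found, so F2 sits closer to recall than F1 would. That weighting suits traceability settings where a missed link is assumed to cost more than a false alarm.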

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation

Disciplines :

Computer science

Author, co-author :

Etezadi, Romina

ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV

Arora, Chetan

Briand, Lionel

Language :

English

Title :

Classification or Prompting: A Case Study on Legal Requirements Traceability

Publication date :

2025

Source :

https://doi.org/10.48550/arXiv.2502.04916

FnR Project :

FNR17958091 - PLAITO - Automated Completeness Enhancement Of Requirements Towards Improved Trustworthiness, 2023 (01/09/2024-31/08/2027) - Sallam Abualhaija
FNR16570468 - NCER-FT - 2021 (01/03/2023-28/02/2025) - Gilbert Fridgen

Available on ORBilu :

since 15 December 2025
