Automated Extraction and Clustering of Requirements Glossary Terms

ARORA, Chetan; SABETZADEH, Mehrdad; BRIAND, Lionel; Zimmer, Frank

doi:10.1109/TSE.2016.2635134

Article (Scientific journals)

2017 • In IEEE Transactions on Software Engineering, 43 (10), p. 918-945

Peer reviewed

Permalink
https://hdl.handle.net/10993/28943

DOI
10.1109/TSE.2016.2635134

Files

Full Text

ASBZ_TSE_Manuscript_revised.pdf

Author postprint (2.65 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Requirements Glossaries; Term Extraction; Natural Language Processing; Clustering; Case Study Research

Abstract :

A glossary is an important part of any software requirements document. By making explicit the technical terms in a domain and providing definitions for them, a glossary helps mitigate imprecision and ambiguity. A key step in building a glossary is to decide upon the terms to include in the glossary and to find any related terms. Doing so manually is laborious, particularly for large requirements documents. In this article, we develop an automated approach for extracting candidate glossary terms and their related terms from natural language requirements documents. Our approach differs from existing work on term extraction mainly in that it clusters the extracted terms by relevance, instead of providing a flat list of terms. We provide an automated, mathematically based procedure for selecting the number of clusters. This procedure makes the underlying clustering algorithm transparent to users, thus alleviating the need for any user-specified parameters. To evaluate our approach, we report on three industrial case studies, as part of which we also examine the perceptions of the involved subject matter experts about the usefulness of our approach. Our evaluation notably suggests that: (1) over requirements documents, our approach is more accurate than major generic term extraction tools; specifically, in our case studies, it leads to gains of 20% or more in recall when compared to existing tools, while either improving precision or leaving it virtually unchanged; and (2) the experts involved in our case studies find the clusters generated by our approach useful as an aid for glossary construction.
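
To make the contrast between a flat list of terms and clustered terms concrete, here is a minimal Python sketch. It is not the authors' method: the article's similarity measures and its procedure for selecting the number of clusters are not described in this record. As a stand-in, the sketch assumes that two candidate terms are related when they share at least one word, and groups terms transitively with a union-find structure. The function name `cluster_terms` and the sample terms are hypothetical.

```python
from collections import defaultdict


def cluster_terms(terms):
    """Group candidate terms that share at least one word, transitively.

    Assumption for illustration only: word overlap stands in for the
    relatedness measures a real term-clustering approach would use.
    """
    parent = {term: term for term in terms}

    def find(term):
        # Follow parent links to the cluster representative (path halving).
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    first_seen = {}  # word -> first term in which the word occurred
    for term in terms:
        for word in term.lower().split():
            if word in first_seen:
                # Merge this term's cluster with the earlier term's cluster.
                parent[find(term)] = find(first_seen[word])
            else:
                first_seen[word] = term

    clusters = defaultdict(list)
    for term in terms:
        clusters[find(term)].append(term)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical flat list of candidate glossary terms.
flat_list = [
    "ground station",
    "ground station operator",
    "telemetry data",
    "telemetry frame",
    "operator console",
    "antenna",
]

for cluster in cluster_terms(flat_list):
    print(cluster)
# ['antenna']
# ['ground station', 'ground station operator', 'operator console']
# ['telemetry data', 'telemetry frame']
```

The number of clusters here simply falls out of the word-overlap rule; it should not be read as the mathematically based selection procedure that the abstract refers to.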

Research center :

Interdisciplinary Centre for Security, Reliability and Trust - SnT

Disciplines :

Computer science

Author, co-author :

ARORA, Chetan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

SABETZADEH, Mehrdad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

Zimmer, Frank; SES Techcom, Luxembourg

Language :

English

Title :

Automated Extraction and Clustering of Requirements Glossary Terms

Publication date :

October 2017

Journal title :

IEEE Transactions on Software Engineering

ISSN :

0098-5589

Publisher :

Institute of Electrical and Electronics Engineers, New York, United States

Volume :

43

Issue :

10

Pages :

918-945

Peer reviewed :

Peer reviewed

Focus Area :

Computational Sciences

European Projects :

H2020 - 694277 - TUNE - Testing the Untestable: Model Testing of Complex Software-Intensive Systems

FnR Project :

FNR6911386 - Enhancing The Automation And Accuracy Of Requirements Quality Assurance Processes Via Disciplined Use Of Natural Language, 2013 (01/09/2013-31/10/2016) - Chetan Arora

Funders :

FNR - Fonds National de la Recherche
CE - Commission Européenne

Available on ORBilu :

since 29 November 2016

Statistics

Number of views

617 (124 by Unilu)

Number of downloads

1191 (57 by Unilu)

OpenAlex citations

100
