Improving Requirements Glossary Construction via Clustering: Approach and Industrial Case Studies

ARORA, Chetan; SABETZADEH, Mehrdad; BRIAND, Lionel; Zimmer, Frank

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2014 • In 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2014)

Peer reviewed

Permalink
https://hdl.handle.net/10993/16768

Files

Full Text

ASBZ_ESEM14.pdf

Author postprint (1.35 MB)

All documents in ORBilu are protected by a user license.

Details

Keywords :

Glossary; Term Extraction; Case Study Research; Natural Language Processing (NLP); Clustering

Abstract :

[en] Context. A glossary is an important part of any software requirements document. By making explicit the technical terms in a domain and providing definitions for them, a glossary serves as a helpful tool for mitigating ambiguities. Objective. A necessary step for building a glossary is to decide upon the glossary terms and to identify their related terms. Doing so manually is a laborious task. Our objective is to provide automated support for identifying candidate glossary terms and their related terms. Our work differs from existing work on term extraction mainly in that, instead of providing a flat list of candidate terms, our approach clusters the terms by relevance. Method. We use case study research as the basis for our empirical investigation. Results. We present an automated approach for identifying and clustering candidate glossary terms. We evaluate the approach through two industrial case studies: one concerns a satellite software component, and the other an evidence management tool for safety certification. Conclusion. Our results indicate that, over requirements documents, (1) our approach is more accurate than other existing methods for identifying candidate glossary terms, which makes it less likely that our approach will miss important glossary terms; and (2) clustering provides an effective basis for grouping related terms, which makes clustering a useful support tool for selecting glossary terms and associating them with their related terms.
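To make the idea of clustering candidate terms (as opposed to returning a flat list) more concrete, the following is a minimal sketch only. It is not the approach evaluated in the paper: the abstract does not specify the similarity measures or the clustering algorithm, so the sketch assumes a simple word-overlap (Jaccard) similarity and a threshold-based single-linkage grouping. The function names, the threshold value, and the example terms are all hypothetical.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two terms.

    Assumption: word overlap stands in for whatever similarity
    measure a real term-clustering approach would use.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_terms(terms, threshold=0.3):
    """Group terms with single linkage: any pair whose similarity
    reaches the threshold is merged into one cluster (union-find)."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the cluster representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if token_jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    # Sort for a deterministic, readable result.
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical candidate terms, invented for illustration only.
candidates = [
    "ground station",
    "ground station status",
    "station status",
    "evidence item",
    "evidence repository",
    "safety certification",
]
clusters = cluster_terms(candidates)
for group in clusters:
    print(group)
```

With these invented inputs, the terms sharing "station" fall into one group and the terms sharing "evidence" into another, while "safety certification" stays alone. An analyst could then review each group when choosing a glossary term and its related terms.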

Disciplines :

Computer science

Author, co-author :

ARORA, Chetan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

SABETZADEH, Mehrdad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

Zimmer, Frank; SES TechCom

Language :

English

Title :

Improving Requirements Glossary Construction via Clustering: Approach and Industrial Case Studies

Publication date :

September 2014

Event name :

8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2014)

Event place :

Italy

Event date :

18-09-2014 to 19-09-2014

Main work title :

8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2014)

Peer reviewed :

Peer reviewed

Funders :

FNR - Fonds National de la Recherche

Available on ORBilu :

since 16 May 2014

Statistics

Number of views

342 (32 by Unilu)

Number of downloads

10 (5 by Unilu)
