Automated Demarcation of Requirements in Textual Specifications: A Machine Learning-Based Approach

ABUALHAIJA, Sallam; Arora, Chetan; SABETZADEH, Mehrdad; BRIAND, Lionel; Traynor, Michael

doi:10.1007/s10664-020-09864-1

Article (Scientific journals)

2020 • In Empirical Software Engineering

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/43584

Files

Full Text

Abualhaija20.pdf

Publisher postprint (4.3 MB)

Details

Keywords :

Natural-language Requirements; Requirements Identification and Classification; Machine Learning; Natural language processing

Abstract :

A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent requirements. In principle, by following suitable writing and markup conventions, one can provide an immediate and unequivocal demarcation of requirements at the time a specification is being developed. However, neither the presence nor a fully accurate enforcement of such conventions is guaranteed. The result is that, in many practical situations, analysts end up resorting to after-the-fact reviews for sifting requirements from other material in a requirements specification. This is both tedious and time-consuming. We propose an automated approach for demarcating requirements in free-form requirements specifications. The approach, which is based on machine learning, can be applied to a wide variety of specifications in different domains and with different writing styles. We train and evaluate our approach over an independently labeled dataset comprised of 33 industrial requirements specifications. Over this dataset, our approach yields an average precision of 81.2% and an average recall of 95.7%. Compared to simple baselines that demarcate requirements based on the presence of modal verbs and identifiers, our approach leads to an average gain of 16.4% in precision and 25.5% in recall. We collect and analyze expert feedback on the demarcations produced by our approach for industrial requirements specifications. The results indicate that experts find our approach useful and efficient in practice. We developed a prototype tool, named DemaRQ, in support of our approach. To facilitate replication, we make available to the research community this prototype tool alongside the non-proprietary portion of our training data.
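The abstract compares the approach against simple baselines that flag a statement as a requirement when it contains a modal verb or an identifier, and reports the comparison in terms of precision and recall. The following is a minimal sketch of what such a baseline and its evaluation could look like. It is not the authors' implementation and is unrelated to DemaRQ: the modal-verb list, the identifier pattern, and the five sample sentences with their labels are all assumptions made for illustration.

```python
import re

# Assumed modal-verb list; this record does not give the list used in the paper.
MODALS = {"shall", "must", "should", "will", "may"}

# Assumed identifier pattern (e.g. "REQ-12" or "R1.2" at the start of a statement).
ID_PATTERN = re.compile(r"^\s*[A-Z]{1,5}-?\d+(\.\d+)*\b")


def is_requirement_baseline(sentence):
    """Flag a sentence if it contains a modal verb or starts with an identifier."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & MODALS) or bool(ID_PATTERN.match(sentence))


def precision_recall(predicted, actual):
    """Compute precision and recall for the 'requirement' class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical sentences with hand-assigned labels (True = requirement).
SAMPLE = [
    ("The system shall log every failed login attempt.", True),
    ("REQ-12 Response time is below two seconds.", True),
    ("This section describes the operating environment.", False),
    ("The operator may consult the appendix for background.", False),
    ("Data is encrypted before transmission.", True),
]

predicted = [is_requirement_baseline(text) for text, _ in SAMPLE]
actual = [label for _, label in SAMPLE]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy sample the baseline produces one false positive (a modal verb in a non-requirement) and one false negative (a requirement with neither a modal verb nor an identifier), which are the two kinds of error that keyword-style demarcation is prone to.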

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation

Disciplines :

Computer science

Author, co-author :

ABUALHAIJA, Sallam; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

Arora, Chetan

SABETZADEH, Mehrdad; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

BRIAND, Lionel; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)

Traynor, Michael

External co-authors :

yes

Language :

English

Title :

Automated Demarcation of Requirements in Textual Specifications: A Machine Learning-Based Approach

Publication date :

2020

Journal title :

Empirical Software Engineering

ISSN :

1382-3256

eISSN :

1573-7616

Peer reviewed :

Peer Reviewed verified by ORBi

Focus Area :

Security, Reliability and Trust

Additional URL :

10.1109/RE.2019.00017

FnR Project :

FNR12632261 - Early Quality Assurance Of Critical Systems, 2018 (01/01/2019-31/12/2021) - Mehrdad Sabetzadeh

Funders :

QRA Corp
FNR - Luxembourg National Research Fund
H2020 European Research Council
CRSNG - Conseil de Recherches en Sciences naturelles et en Génie

Available on ORBilu :

since 29 June 2020

Statistics

Number of views

631 (70 by Unilu)

Number of downloads

536 (20 by Unilu)
