Requirements Engineering; Natural-language Requirements; Ambiguity; Natural Language Processing; Corpus Generation; Wikipedia
Abstract:
[en] Ambiguity in natural-language requirements is a pervasive issue that has been studied by the requirements engineering community for more than two decades. A fully manual approach for addressing ambiguity in requirements is tedious and time-consuming, and may further overlook unacknowledged ambiguity – the situation where different stakeholders perceive a requirement as unambiguous but, in reality, interpret the requirement differently. In this paper, we propose an automated approach that uses natural language processing for handling ambiguity in requirements. Our approach is based on the automatic generation of a domain-specific corpus from Wikipedia. Integrating domain knowledge, as we show in our evaluation, leads to a significant positive improvement in the accuracy of ambiguity detection and interpretation. We scope our work to coordination ambiguity (CA) and prepositional-phrase attachment ambiguity (PAA) because of the prevalence of these types of ambiguity in natural-language requirements [1]. We evaluate our approach on 20 industrial requirements documents. These documents collectively contain more than 5000 requirements from seven distinct application domains. Over this dataset, our approach detects CA and PAA with an average precision of 80% and an average recall of 89% (90% for cases of unacknowledged ambiguity). The automatic interpretations that our approach yields have an average accuracy of 85%. Compared to baselines that use generic corpora, our approach, which uses domain-specific corpora, has 33% better accuracy in ambiguity detection and 16% better accuracy in interpretation.
Research centre:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines:
Computer science
Author, co-author:
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
ARORA, Chetan ; Deakin University
SABETZADEH, Mehrdad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
External co-authors:
yes
Document language:
English
Title:
Using Domain-specific Corpora for Improved Handling of Ambiguity in Requirements
Publication date:
May 2021
Event name:
43rd International Conference on Software Engineering
Event date:
from 25-05-2021 to 28-05-2021
Event scope:
International
Title of the main work:
Proceedings of the 43rd International Conference on Software Engineering (ICSE'21), Madrid 25-28 May 2021
Publisher:
IEEE
Peer reviewed :
Peer reviewed
FnR project:
FNR12632261 - Early Quality Assurance Of Critical Systems, 2018 (01/01/2019-31/12/2021) - Mehrdad Sabetzadeh
F. de Bruijn and H. Dekkers, "Ambiguity in natural language software requirements: A case study," in Proceedings of the 16th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'10), 2010.
K. Pohl, Requirements Engineering, 1st ed. Springer, 2010.
D. Berry, E. Kamsties, and M. Krieger, "From contract drafting to software specification: Linguistic sources of ambiguity, a handbook," 2003. [Online]. Available: http://se.uwaterloo.ca/~dberry/handbook/ambiguityHandbook.pdf
S. Piantadosi, H. Tily, and E. Gibson, "The communicative function of ambiguity in language," Cognition, vol. 122, no. 3, 2012.
K. Pohl and C. Rupp, Requirements Engineering Fundamentals, 1st ed. Rocky Nook, 2011.
A. Ferrari and A. Esuli, "An NLP approach for cross-domain ambiguity detection in requirements engineering," Automated Software Engineering, vol. 26, no. 3, 2019.
F. Chantree, B. Nuseibeh, A. de Roeck, and A. Willis, "Identifying nocuous ambiguities in natural language requirements," in Proceedings of the 14th IEEE International Requirements Engineering Conference (RE'06), 2006.
V. Gervasi, A. Ferrari, D. Zowghi, and P. Spoletini, "Ambiguity in requirements engineering: Towards a unifying framework," in From Software Engineering to Formal Methods and Tools, and Back. Springer, 2019.
E. Kamsties, D. Berry, and B. Paech, "Detecting ambiguities in requirements documents using inspections," in Proceedings of the 1st Workshop on Inspection in Software Engineering (WISE'01), 2001.
N. Kiyavitskaya, N. Zeni, L. Mich, and D. Berry, "Requirements for tools for ambiguity identification and measurement in natural language requirements specifications," Requirements Engineering, vol. 13, no. 3, 2008.
F. Dalpiaz, I. Schalk, and G. Lucassen, "Pinpointing ambiguity and incompleteness in requirements engineering via information visualization and NLP," in Proceedings of the 24th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'18), 2018.
P. Spoletini, A. Ferrari, M. Bano, D. Zowghi, and S. Gnesi, "Interview review: An empirical study on detecting ambiguities in requirements elicitation interviews," in Proceedings of the 24th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'18), 2018.
H. Yang, A. de Roeck, V. Gervasi, A. Willis, and B. Nuseibeh, "Analysing anaphoric ambiguity in natural language requirements," Requirements Engineering, vol. 16, no. 3, 2011.
F. Dalpiaz, D. Dell'Anna, F. Aydemir, and S. Cevikol, "Requirements classification with interpretable machine learning and dependency parsing," in Proceedings of the 27th IEEE International Requirements Engineering Conference (RE'19), 2019.
S. Mishra and A. Sharma, "On the use of word embeddings for identifying domain specific ambiguities in requirements," in Proceedings of the 27th IEEE International Requirements Engineering Conference Workshops (REW'19), 2019.
D. Toews and L. Van Holland, "Determining domain-specific differences of polysemous words using context information," in Proceedings of the 25th Working Conference on Requirements Engineering: Foundation and Software Quality Workshops (REFSQW'19), 2019.
V. Jain, R. Malhotra, S. Jain, and N. Tanwar, "Cross-domain ambiguity detection using linear transformation of word embedding spaces," in Proceedings of the 26th Working Conference on Requirements Engineering: Foundation and Software Quality Workshops (REFSQW'20), 2020.
C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer, "Extracting domain models from natural-language requirements: approach and industrial evaluation," in Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems (MODELS'16), 2016.
A. Sleimi, N. Sannier, M. Sabetzadeh, L. Briand, and J. Dann, "Automated extraction of semantic legal metadata using natural language processing," in Proceedings of the 26th IEEE International Requirements Engineering Conference (RE'18), 2018.
C. Schutze, "PP attachment and argumenthood," MIT working papers in linguistics, vol. 26, no. 95, 1995.
P. Engelhardt and F. Ferreira, "Processing coordination ambiguity," Language and Speech, vol. 53, no. 4, 2010.
B. Strang, Modern English Structure, 2nd ed. Edward Arnold, 1968.
F. Chantree, A. Kilgarriff, A. De Roeck, and A. Willis, "Disambiguating coordinations using word distribution information," in Proceedings of the 5th International Conference on Recent Advances in Natural Language Processing (RANLP'05), 2005.
M. Goldberg, "An unsupervised model for statistically determining coordinate phrase attachment," in Proceedings of the 37th annual meeting of the Association for Computational Linguistics (ACL'99), 1999.
P. Resnik, "Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language," Journal of Artificial Intelligence Research, vol. 11, no. 1, 1999.
P. Nakov and M. Hearst, "Using the web as an implicit training set: application to structural ambiguity resolution," in Proceedings of the 5th conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT'05), 2005.
A. De Roeck, "Detecting dangerous coordination ambiguities using word distribution," in Proceedings of the 6th International Conference on Recent Advances in Natural Language Processing (RANLP'07), 2007.
S. Tjong and D. Berry, "Can rules of inferences resolve coordination ambiguity in natural language requirements specification?" in Proceedings of the 13th Workshop on Requirements Engineering (WER'08), 2008.
H. Yang, A. Willis, A. De Roeck, and B. Nuseibeh, "Automatic detection of nocuous coordination ambiguities in natural language requirements," in Proceedings of the 10th IEEE/ACM international conference on Automated software engineering (ASE'10), 2010.
S. Tjong and D. Berry, "The design of SREE-A prototype potential ambiguity finder for requirements specifications and lessons learned," in Proceedings of the 19th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'13), 2013.
A. Kilgarriff, "Thesauruses for natural language processing," in Proceedings of the 1st International Conference on Natural Language Processing and Knowledge Engineering (NLPKE'03), 2003.
H. Yang, A. De Roeck, A. Willis, and B. Nuseibeh, "A methodology for automatic identification of nocuous ambiguity," in Proceedings of the 23rd International Conference on Computational Linguistics (COLING'10), 2010.
A. Okumura and K. Muraki, "Symmetric pattern matching analysis for English coordinate structures," in Proceedings of the 4th Conference on Applied Natural Language Processing (ANLP'94), 1994.
E. Agirre, T. Baldwin, and D. Martinez, "Improving parsing and PP attachment performance with sense information," in Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL'08), 2008.
H. Calvo and A. Gelbukh, "Improving prepositional phrase attachment disambiguation using the web as corpus," in Proceedings of the 8th Iberoamerican Congress on Progress in Pattern Recognition, Speech and Image Analysis (CIARP'03), 2003.
M. B. Hosseini, R. Slavin, T. Breaux, X. Wang, and J. Niu, "Disambiguating requirements through syntax-driven semantic analysis of information types," in Proceedings of the 26th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'20), 2020.
U. Shah and D. Jinwala, "Resolving ambiguities in natural language software requirements: A comprehensive survey," SIGSOFT Software Engineering Notes, vol. 40, no. 5, 2015.
C. Ribeiro and D. Berry, "The prevalence and severity of persistent ambiguity in software requirements specifications: Is a special effort needed to find them?" Science of Computer Programming, vol. 195, 2020.
F. Fabbrini, M. Fusani, S. Gnesi, and G. Lami, "The linguistic approach to the natural language requirements quality: Benefit of the use of an automatic tool," in Proceedings of the 26th Annual NASA Goddard Software Engineering Workshop (SEW'01), 2001.
E. Kamsties and B. Paech, "Taming ambiguity in natural language requirements," in Proceedings of the 13th International Conference on Software and Systems Engineering and Applications (ICSSEA'00), 2000.
A. Massey, R. Rutledge, A. Anton, and P. Swire, "Identifying and classifying ambiguity for regulatory requirements," in Proceedings of the 22nd IEEE International Requirements Engineering Conference (RE'14), 2014.
L. Mich, "NL-OOPS: From natural language to object oriented requirements using the natural language processing system LOLITA," Natural Language Engineering, vol. 2, no. 2, 1996.
V. Ambriola and V. Gervasi, "On the systematic analysis of natural language requirements with CIRCE," Automated Software Engineering, vol. 13, no. 1, 2006.
A. Mavin, P. Wilkinson, A. Harwood, and M. Novak, "Easy approach to requirements syntax (EARS)," in Proceedings of the 17th IEEE International Requirements Engineering Conference (RE'09), 2009.
C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer, "Automated checking of conformance to requirements templates using natural language processing," IEEE Transactions on Software Engineering, vol. 41, no. 10, 2015.
D. Rodriguez, D. Carver, and A. Mahmoud, "An efficient Wikipedia-based approach for better understanding of natural language text related to user requirements," in Proceedings of the 39th IEEE Aerospace Conference (AeroConf'18), 2018.
B. Gleich, O. Creighton, and L. Kof, "Ambiguity detection: Towards a tool explaining ambiguity sources," in Proceedings of the 16th Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'10), 2010.
H. Femmer, D. Mendez Fernandez, S. Wagner, and S. Eder, "Rapid quality assurance with requirements smells," Journal of Systems and Software, vol. 123, 2017.
B. Rosadini, A. Ferrari, G. Gori, A. Fantechi, S. Gnesi, I. Trotta, and S. Bacherini, "Using NLP to detect requirements defects: An industrial experience in the railway domain," in Proceedings of the 23rd Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'17), 2017.
A. Ferrari, G. Gori, B. Rosadini, I. Trotta, S. Bacherini, A. Fantechi, and S. Gnesi, "Detecting requirements defects with NLP patterns: An industrial experience in the railway domain," Empirical Software Engineering, vol. 23, no. 6, 2018.
G. Lami, M. Fusani, and G. Trentanni, "QuARS: A pioneer tool for NL requirement analysis," in From Software Engineering to Formal Methods and Tools, and Back. Springer, 2019.
F. Dalpiaz, I. van der Schalk, S. Brinkkemper, F. Aydemir, and G. Lucassen, "Detecting terminological ambiguity in user stories: Tool and experimentation," Information and Software Technology, vol. 110, 2019.
A. Willis, F. Chantree, and A. De Roeck, "Automatic identification of nocuous ambiguity," Research on Language and Computation, vol. 6, no. 3-4, 2008.
K. Church and R. Patil, Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table, 1st ed. MIT Press, 1982.
P. Pantel and D. Lin, "An unsupervised approach to prepositional phrase attachment using contextually similar words," in Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL'00), 2000.
E. Agirre, O. de Lacalle, C. Fellbaum, A. Marchetti, A. Toral, and P. Vossen, "SemEval-2010 task 17: all-words word sense disambiguation on a specific domain," in Proceedings of the 5th Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW'10), 2010.
M. Strube and S. Ponzetto, "WikiRelate! computing semantic relatedness using Wikipedia," in Proceedings of the 21st national conference on Artificial intelligence (AAAI'06), 2006.
E. Gabrilovich, S. Markovitch et al., "Computing semantic relatedness using Wikipedia-based explicit semantic analysis," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), 2007.
A. Fogarolli, "Word sense disambiguation based on Wikipedia link structure," in Proceedings of the 3rd IEEE International Conference on Semantic Computing (ICSC'09), 2009.
S. Gella, C. Strapparava, and V. Nastase, "Mapping WordNet domains, WordNet topics and Wikipedia categories to generate multilingual domain specific resources," in Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC'14), 2014.
G. Miller, "WordNet: A lexical database for English," Communications of the ACM, vol. 38, no. 11, 1995.
C. Fellbaum, WordNet: An Electronic Lexical Database, 1st ed. The MIT Press, 1998.
D. Chen and C. Manning, "A fast and accurate dependency parser using neural networks," in Proceedings of the 18th Conference on Empirical Methods in Natural Language Processing (EMNLP'14), 2014.
C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer, "Automated extraction and clustering of requirements glossary terms," IEEE Transactions on Software Engineering, vol. 43, no. 10, 2017.
D. Newman, J. Lau, K. Grieser, and T. Baldwin, "Automatic evaluation of topic coherence," in Proceedings of the 8th annual conference of the North American chapter of the association for computational linguistics: Human language technologies (NAACL-HLT'10), 2010.
S. Evert, "Google web 1T 5-grams made easy (but not for the computer)," in Proceedings of the 8th annual conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT'10) and the 6th Web as Corpus Workshop (WAC'10), 2010.
C. Biemann, F. Bildhauer, S. Evert, D. Goldhahn, U. Quasthoff, R. Schafer, J. Simon, L. Swiezinski, and T. Zesch, "Scalable construction of high-quality web corpora," Journal for Language Technology and Computational Linguistics, vol. 28, no. 2, 2013.
T. Yen, J. Wu, J. Chang, J. Boisson, and J. Chang, "WriteAhead: Mining grammar patterns in corpora for assisted writing," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Proceedings of System Demonstrations (ACL-IJCNLP'15), 2015.
T. Hawker, "USYD: WSD and lexical substitution using the Web1T corpus," in Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval'07), 2007.
D. Jurafsky and J. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed. Prentice Hall, 2009.
C. Manning and H. Schutze, Foundations of statistical natural language processing, 1st ed. MIT press, 1999.
G. Dinu and M. Lapata, "Measuring distributional similarity in context," in Proceedings of the 14th Conference on Empirical Methods in Natural Language Processing (EMNLP'10), 2010.
L. J. Brinton, The structure of modern English: A linguistic introduction. John Benjamins Publishing, 2000.
I. Witten, E. Frank, M. Hall, and C. Pal, Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Elsevier, 2011.
R. Eckart de Castilho and I. Gurevych, "A broad-coverage collection of portable NLP components for building shareable analysis pipelines," in Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT (OIAF4HLT'14), 2014.
T. Zesch, C. Muller, and I. Gurevych, "Extracting lexical semantic knowledge from Wikipedia and Wiktionary," in Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'08), 2008.
C. Giuliano, "jWeb1T: A library for searching the web 1T 5-gram corpus," last accessed: August 2020. [Online]. Available: http://hlt.fbk.eu/en/technology/jWeb1t
M. Zhu, Y. Zhang, W. Chen, M. Zhang, and J. Zhu, "Fast and accurate shift-reduce constituent parsing," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), 2013.
P. Resnik, "Using information content to evaluate semantic similarity in a taxonomy," in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI'95), 1995.
H. Shima, "WS4J WordNet similarity for java," last accessed: August 2020. [Online]. Available: https://code.google.com/archive/p/ws4j/
J. R. Landis and G. G. Koch, "An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers," Biometrics, vol. 33, no. 2, 1977.
J. Bergstra and Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research, vol. 13, no. 1, 2012.
G. Leech, "100 million words of English," English Today, vol. 9, no. 1, 1993.
J. Hirschberg and C. Manning, "Advances in natural language processing," Science, vol. 349, no. 6245, 2015.
L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification And Regression Trees, 1st ed. Routledge, 1984.
Y. Tian and D. Lo, "A comparative study on the effectiveness of part-of-speech tagging techniques on bug reports," in Proceedings of the 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER'15), 2015.
J. Charbonnier and C. Wartena, "Using word embeddings for unsupervised acronym disambiguation," in Proceedings of the 27th International Conference on Computational Linguistics (COLING'18), 2018.