Requirements Engineering; Natural-language Requirements; Ambiguity; Natural Language Processing (NLP); Machine Learning (ML); Language Models; BERT
Abstract :
Ambiguity is a pervasive issue in natural-language requirements. A common source of ambiguity in requirements is the use of anaphoric pronouns. In requirements engineering, anaphoric ambiguity occurs when a pronoun can plausibly refer to different entities and thus be interpreted differently by different readers. In this paper, we develop an accurate and practical automated approach for handling anaphoric ambiguity in requirements, addressing both ambiguity detection and anaphora interpretation. In view of the multiple competing natural language processing (NLP) and machine learning (ML) technologies that one can utilize, we simultaneously pursue six alternative solutions, empirically assessing each using a collection of ~1,350 industrial requirements. The alternative solution strategies that we consider are natural choices induced by the existing technologies; these choices frequently arise in other automation tasks involving natural-language requirements. A side-by-side empirical examination of these choices helps develop insights about the usefulness of different state-of-the-art NLP and ML technologies for addressing requirements engineering problems. For the ambiguity detection task, we observe that supervised ML outperforms both a large-scale language model, SpanBERT (a variant of BERT), and a solution assembled from off-the-shelf NLP coreference resolvers. In contrast, for anaphora interpretation, SpanBERT yields the most accurate solution. In our evaluation, (1) the best solution for anaphoric ambiguity detection has an average precision of ~60% and a recall of 100%, and (2) the best solution for anaphora interpretation (resolution) has an average success rate of ~98%.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Arora, Chetan
Sabetzadeh, Mehrdad
External co-authors :
yes
Language :
English
Title :
Automated Handling of Anaphoric Ambiguity in Requirements: A Multi-solution Study
Publication date :
May 2022
Event name :
44th International Conference on Software Engineering
Event date :
from 22-05-2022 to 27-05-2022
Main work title :
Proceedings of the 44th International Conference on Software Engineering (ICSE'22), Pittsburgh, PA, USA 22-27 May 2022
Publisher :
Association for Computing Machinery
Peer reviewed :
Peer reviewed
FnR Project :
FNR12632261 - Early Quality Assurance Of Critical Systems, 2018 (01/01/2019-31/12/2021) - Mehrdad Sabetzadeh