Computer Science - Software Engineering; Requirements Engineering (RE); Natural Language Processing (NLP); Replication; Tool Reconstruction; Annotation; ID Card
Abstract:
Natural language processing (NLP) techniques have been widely applied in the requirements engineering (RE) field to support tasks such as classification and ambiguity detection. Despite its empirical vocation, RE research has given limited attention to the replication of NLP for RE studies. Replication is hampered by several factors, including the context specificity of the studies, the heterogeneity of the tasks involving NLP, the tasks' inherent hairiness, and, in turn, the heterogeneous reporting structure. To address these issues, we propose a new artifact, referred to as ID-Card, whose goal is to provide a structured summary of research papers that emphasizes replication-relevant information. We construct the ID-Card through a structured, iterative process based on design science. In this article: (i) we report on hands-on experiences of replication; (ii) we review the state of the art and extract replication-relevant information; (iii) we identify, through focus groups, challenges across two typical dimensions of replication: data annotation and tool reconstruction; and (iv) we present the concept and structure of the ID-Card to mitigate the identified challenges. This study aims to raise awareness of replication in NLP for RE. The proposed ID-Card is intended to foster study replication, but it can also be used in other contexts, e.g., for educational purposes.
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines:
Computer science
Author, co-author:
Abualhaija, Sallam; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Aydemir, Fatma Başak
Dalpiaz, Fabiano
Dell'Anna, Davide
Ferrari, Alessio
Franch, Xavier
Fucci, Davide
External co-authors:
yes
Document language:
English
Title:
Replication and Verifiability in Requirements Engineering: The NLP for RE Case
Publication/diffusion date:
June 2024
Journal title:
ACM Transactions on Software Engineering and Methodology
ISSN:
1049-331X
Publisher:
Association for Computing Machinery (ACM), United States