Requirements Engineering; Natural-language Requirements; Natural Language Processing; Domain-specific Corpus Generation; Wikipedia
Résumé :
[en] We introduce WikiDoMiner -- a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (https://doi.org/10.5281/zenodo.6672682)
Centre de recherche :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Sciences informatiques
Auteur, co-auteur :
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
Sabetzadeh, Mehrdad
Co-auteurs externes :
yes
Langue du document :
Anglais
Titre :
WikiDoMiner: Wikipedia Domain-Specific Miner
Date de publication/diffusion :
2022
Nom de la manifestation :
ACM SIGSOFT CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING
Date de la manifestation :
from 14-11-2022 to 18-11-2022
Titre de l'ouvrage principal :
ACM SIGSOFT CONFERENCE ON THE FOUNDATIONS OF SOFTWARE ENGINEERING
Maison d'édition :
Association for Computing Machinery
Peer reviewed :
Peer reviewed
Projet FnR :
FNR12632261 - Early Quality Assurance Of Critical Systems, 2018 (01/01/2019-31/12/2021) - Mehrdad Sabetzadeh
Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, and Eduardo Vaz. 2019. A Machine Learning-Based Approach for Demarcating Requirements in Textual Specifications. In Proceedings of the 27th IEEE International Requirements Engineering Conference (RE'19).
Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, and Frank Zimmer. 2017. Automated Extraction and Clustering of Requirements Glossary Terms. IEEE Transactions on Software Engineering 43, 10 (2017).
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. (2018). ArXiv:arXiv:1810. 04805
Saad Ezzini, Sallam Abualhaija, Chetan Arora, and Mehrdad Sabetzadeh. 2022. Automated Handling of Anaphoric Ambiguity: A multi-solution Study. In 2022 IEEE/ACM 44th International Conference on Software Engineering.
Saad Ezzini, Sallam Abualhaija, Chetan Arora, Mehrdad Sabetzadeh, and Lionel C Briand. 2021. Using domain-specific corpora for improved handling of ambiguity in requirements. In 2021 IEEE/ACM 43rd International Conference on Software Engineering.
Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database (1st ed.). The MIT Press.
Alessio Ferrari and Andrea Esuli. 2019. An NLP approach for cross-domain ambiguity detection in requirements engineering. Automated Software Engineering 26, 3 (2019).
Alessio Ferrari, Gloria Gori, Benedetta Rosadini, Iacopo Trotta, Stefano Bacherini, Alessandro Fantechi, and Stefania Gnesi. 2018. Detecting requirements defects with NLP patterns: An industrial experience in the railway domain. Empirical Software Engineering 23, 6 (2018).
Alessio Ferrari, Giorgio Oronzo Spagnolo, and Stefania Gnesi. 2017. Pure: A dataset of public requirements documents. In 2017 IEEE 25th International Requirements Engineering Conference.
Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd. 2020. spaCy: Industrial-strength Natural Language Processing in Python. https: //doi. org/10. 5281/zenodo. 1212303
Dan Jurafsky and James H. Martin. 2020. Speech and Language Processing (3rd ed.). https://web. stanford. edu/~jurafsky/slp3/(visited 2021-06-04).
Edward Loper and Steven Bird. 2002. NLTK: The Natural Language Toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics.
M. McGill and G. Salton. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.
George Miller. 1995. WordNet: A lexical database for English. Commun. ACM 38, 11 (1995).
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825-2830.
Guido Van Rossum and Fred L. Drake. 2009. Python 3 Reference Manual. CreateSpace.
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. 353-355.
Jonas Winkler and Andreas Vogelsang. 2018. Using Tools to Assist Identification of Non-requirements in Requirements Specifications-A Controlled Experiment. In Proceedings of the 24th International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ'18).
Liping Zhao, Waad Alhoshan, Alessio Ferrari, Keletso J Letsholo, Muideen A Ajagbe, Erol-Valeriu Chioasca, and Riza T Batista-Navarro. 2021. Natural language processing for requirements engineering: A systematic mapping study. ACM Computing Surveys (CSUR) 54, 3 (2021), 1-41.