Natural-language Requirements; Question Answering (QA); Language Models; Natural Language Processing (NLP); Natural Language Generation (NLG); BERT; T5
Abstract :
By virtue of being prevalently written in natural language (NL), requirements are prone to various defects, e.g., inconsistency and incompleteness. As such, requirements are frequently subject to quality assurance processes. These processes, when carried out entirely manually, are tedious and may further overlook important quality issues due to time and budget pressures. In this paper, we propose QAssist, a question-answering (QA) approach that provides automated assistance to stakeholders, including requirements engineers, during the analysis of NL requirements. Posing a question and getting an instant answer is beneficial in various quality-assurance scenarios, e.g., incompleteness detection. Answering requirements-related questions automatically is challenging since the scope of the search for answers can go beyond the given requirements specification. To that end, QAssist provides support for mining external domain-knowledge resources. Our work is one of the first initiatives to bring together QA and external domain knowledge for addressing requirements engineering challenges. We evaluate QAssist on a dataset covering three application domains and containing a total of 387 question-answer pairs. We experiment with state-of-the-art QA methods, based primarily on recent large-scale language models. In our empirical study, QAssist localizes the answer to a question to three passages within the requirements specification and within the external domain-knowledge resource with an average recall of 90.1% and 96.5%, respectively. QAssist extracts the actual answer to the posed question with an average accuracy of 84.2%.
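To make the passage-localization metric concrete, the following is a minimal sketch of top-3 passage retrieval and the recall computed over it. It is not QAssist's implementation: it assumes a simple TF-IDF term-overlap scorer in place of the retrieval methods evaluated in the paper, and the function names (`top_k_passages`, `recall_at_k`, `mean_recall_at_k`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_passages(question, passages, k=3):
    """Return the indices of the k passages that best match the question.

    Assumption: a plain TF-IDF overlap score stands in for the
    retrievers studied in the paper.
    """
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)

    def idf(term):
        # Smoothed inverse document frequency; rarer terms weigh more.
        df = sum(1 for d in docs if term in d)
        return math.log((n + 1) / (df + 1)) + 1.0

    q_terms = set(tokenize(question))
    scores = [sum(d[t] * idf(t) for t in q_terms) for d in docs]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def recall_at_k(retrieved, gold_index):
    """1.0 if the passage holding the answer was retrieved, else 0.0."""
    return 1.0 if gold_index in retrieved else 0.0


def mean_recall_at_k(questions, passages, gold_indices, k=3):
    """Average recall@k over a set of questions with known gold passages."""
    hits = [
        recall_at_k(top_k_passages(q, passages, k), gold)
        for q, gold in zip(questions, gold_indices)
    ]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    # Hypothetical requirements passages, for illustration only.
    passages = [
        "The system shall encrypt all stored user data.",
        "The operator can reset the alarm from the control panel.",
        "Reports are generated monthly and sent by email.",
        "The alarm sounds when the temperature exceeds the threshold.",
    ]
    questions = ["Who can reset the alarm?"]
    print(top_k_passages(questions[0], passages, k=3))
    print(mean_recall_at_k(questions, passages, gold_indices=[1], k=3))
```

An answer-extraction step would then run only over the retrieved passages, which is why the recall of this first stage bounds the accuracy of the final answer.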
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
EZZINI, Saad ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
ABUALHAIJA, Sallam ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV