On Comparing Mutation Testing Tools through Learning-based Mutant Selection

OJDANIC, Milos; KHANFIR, Ahmed; GARG, Aayush; DEGIOVANNI, Renzo Gaston; PAPADAKIS, Mike; LE TRAON, Yves

doi:10.1109/AST58925.2023.00008


Paper published in a book (Scientific congresses, symposiums and conference proceedings)

Peer reviewed

Permalink
https://hdl.handle.net/10993/55802

Keywords :

Software Testing; Fault Seeding; Mutation Testing; Empirical Study; Empirical Comparison

Abstract :

Recently, many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpora. As these tools operate fundamentally differently from traditional grammar-based approaches, a question arises of how they compare in terms of 1) fault detection and 2) cost-effectiveness. Simultaneously, mutation testing research proposes mutant selection approaches based on machine learning to mitigate its application cost. This raises another question: how do the existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools, μBERT (uses a pre-trained language model for fault seeding), IBIR (relies on inverted fix-patterns), DeepMutation (generates mutants by employing Neural Machine Translation) and PIT (applies standard grammar-based rules), in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep learning-based mutant selection strategies. Our results show that IBIR has the highest fault detection capability among the four tools; however, it is not the most cost-effective when considering different selection strategies. On the other hand, μBERT, despite having a relatively lower fault detection capability, is the most cost-effective among the four tools. Our results also indicate that comparing mutation testing tools under deep learning-based mutant selection strategies can lead to different conclusions than comparing them under standard mutant selection. For instance, our results demonstrate that combining μBERT with deep learning-based mutant selection yields 12% higher fault detection than the considered tools.
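To make the notion of cost-effectiveness under mutant selection more concrete, the following is a minimal sketch of how fault detection at a fixed analysis budget could be simulated. It is an illustration under stated assumptions, not the evaluation protocol of the paper: the function name `simulate_selection`, the kill-matrix layout, and the deterministic choice of a killing test are all hypothetical.

```python
from typing import Dict, List, Set


def simulate_selection(
    ranked_mutants: List[str],
    kills: Dict[str, Set[str]],
    fault_revealing: Set[str],
    budget: int,
) -> bool:
    """Analyse the first `budget` mutants of a ranking and report whether
    the resulting test suite contains a fault-revealing test.

    Assumptions (not taken from the paper):
    - `kills` maps each mutant id to the set of test ids that kill it;
    - a mutant with no killing test still consumes budget;
    - when a mutant is not yet killed by the suite, the killing test with
      the smallest id is added (a deterministic stand-in for the random
      choice a real simulation would typically make).
    """
    suite: Set[str] = set()
    for mutant in ranked_mutants[:budget]:
        killers = kills.get(mutant, set())
        if not killers:
            continue  # unkilled mutant: effort spent, no test gained
        if suite & killers:
            continue  # already killed by a previously selected test
        suite.add(min(killers))
    return bool(suite & fault_revealing)


# Hypothetical kill matrix: only test t3 reveals the real fault.
kills = {"m1": {"t1"}, "m2": {"t2", "t3"}, "m3": set(), "m4": {"t3"}}
fault_revealing = {"t3"}

learned_ranking = ["m4", "m1", "m2", "m3"]   # e.g. produced by a selection model
default_ranking = ["m1", "m3", "m2", "m4"]   # e.g. an arbitrary order

print(simulate_selection(learned_ranking, kills, fault_revealing, budget=1))
print(simulate_selection(default_ranking, kills, fault_revealing, budget=1))
```

In this toy setting, the same set of mutants reveals the fault after one analysed mutant under one ranking and only after four under the other, which is the kind of difference a cost-effectiveness comparison is meant to capture.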

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation

Disciplines :

Computer science

Author, co-author :

OJDANIC, Milos ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

KHANFIR, Ahmed ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

GARG, Aayush ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

DEGIOVANNI, Renzo Gaston ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

PAPADAKIS, Mike ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

LE TRAON, Yves ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Language :

English

Title :

On Comparing Mutation Testing Tools through Learning-based Mutant Selection

Publication date :

2023

Event name :

4th ACM/IEEE International Conference on Automation of Software Test (AST 2023)

Event date :

Mon 15 - Tue 16 May 2023, Melbourne, Australia

Audience :

International

Main work title :

On Comparing Mutation Testing Tools through Learning-based Mutant Selection

Peer reviewed :

Peer reviewed

Additional URL :

https://ieeexplore.ieee.org/document/10173980

FnR Project :

FNR13646587 - Risk Analysis Of Software Requirements Specification, 2019 (01/07/2020-30/06/2023) - Michail Papadakis

Available on ORBilu :

since 19 August 2023
