Keywords :
Mutation testing; Program behavior; Program specification; Software testing; Large language models
Abstract :
[en] This paper presents intent-based mutation testing, a testing approach that generates mutations by changing the programming intents implemented in the programs under test. In contrast to traditional mutation testing, which mutates the way programs are written, intent mutation changes the behavior of the programs by producing mutations that implement slightly different intents than those of the original program. These mutated intents represent possible corner cases and misunderstandings of the program behavior, i.e., of the program specifications, and can therefore capture different classes of faults than traditional (syntax-based) mutation. Moreover, since programming intents can be implemented in different ways, intent-based mutation testing can generate diverse and complex mutations that remain close to the original programming intents (specifications) and thus direct testing towards intent-level variants of the program behavior. We implement intent-based mutation testing using Large Language Models (LLMs) that mutate programming intents and transform them into mutants. We evaluate intent-based mutation on 29 programs and show that it generates mutations that are syntactically complex, semantically diverse, and semantically quite different from traditional ones. We also show that 55% of the intent-based mutations are not subsumed by traditional mutations. Overall, our analysis shows that intent-based mutation testing can be a powerful complement to traditional (syntax-based) mutation testing.
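To make the approach concrete, below is a minimal Python sketch of the two-step pipeline the abstract describes (mutate the natural-language intent, then generate a mutant from the mutated intent). The call_llm hook, the prompts, and the toy_llm stand-in are illustrative assumptions, not the paper's actual tooling; toy_llm returns canned strings so the sketch runs offline.

from typing import Callable


def mutate_intent(call_llm: Callable[[str], str], intent: str) -> str:
    # Step 1: ask the LLM for a slightly different version of the intent,
    # e.g. an altered boundary condition or input assumption.
    return call_llm(
        "Rewrite the following programming intent so that it describes a "
        "slightly different behavior (for example, change an edge case):\n"
        + intent
    )


def intent_to_mutant(call_llm: Callable[[str], str], mutated_intent: str) -> str:
    # Step 2: ask the LLM to implement the mutated intent; the result is a
    # mutant whose behavior, not just its syntax, diverges from the original.
    return call_llm("Implement this intent as a Python function:\n" + mutated_intent)


def toy_llm(prompt: str) -> str:
    # Offline stand-in for a real LLM client (hypothetical); returns canned
    # answers so the example executes without network access.
    if prompt.startswith("Rewrite"):
        return ("Return the largest element among all but the last item "
                "of a non-empty list.")
    return "def largest(xs):\n    return max(xs[:-1])"


if __name__ == "__main__":
    intent = "Return the largest element of a non-empty list."
    mutated = mutate_intent(toy_llm, intent)
    print("Mutated intent:", mutated)
    print("Mutant:\n" + intent_to_mutant(toy_llm, mutated))

As in classical mutation analysis, the generated mutant is then executed against the test suite: a test that passes on the original program but fails on the mutant kills it.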
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation
Disciplines :
Computer science
Author, co-author :
HAMIDI, Asma Sadjida ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal
KHANFIR, Ahmed ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal > Team Yves LE TRAON
PAPADAKIS, Michail ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal
External co-authors :
yes
Language :
English
Title :
Intent-Based Mutation Testing: From Naturally Written Programming Intents to Mutants
Publication date :
31 March 2025
Event name :
2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
Event place :
Naples, Italy
Event date :
from 31-03-2025 to 04-04-2025
Main work title :
2025 IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2025
Editor :
Fasolino, Anna Rita
Publisher :
Institute of Electrical and Electronics Engineers Inc.