Robustness testing is essential for evaluating deep learning models, particularly under unforeseen circumstances. Adversarial test generation, a fundamental approach to robustness testing, is prevalent in computer vision and natural language processing, and it has recently gained considerable attention in code tasks. Variable Renaming-based Adversarial Test Generation (VRTG), which deceives models by altering variable names in source code, is a key focus of this work. VRTG consists of two components, substitution construction and variable name searching, but because both are designed empirically, their systematic design remains a challenge. This paper introduces the first benchmark for examining how different substitutions and search algorithms affect VRTG effectiveness, and explores improvements to existing VRTGs. Our benchmark covers three substitution construction types, six ways of ranking substitution positions, and seven search algorithms. Analysis of four code understanding tasks and three pre-trained code models using our benchmark reveals that combining the RNNS and Genetic Algorithm search strategies with code-based substitution is more effective for VRTG construction. Notably, this method outperforms ALERT, an advanced black-box variable renaming test generation technique, by up to 22.57%.
HU, Qiang ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Tianjin University, China
GUO, Yuejun ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Luxembourg Institute of Science and Technology, Luxembourg
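The variable-renaming attack described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `vrtg_attack`, `rename_variable`, and the toy `predict` function are all hypothetical stand-ins for the benchmark's substitution construction and search components.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    # Replace whole-word occurrences of the identifier only,
    # so e.g. renaming "size" does not touch "resize".
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def vrtg_attack(code: str, variable: str, substitutions, predict):
    """Greedy search: try each candidate name and return the first
    renaming that flips the model's prediction, or None if none does.
    (Real VRTGs use richer searches, e.g. RNNS or a Genetic Algorithm.)"""
    original_label = predict(code)
    for candidate in substitutions:
        mutated = rename_variable(code, variable, candidate)
        if predict(mutated) != original_label:
            return mutated  # semantics-preserving adversarial test found
    return None

# Toy stand-in for a pre-trained code model: it (wrongly) keys on the
# substring "tmp", so a mere rename can flip its output.
toy_predict = lambda code: "vulnerable" if "tmp" in code else "safe"

snippet = "def f(buf):\n    size = len(buf)\n    return size"
adversarial = vrtg_attack(snippet, "size", ["n", "tmp_len", "count"], toy_predict)
```

Because a rename preserves program semantics, any prediction flip exposes a robustness failure of the model rather than a change in the code's behavior.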