Robustness testing is essential for evaluating deep learning models, particularly under unforeseen circumstances. Adversarial test generation, a fundamental approach to robustness testing, is prevalent in computer vision and natural language processing, and it has recently gained considerable attention in code tasks. Variable Renaming-based Adversarial Test Generation (VRTG), which deceives models by altering variable names in source code, is a key focus of this work. VRTG consists of two components, substitution construction and variable name searching, but because both are designed empirically, their systematic design remains a challenge. This paper introduces the first benchmark for examining how different substitutions and search algorithms affect VRTG effectiveness, and explores improvements to existing VRTGs. Our benchmark covers three substitution construction types, six ways of ranking substitution positions, and seven search algorithms. Analysis of four code understanding tasks and three pre-trained code models using our benchmark reveals that combining the RNNS and Genetic Algorithm search strategies with code-based substitution is more effective for VRTG construction. Notably, this method outperforms ALERT, an advanced black-box variable renaming test generation technique, by up to 22.57%.
HU, Qiang ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Tianjin University, China
GUO, Yuejun ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Luxembourg Institute of Science and Technology, Luxembourg
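The variable-renaming attack described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `vrtg_attack`, `rename_variable`, and the toy `predict` function are all hypothetical stand-ins for the benchmark's substitution construction and search components.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    # Replace whole-word occurrences of the identifier only,
    # so e.g. renaming "size" does not touch "resize".
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

def vrtg_attack(code: str, variable: str, substitutions, predict):
    """Greedy search: try each candidate name and return the first
    renaming that flips the model's prediction, or None if none does.
    (Real VRTGs use richer searches, e.g. RNNS or a Genetic Algorithm.)"""
    original_label = predict(code)
    for candidate in substitutions:
        mutated = rename_variable(code, variable, candidate)
        if predict(mutated) != original_label:
            return mutated  # semantics-preserving adversarial test found
    return None

# Toy stand-in for a pre-trained code model: it (wrongly) keys on the
# substring "tmp", so a mere rename can flip its output.
toy_predict = lambda code: "vulnerable" if "tmp" in code else "safe"

snippet = "def f(buf):\n    size = len(buf)\n    return size"
adversarial = vrtg_attack(snippet, "size", ["n", "tmp_len", "count"], toy_predict)
```

Because a rename preserves program semantics, any prediction flip exposes a robustness failure of the model rather than a change in the code's behavior.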