Article (Scientific journals)
Assessing the Robustness of Test Selection Methods for Deep Neural Networks
HU, Qiang; Guo, Yuejun; Xie, Xiaofei et al.
2025, in ACM Transactions on Software Engineering and Methodology, 34 (7)
Peer Reviewed verified by ORBi
 

Files

Full Text: 2308.01314v1.pdf, author postprint (1.89 MB)

Details



Keywords :
deep learning testing; empirical study; fault detection; labeling; neural networks; performance estimation; real-world; reliability; robustness; selection methods; software; test selection
Abstract :
[en] Regularly testing deep learning-powered systems on newly collected data is critical to ensure their reliability, robustness, and efficacy in real-world applications. This process is demanding due to the significant time and human effort required to label new data. Test selection methods alleviate this manual labor by labeling and evaluating only a subset of the data while still meeting testing criteria, yet we observe that methods with reported promising results have been evaluated only in simple settings, e.g., on the original test data. The question arises: are they always reliable? In this article, we explore when and to what extent test selection methods fail. First, we identify potential pitfalls of 11 selection methods based on how they are constructed. Second, we conduct a study to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how these pitfalls break the reliability of the methods. Concretely, methods for fault detection suffer from data that are (1) correctly classified but uncertain or (2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. In addition, methods for performance estimation are sensitive to the choice of intermediate-layer output; their effectiveness can be even worse than random selection when an inappropriate layer is used.
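
To make the first pitfall concrete, below is a minimal illustrative sketch (not code from the article) of a DeepGini-style uncertainty score, representative of the fault-detection selection metrics the study examines; all input values and the labeling budget are hypothetical. A fault that the model misclassifies with high confidence receives a low uncertainty score and is therefore never picked for labeling, which is the second failure mode the abstract describes.

import numpy as np

def gini_score(probs):
    # DeepGini-style uncertainty: 1 - sum_i p_i^2 over the softmax output.
    # Higher scores mean more model uncertainty, so those inputs rank first.
    return 1.0 - np.sum(np.asarray(probs) ** 2, axis=-1)

# Hypothetical softmax outputs of a 3-class model on three test inputs.
softmax = np.array([
    [0.98, 0.01, 0.01],  # confident, but assume this input is MISclassified
    [0.40, 0.35, 0.25],  # uncertain, but assume it is correctly classified
    [0.90, 0.05, 0.05],  # confident and correct
])

scores = gini_score(softmax)             # ~ [0.039, 0.655, 0.185]
budget = 1
selected = np.argsort(scores)[-budget:]  # top-1 most uncertain input
# selected == [1]: the uncertain-but-correct input consumes the labeling
# budget, while the confident-but-wrong fault (input 0) ranks last and is
# never inspected.

The same ranking logic also explains the first failure mode: a correctly classified but uncertain input (input 1 above) outranks genuine faults and wastes the labeling budget.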
Disciplines :
Computer science
Author, co-author :
HU, Qiang  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Tianjin University, Tianjin, China
Guo, Yuejun ;  ITIS, Luxembourg Institute of Science and Technology, Esch-sur-Alzette, Luxembourg
Xie, Xiaofei ;  Singapore Management University, Singapore, Singapore
CORDY, Maxime  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Ma, Wei ;  Singapore Management University, Singapore, Singapore
PAPADAKIS, Michail  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Ma, Lei ;  The University of Tokyo, Tokyo, Japan ; University of Alberta, Edmonton, Canada
LE TRAON, Yves  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors :
yes
Language :
English
Title :
Assessing the Robustness of Test Selection Methods for Deep Neural Networks
Publication date :
14 August 2025
Journal title :
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher :
Association for Computing Machinery
Volume :
34
Issue :
7
Peer reviewed :
Peer Reviewed verified by ORBi
Funding text :
Yuejun Guo is funded by the European Union's Horizon Research and Innovation Programme, as part of the project LAZARUS (Grant Agreement no. 101070303). The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors.
Available on ORBilu :
since 06 February 2026

Statistics

Number of views
2 (0 by Unilu)
Number of downloads
1 (0 by Unilu)
Scopus® citations
2 (2 without self-citations)
OpenCitations
0
OpenAlex citations
4
WoS citations
2
