Article (Scientific journals)
Black-Box Testing of Deep Neural Networks through Test Case Diversity
Aghababaeyan, Zohreh; Abdellatif, Manel; Briand, Lionel et al.
2023. In IEEE Transactions on Software Engineering
Peer reviewed


Keywords :
Deep Neural Network; Testing
Abstract :
Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNNs. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNNs, which is often not feasible or convenient. Measuring such coverage requires executing DNNs with candidate inputs to guide testing, which is not an option in many practical contexts. In this paper, we investigate diversity metrics as an alternative to white-box coverage criteria. For the previously mentioned reasons, we require such metrics to be black-box and not rely on the execution and outputs of DNNs under test. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyze their statistical association with fault detection using four datasets and five DNNs. We further compare diversity with state-of-the-art white-box coverage criteria. As a mechanism to enable such analysis, we also propose a novel way to estimate fault detection in DNNs. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide DNN testing. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. Results also confirm the suspicions that state-of-the-art coverage criteria are not adequate to guide the construction of test input sets to detect as many faults as possible using natural inputs.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
Aghababaeyan, Zohreh; University of Ottawa
Abdellatif, Manel; Ecole de Technologie Supérieure
Briand, Lionel; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
S, Ramesh; General Motors
Bagherzadeh, Mojtaba; University of Ottawa
Language :
English
Title :
Black-Box Testing of Deep Neural Networks through Test Case Diversity
Publication date :
2023
Journal title :
IEEE Transactions on Software Engineering
Publisher :
Institute of Electrical and Electronics Engineers, New York, United States
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
Funders :
General Motors
Available on ORBilu :
since 03 March 2023


