Reference: Black-Box Testing of Deep Neural Networks through Test Case Diversity
Scientific journals : Article
Engineering, computing & technology : Computer science
Security, Reliability and Trust
http://hdl.handle.net/10993/54517
Black-Box Testing of Deep Neural Networks through Test Case Diversity
English
Aghababaeyan, Zohreh [University of Ottawa]
Abdellatif, Manel [Ecole de Technologie Supérieure]
Briand, Lionel
S, Ramesh [General Motors]
Bagherzadeh, Mojtaba [University of Ottawa]
In press
IEEE Transactions on Software Engineering
Institute of Electrical and Electronics Engineers
Yes
International
0098-5589
1939-3520
New York
United States - New York
[en] Deep Neural Network ; Testing
[en] Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNNs. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNNs, which is often not feasible or convenient. Measuring such coverage requires executing DNNs with candidate inputs to guide testing, which is not an option in many practical contexts.
In this paper, we investigate diversity metrics as an alternative to white-box coverage criteria. For the previously mentioned reasons, we require such metrics to be black-box and not rely on the execution and outputs of DNNs under test. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyze their statistical association with fault detection using four datasets and five DNNs. We further compare diversity with state-of-the-art white-box coverage criteria. As a mechanism to enable such analysis, we also propose a novel way to estimate fault detection in DNNs. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide DNN testing. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. Results also confirm the suspicions that state-of-the-art coverage criteria are not adequate to guide the construction of test input sets to detect as many faults as possible using natural inputs.
General Motors
Researchers