Abstract:
The adoption of deep neural networks (DNNs) in safety-critical contexts is
often prevented by the lack of effective means to explain their results,
especially when they are erroneous. In our previous work, we proposed a
white-box approach (HUDD) and a black-box approach (SAFE) to automatically
characterize DNN failures. Both approaches identify clusters of similar images
within a potentially large set of failure-inducing images. However, the analysis
pipelines for HUDD and SAFE were instantiated in specific ways according to
common practices, deferring the analysis of other pipelines to future work. In
this paper, we report on an empirical evaluation of 99 different pipelines for
root cause analysis of DNN failures. They combine transfer learning,
autoencoders, heatmaps of neuron relevance, dimensionality reduction
techniques, and different clustering algorithms. Our results show that the best
pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters
almost exclusively capturing images of the same failure scenario, thus
facilitating root cause analysis. Further, it generates distinct clusters for
each root cause of failure, thus enabling engineers to detect all the unsafe
scenarios. Interestingly, these results hold even for failure scenarios that
are only observed in a small percentage of the failing images.
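To make the reported pipeline concrete, the following is a minimal sketch of one plausible instantiation of the best-performing combination (transfer learning for feature extraction, UMAP for dimensionality reduction, and DBSCAN for clustering). The backbone choice (torchvision ResNet-50), the `failing_images` array, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming failing images are available as a NumPy array
# `failing_images` of shape (N, H, W, 3) with uint8 pixels (hypothetical input).
import torch
from torchvision import models, transforms
import umap                      # umap-learn
from sklearn.cluster import DBSCAN

# 1) Transfer learning: extract features with a pretrained CNN, classifier head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep the 2048-d pooled features
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

with torch.no_grad():
    feats = torch.stack([backbone(preprocess(img).unsqueeze(0)).squeeze(0)
                         for img in failing_images])   # (N, 2048)

# 2) Dimensionality reduction with UMAP.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(feats.numpy())

# 3) Density-based clustering with DBSCAN; each resulting cluster is expected to
#    group failing images that share the same failure scenario (root cause).
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embedding)
print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```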