Keywords :
DNN Explanation; DNN Functional Safety Analysis; Debugging; Heatmaps; AI
Abstract :
Deep neural networks (DNNs) are increasingly important in safety-critical systems, for example in the perception layer, where they analyze images. Unfortunately, there is a lack of methods to ensure the functional safety of DNN-based components.
We observe three major challenges with existing practices regarding DNNs in safety-critical systems: (1) scenarios that are underrepresented in the test set may lead to serious safety violation risks and yet remain unnoticed; (2) characterizing such high-risk scenarios is critical for safety analysis; (3) retraining DNNs to address these risks is poorly supported when the causes of violations are difficult to determine.
To address these problems in the context of DNNs analyzing images, we propose HUDD, an approach that automatically supports the identification of root causes for DNN errors. HUDD identifies root causes by applying a clustering algorithm to heatmaps that capture the relevance of each DNN neuron to the DNN outcome. In addition, HUDD retrains DNNs with images that are automatically selected based on their relatedness to the identified image clusters.
We evaluated HUDD with DNNs from the automotive domain. HUDD identified all the distinct root causes of DNN errors, thus supporting safety analysis. In addition, our retraining approach proved more effective at improving DNN accuracy than existing approaches.
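The clustering and retraining-selection steps described in the abstract can be illustrated with a minimal Python sketch. It assumes that heatmaps have already been computed for the images the DNN mispredicted (for example with Layer-Wise Relevance Propagation); synthetic arrays stand in for them here. Ward-linkage agglomerative clustering, the knee heuristic used to choose the number of clusters, and the nearest-centroid rule for picking retraining images are illustrative assumptions, and the helpers `knee_k` and `select_for_retraining` are hypothetical names, not part of HUDD's published toolset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def flatten(heatmaps: np.ndarray) -> np.ndarray:
    """Turn an (n, h, w) stack of heatmaps into an (n, h*w) feature matrix."""
    return heatmaps.reshape(len(heatmaps), -1)


def wcss(flat: np.ndarray, labels: np.ndarray) -> float:
    """Within-cluster sum of squared Euclidean distances to each centroid."""
    total = 0.0
    for c in np.unique(labels):
        members = flat[labels == c]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def knee_k(flat: np.ndarray, tree: np.ndarray, k_max: int) -> int:
    """Hypothetical helper (assumes k_max >= 2): pick the k at which the
    normalised WCSS curve lies farthest below the chord joining its endpoints."""
    ks = np.arange(1, k_max + 1)
    scores = np.array(
        [wcss(flat, fcluster(tree, t=k, criterion="maxclust")) for k in ks]
    )
    x = (ks - 1) / (k_max - 1)
    y = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    chord = 1.0 - x  # straight line from (0, 1) to (1, 0)
    return int(ks[np.argmax(chord - y)])


def select_for_retraining(flat, labels, candidates, per_cluster):
    """Hypothetical helper: assign each candidate heatmap to its nearest cluster
    centroid, then keep, per cluster, the `per_cluster` closest candidates."""
    ids = np.unique(labels)
    centroids = np.stack([flat[labels == c].mean(axis=0) for c in ids])
    dist = np.linalg.norm(candidates[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    chosen = []
    for j in range(len(ids)):
        members = np.flatnonzero(nearest == j)
        chosen.extend(members[np.argsort(dist[members, j])][:per_cluster].tolist())
    return sorted(chosen)


# Synthetic stand-in: three "root causes", each a distinct 8x8 relevance pattern.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 8, 8)) * 5
errors = np.concatenate([c + rng.normal(scale=0.3, size=(20, 8, 8)) for c in centers])
candidates = np.concatenate([c + rng.normal(scale=0.3, size=(10, 8, 8)) for c in centers])

flat = flatten(errors)
tree = linkage(flat, method="ward")  # agglomerative clustering, Ward criterion
k = knee_k(flat, tree, k_max=8)
labels = fcluster(tree, t=k, criterion="maxclust")  # cluster ids 1..k
chosen = select_for_retraining(flat, labels, flatten(candidates), per_cluster=5)
print(k, np.bincount(labels)[1:], len(chosen))
```

Each resulting cluster is a candidate root cause for an engineer to inspect. Normalising both axes before measuring the distance to the chord keeps the choice of k independent of the heatmap scale.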
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Software Verification and Validation Lab (SVV Lab)
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
FAHMY, Hazem ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
PASTORE, Fabrizio ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Bagherzadeh, Mojtaba ; University of Ottawa
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
External co-authors :
yes
Language :
English
Title :
Supporting DNN Safety Analysis and Retraining through Heatmap-based Unsupervised Learning
Publication date :
May 2021
Journal title :
IEEE Transactions on Reliability
ISSN :
0018-9529
Publisher :
Institute of Electrical and Electronics Engineers, New York, United States - New York
Special issue title :
Special Section on Quality Assurance of Machine Learning Systems
Volume :
70
Issue :
4
Pages :
1641-1657
Peer reviewed :
Peer reviewed (verified by ORBi)
European Projects :
H2020 - 694277 - TUNE - Testing the Untestable: Model Testing of Complex Software-Intensive Systems