DNN Explanation; Search-based Testing; Functional Safety; Debugging; AI
Abstract:
[en] When Deep Neural Networks (DNNs) are used in safety-critical systems, engineers should determine the safety risks associated with failures (i.e., erroneous outputs) observed during testing. For DNNs processing images, engineers visually inspect all failure-inducing images to determine common characteristics among them. Such characteristics correspond to hazard-triggering events (e.g., low illumination) that are essential inputs for safety analysis. Though informative, such activity is expensive and error-prone.
To support such safety analysis practices, we propose SEDE, a technique that generates readable descriptions for commonalities in failure-inducing, real-world images and improves the DNN through effective retraining. SEDE leverages the availability of simulators, which are commonly used for cyber-physical systems. It relies on genetic algorithms to drive simulators towards the generation of images that are similar to failure-inducing, real-world images in the test set; it then employs rule learning algorithms to derive expressions that capture commonalities in terms of simulator parameter values. The derived expressions are then used to generate additional images to retrain and improve the DNN.
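The search-then-describe idea above can be illustrated with a minimal sketch. It is not SEDE's implementation: the toy `simulate` function, the parameter names `light` and `head_angle`, the fitness definition, and the range-based `describe` step (a crude stand-in for a real rule learner) are all assumptions made for illustration only.

```python
import random

# Hypothetical stand-in for a simulator: maps parameter values to a feature
# vector. A real setup would render an image and extract features from it.
def simulate(params):
    light, head_angle = params
    return (light * 2.0, head_angle / 90.0)

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def evolve(target, bounds, pop_size=30, generations=40, seed=0):
    """Toy genetic algorithm: search for simulator parameters whose simulated
    features are close to `target` (e.g., features of failure-inducing images)."""
    rng = random.Random(seed)

    def fitness(p):
        return distance(simulate(p), target)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover per parameter, then Gaussian mutation.
            child = tuple(
                clip(rng.choice((x, y)) + rng.gauss(0, 0.05 * (hi - lo)), lo, hi)
                for x, y, (lo, hi) in zip(a, b, bounds)
            )
            children.append(child)
        pop = parents + children
    pop.sort(key=fitness)
    return pop

def describe(individuals, names):
    """Crude substitute for rule learning: report each parameter's value range."""
    return {
        name: (min(p[i] for p in individuals), max(p[i] for p in individuals))
        for i, name in enumerate(names)
    }

# Usage: pretend failures cluster around low light and a turned head.
bounds = [(0.0, 1.0), (-90.0, 90.0)]
target = simulate((0.1, 60.0))
best = evolve(target, bounds)[:5]
rule = describe(best, ["light", "head_angle"])
print(rule)
```

The resulting dictionary plays the role of a readable expression over simulator parameters (for example, a narrow `light` range near the low end). A dedicated rule learner would replace `describe` in any realistic setting.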
With DNNs performing in-car sensing tasks, SEDE successfully characterized hazard-triggering events leading to a DNN accuracy drop. Also, SEDE enabled retraining leading to significant improvements in DNN accuracy, up to 18 percentage points.
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Software Verification and Validation Lab (SVV Lab)
ULHPC - University of Luxembourg: High Performance Computing
Disciplines:
Computer science
Author, co-author:
FAHMY, Hazem ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
PASTORE, Fabrizio ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
BRIAND, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV
Stifter, Thomas
External co-authors:
yes
Document language:
English
Title:
Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems
Publication date:
27 May 2023
Journal title:
ACM Transactions on Software Engineering and Methodology
ISSN :
1049-331X
Publisher:
Association for Computing Machinery (ACM), United States