Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation

HU, Qiang; GUO, Yuejun; Xie, Xiaofei; CORDY, Maxime; PAPADAKIS, Mike; Ma, Lei; Le Traon, Yves

doi:10.1109/ICSE48619.2023.00152

Paper published in a journal (Scientific congresses, symposiums and conference proceedings)

2023 • In 45th IEEE/ACM International Conference on Software Engineering (ICSE), p. 1776–1787

Peer reviewed

Permalink
https://hdl.handle.net/10993/59234

Files

Full Text

ICSE2023_Acc_Estimation (1).pdf

Author postprint (726.61 kB)

All documents in ORBilu are protected by a user license.

Details

Disciplines :

Computer science

Author, co-author :

HU, Qiang ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

GUO, Yuejun ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON

Xie, Xiaofei

CORDY, Maxime ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

PAPADAKIS, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Ma, Lei

Le Traon, Yves

External co-authors :

yes

Language :

English

Title :

Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation

Publication date :

2023

Event name :

45th IEEE/ACM International Conference on Software Engineering (ICSE)

Event date :

2023

Audience :

International

Journal title :

45th IEEE/ACM International Conference on Software Engineering (ICSE)

Pages :

1776–1787

Peer reviewed :

Peer reviewed

Additional URL :

https://doi.org/10.1109/ICSE48619.2023.00152

Available on ORBilu :

since 28 December 2023

