This article presents a comprehensive survey of test optimization in deep neural network (DNN) testing, where test optimization refers to testing with low data-labeling effort. We analyzed 90 papers: 43 from the software engineering (SE) community, 32 from the machine learning (ML) community, and 15 from other communities. Our study (i) unifies the problems and terminologies associated with low-labeling-cost testing, (ii) compares the distinct focal points of the SE and ML communities, and (iii) reveals pitfalls in the existing literature. Furthermore, we highlight research opportunities in this domain.
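To make the surveyed setting concrete, here is a minimal, illustrative sketch of one family of techniques the survey covers: confidence-based test prioritization under a labeling budget, in the spirit of DeepGini (Feng et al. 2020, listed in the bibliography below). It ranks unlabeled inputs by the Gini impurity of the model's softmax output, so that scarce labeling effort goes to the least confident inputs first. This is a sketch under stated assumptions, not the authors' reference implementation; the probabilities and budget are placeholders.

```python
# Sketch of Gini-impurity-based test prioritization (in the spirit of
# DeepGini, Feng et al. 2020): inputs whose softmax output is closest to
# uniform are treated as most likely misclassified, so they are labeled
# first. Data and budget below are illustrative placeholders.
import numpy as np

def gini_impurity(softmax_probs: np.ndarray) -> np.ndarray:
    """Per-input Gini impurity: 1 - sum_i p_i^2 (higher = less confident)."""
    return 1.0 - np.sum(softmax_probs ** 2, axis=1)

def prioritize(softmax_probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` least-confident inputs, labeled first."""
    scores = gini_impurity(softmax_probs)
    return np.argsort(-scores)[:budget]

# Toy example: 4 unlabeled inputs, 3 classes, budget for only 2 labels.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident -> low priority
    [0.40, 0.35, 0.25],  # near-uniform -> high priority
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # almost uniform -> highest priority
])
print(prioritize(probs, budget=2))  # -> [3 1]
```

A prediction close to uniform yields impurity near 1, which this heuristic takes as a proxy for likely misclassification; confident predictions score near 0 and are labeled last, if at all.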
Disciplines:
Computer science
Author, co-author:
HU, Qiang; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Yuejun Guo; LIST - Luxembourg Institute of Science and Technology [LU]
Xiaofei Xie; Singapore Management University
CORDY, Maxime; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Lei Ma; The University of Tokyo
PAPADAKIS, Mike; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
LE TRAON, Yves; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
External co-authors:
Yes
Language:
English
Title:
Test Optimization in DNN Testing: A Survey
Publication date:
2024
Journal title:
ACM Transactions on Software Engineering and Methodology
ISSN:
1049-331X
Publisher:
Association for Computing Machinery (ACM), United States
Bibliography
Zohreh Aghababaeyan, Manel Abdellatif, Lionel Briand, S. Ramesh, and Mojtaba Bagherzadeh. 2023. Black-box testing of deep neural networks through test case diversity. IEEE Transactions on Software Engineering 49, 5 (May 2023), 3182–3204. DOI:https://doi.org/10.1109/TSE.2023.3243522
Zohreh Aghababaeyan, Manel Abdellatif, Mahboubeh Dadkhah, and Lionel Briand. 2023. DeepGD: a multi-objective black-box test selection approach for deep neural networks. arXiv:2303.04878 Retrieved from https://arxiv.org/pdf/2303.04878
Jonathan Aigrain and Marcin Detyniecki. 2019. Detecting adversarial examples and other misclassifications in neural networks by introspection. arXiv:1905.09186 Retrieved from https://arxiv.org/pdf/1905.09186
Hamzah Al-Qadasi, Changshun Wu, Yliès Falcone, and Saddek Bensalem. 2022. DeepAbstraction: 2-level prioritization for unlabeled test inputs in deep neural networks. In Proceedings of the IEEE International Conference on Artificial Intelligence Testing. IEEE, Piscataway, NJ, USA, 64–71. DOI:https://doi.org/10.1109/AITest55621.2022.00018
Mohammed Attaoui, Hazem Fahmy, Fabrizio Pastore, and Lionel Briand. 2023. Black-box safety analysis and retraining of DNNs based on feature extraction and clustering. ACM Transactions on Software Engineering and Methodology 32, 3 (2023), 1–40. DOI:https://doi.org/10.1145/3550271
Christina Baek, Yiding Jiang, Aditi Raghunathan, and J Zico Kolter. 2022. Agreement-on-the-line: predicting the performance of neural networks under distribution shift. Advances in Neural Information Processing Systems 35 (2022), 19274–19289.
Shenglin Bao, Chaofeng Sha, Bihuan Chen, Xin Peng, and Wenyun Zhao. 2023. In defense of simple techniques for neural network test case selection. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, New York, NY, USA, 501–513. DOI:https://doi.org/10.1145/3597926.3598073
Taejoon Byun, Vaibhav Sharma, Abhishek Vijayakumar, Sanjai Rayadurgam, and Darren Cofer. 2019. Input prioritization for testing neural networks. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence Testing. IEEE, 63–70. DOI:https://doi.org/10.1109/AITest.2019.000-6
Jinyin Chen, Jie Ge, and Haibin Zheng. 2023. ActGraph: prioritization of test cases based on deep neural network activation graph. Automated Software Engineering 30 (2023), Article 28. DOI:https://doi.org/10.1007/s10515-023-00396-8
Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, and Somesh Jha. 2021. Detecting errors and estimating accuracy on unlabeled data with self-training ensembles. Advances in Neural Information Processing Systems 34 (2021), 14980–14992.
Jialuo Chen, Jingyi Wang, Xingjun Ma, Youcheng Sun, Jun Sun, Peixin Zhang, and Peng Cheng. 2022. QuoTe: quality-oriented testing for deep learning systems. ACM Transactions on Software Engineering and Methodology 32, 5, Article 125 (2022), 33 pages. DOI:https://doi.org/10.1145/3582573
Junjie Chen, Zhuo Wu, Zan Wang, Hanmo You, Lingming Zhang, and Ming Yan. 2020. Practical accuracy estimation for efficient deep neural network testing. ACM Transactions on Software Engineering and Methodology 29, 4 (2020), 1–35. DOI:https://doi.org/10.1145/3394112
Lingjiao Chen, Matei Zaharia, and James Y. Zou. 2022. Estimating and explaining model performance when both covariates and labels shift. Advances in Neural Information Processing Systems 35 (2022), 11467–11479.
Ching-Yao Chuang, Antonio Torralba, and Stefanie Jegelka. 2020. Estimating generalization under distribution shifts via domain-invariant representations. In Proceedings of the 37th International Conference on Machine Learning (Virtual). PMLR, Brookline, MA, USA, 1984–1994. Retrieved from https://proceedings.mlr.press/v119/chuang20a/chuang20a.pdf
Jürgen Cito, Isil Dillig, Seohyun Kim, Vijayaraghavan Murali, and Satish Chandra. 2021. Explaining mispredictions of machine learning models using rule induction. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 716–727. DOI:https://doi.org/10.1145/3468264.3468614
Ciprian A. Corneanu, Sergio Escalera, and Aleix M. Martinez. 2020. Computing the testing error without a testing set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Virtual). IEEE Computer Society, Los Alamitos, CA, USA, 2674–2682. DOI:https://doi.org/10.1109/CVPR42600.2020.00275
Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, and Qi Tian. 2020. Towards discriminability and diversity: batch nuclear-norm maximization under label insufficient situations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, Los Alamitos, CA, USA, 3940–3949.
Xueqi Dang, Yinghua Li, Mike Papadakis, Jacques Klein, Tegawendé F. Bissyandé, and Yves Le Traon. 2023. GraphPrior: mutation-based test input prioritization for graph neural networks. ACM Transactions on Software Engineering and Methodology 33, 1, Article 22 (November 2023), 40 pages. DOI:https://doi.org/10.1145/3607191
Chad DeChant, Seungwook Han, and Hod Lipson. 2019. Predicting the accuracy of neural networks from final and intermediate layer outputs. In Proceedings of the ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena. OpenReview.net, Online, 1–6. Retrieved from https://openreview.net/pdf?id=H1xXwEB2h4
Weijian Deng, Stephen Gould, and Liang Zheng. 2021. What does rotation prediction tell us about classifier accuracy under varying testing environments? In Proceedings of the International Conference on Machine Learning (Virtual). PMLR, Brookline, MA, USA, 2579–2589. Retrieved from https://proceedings.mlr.press/v139/deng21a/deng21a.pdf
Weijian Deng, Yumin Suh, Stephen Gould, and Liang Zheng. 2023. Confidence and dispersity speak: characterising prediction matrix for unsupervised accuracy estimation. arXiv:2302.01094 Retrieved from https://arxiv.org/pdf/2302.01094
Weijian Deng and Liang Zheng. 2021. Are labels always necessary for classifier accuracy evaluation? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 15064–15073. DOI:https://doi.org/10.1109/CVPR46437.2021.01482
Yao Deng, Xi Zheng, Mengshi Zhang, Guannan Lou, and Tianyi Zhang. 2022. Scenario-based test reduction and prioritization for multi-module autonomous driving systems. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 82–93. DOI:https://doi.org/10.1145/3540250.3549152
Hady Elsahar and Matthias Gallé. 2019. To annotate or not? predicting performance drop under domain shift. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, Hong Kong, China, 2163–2173. DOI:https://doi.org/10.18653/v1/D19-1222
Yang Feng, Qingkai Shi, Xinyu Gao, Jun Wan, Chunrong Fang, and Zhenyu Chen. 2020. DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, New York, NY, USA, 177–188. DOI:https://doi.org/10.1145/3395363.3397357
Agency for Healthcare Research & Quality. 2017. MEPS HC-181: 2015 full year consolidated data file.
Xinyu Gao, Yang Feng, Yining Yin, Zixi Liu, Zhenyu Chen, and Baowen Xu. 2022. Adaptive test selection for deep neural networks. In Proceedings of the 44th International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, 73–85. DOI:https://doi.org/10.1145/3510003.3510232
Saurabh Garg, Sivaraman Balakrishnan, Zachary Chase Lipton, Behnam Neyshabur, and Hanie Sedghi. 2021. Leveraging unlabeled data to predict out-of-distribution performance. In Proceedings of the NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (Virtual). OpenReview.net, Online, 1–30.
Salah Ghamizi, Maxime Cordy, Martin Gubri, Mike Papadakis, Andrey Boystov, Yves Le Traon, and Anne Goujon. 2020. Search-based adversarial testing and improvement of constrained credit scoring systems. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 1089–1100. DOI:https://doi.org/10.1145/3368089.3409739
GitHub, OpenAI. 2022. Project site of GitHub Copilot. Retrieved from https://github.com/features/copilot Accessed on January 23rd, 2024.
Federica Granese, Marco Romanelli, Daniele Gorla, Catuscia Palamidessi, and Pablo Piantanida. 2021. Doctor: a simple method for detecting misclassification errors. Advances in Neural Information Processing Systems 34 (2021), 5669–5681.
Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang, Jianfei Cai, et al. 2018. Recent advances in convolutional neural networks. Pattern Recognition 77 (2018), 354–377. DOI:https://doi.org/10.1016/j.patcog.2017.10.013
Licong Guan and Xue Yuan. 2023. Instance segmentation model evaluation and rapid deployment for autonomous driving using domain differences. IEEE Transactions on Intelligent Transportation Systems 24, 4 (April 2023), 4050–4059. DOI:https://doi.org/10.1109/TITS.2023.3236626
Antonio Guerriero, Roberto Pietrantuono, and Stefano Russo. 2021. Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering. IEEE Press, Madrid, Spain, 348–358. DOI:https://doi.org/10.1109/ICSE43902.2021.00042
Devin Guillory, Vaishaal Shankar, Sayna Ebrahimi, Trevor Darrell, and Ludwig Schmidt. 2021. Predicting with confidence on unseen distributions. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Piscataway, NJ, USA, 1134–1144. DOI:https://doi.org/10.1109/ICCV48922.2021.00117
Yuejun Guo, Qiang Hu, Maxime Cordy, Michail Papadakis, and Yves Le Traon. 2023. DRE: density-based data selection with entropy for adversarial-robust deep learning models. Neural Computing and Applications 35, 5 (October 2023), 4009–4026. DOI:https://doi.org/10.1007/s00521-022-07812-2
Yao Hao, Zhiqiu Huang, Hongjing Guo, and Guohua Shen. 2023. Test input selection for deep neural network enhancement based on multiple-objective optimization. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE Computer Society, Los Alamitos, CA, USA, 534–545. DOI:https://doi.org/10.1109/SANER56733.2023.00056
Changtian He, Qing Sun, Ji Wu, Haiyan Yang, and Tao Yue. 2022. Feature difference based misclassified sample detection for CNN models deployed in online environment. In Proceedings of the IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion. IEEE, Piscataway, NJ, USA, 768–769. DOI:https://doi.org/10.1109/QRS-C57518.2022.00126
Dan Hendrycks and Thomas Dietterich. 2019. Benchmarking neural network robustness to common corruptions and perturbations. In Proceedings of the International Conference on Learning Representations. OpenReview.net, Online, 1–16.
Dan Hendrycks and Kevin Gimpel. 2017. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In Proceedings of the International Conference on Learning Representations. OpenReview.net, Online, 1–12.
Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. 2011. Bayesian active learning for classification and preference learning. arXiv:1112.5745. Retrieved from https://arxiv.org/pdf/1112.5745
Guosheng Hu, Yongxin Yang, Dong Yi, Josef Kittler, William Christmas, Stan Z. Li, and Timothy Hospedales. 2015. When face recognition meets with deep learning: an evaluation of convolutional neural networks for face recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops. IEEE Computer Society, Los Alamitos, CA, USA, 384–392. DOI:https://doi.org/10.1109/ICCVW.2015.58
Qiang Hu, Yuejun Guo, Maxime Cordy, Xiaofei Xie, Lei Ma, Mike Papadakis, and Yves Le Traon. 2022. An empirical study on data distribution-aware test selection for deep learning enhancement. ACM Transactions on Software Engineering and Methodology 31, 4 (2022), 1–30. DOI:https://doi.org/10.1145/3511598
Qiang Hu, Yuejun Guo, Maxime Cordy, Xiaofei Xie, Wei Ma, Mike Papadakis, and Yves Le Traon. 2021. Towards exploring the limitations of active learning: an empirical study. In Proceedings of the 2021 36th IEEE/ACM International Conference on Automated Software Engineering. IEEE, Piscataway, NJ, USA, 917–929. DOI:https://doi.org/10.1109/ASE51524.2021.9678672
Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Wei Ma, Mike Papadakis, and Yves Le Traon. 2023. Evaluating the robustness of test selection methods for deep neural networks. arXiv:2308.01314. Retrieved from https://arxiv.org/pdf/2308.01314
Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Mike Papadakis, and Yves Le Traon. 2023. LaF: labeling-free model selection for automated deep neural network reusing. ACM Transactions on Software Engineering and Methodology 33, 1, Article 25 (November 2023), 28 pages. DOI:https://doi.org/10.1145/3611666
Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Mike Papadakis, Lei Ma, and Yves Le Traon. 2023. Aries: efficient testing of deep neural networks via labeling-free accuracy estimation. In Proceedings of the 45th International Conference on Software Engineering. IEEE Press, Piscataway, NJ, USA, 1776–1787. DOI:https://doi.org/10.1109/ICSE48619.2023.00152
Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. 2020. A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Computer Science Review 37 (2020), 100270. DOI:https://doi.org/10.1016/j.cosrev.2020.100270
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4 (2002), 422–446. DOI:https://doi.org/10.1145/582415.582418
Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. 2018. To trust or not to trust a classifier. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (Vol. 31). Curran Associates Inc., Red Hook, NY, USA, 5546–5557.
Yiding Jiang, Dilip Krishnan, Hossein Mobahi, and Samy Bengio. 2019. Predicting the generalization gap in deep networks with margin distributions. arXiv:1810.00113. Retrieved from https://arxiv.org/pdf/1810.00113
Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, and J Zico Kolter. 2022. Assessing generalization of SGD via disagreement. arXiv:2106.13799. Retrieved from https://arxiv.org/pdf/2106.13799
Jinhan Kim, Robert Feldt, and Shin Yoo. 2019. Guiding deep learning system testing using surprise adequacy. In Proceedings of the 41st International Conference on Software Engineering. IEEE Press, Montreal, Quebec, Canada, 1039–1049. DOI:https://doi.org/10.1109/ICSE.2019.00108
Jinhan Kim, Robert Feldt, and Shin Yoo. 2023. Evaluating surprise adequacy for deep learning system testing. ACM Transactions on Software Engineering and Methodology 32, 2, Article 42 (March 2023), 29 pages. DOI:https://doi.org/10.1145/3546947
Jinhan Kim, Jeongil Ju, Robert Feldt, and Shin Yoo. 2020. Reducing DNN labelling cost using surprise adequacy: an industrial case study for autonomous driving. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 1466–1476. DOI:https://doi.org/10.1145/3368089.3417065
Denis Kleyko, Antonello Rosato, Edward Paxon Frady, Massimo Panella, and Friedrich T. Sommer. 2023. Perceptron theory can predict the accuracy of neural networks. IEEE Transactions on Neural Networks and Learning Systems (Early Access) (2023), 1–15. DOI:https://doi.org/10.1109/TNNLS.2023.3237381
Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton Earnshaw, Imran Haque, Sara M. Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. 2021. WILDS: a benchmark of in-the-wild distribution shifts. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, Brookline, MA, USA, 5637–5664. Retrieved from https://proceedings.mlr.press/v139/koh21a.html
Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1137–1143.
Jannik Kossen, Sebastian Farquhar, Yarin Gal, and Tom Rainforth. 2021. Active testing: Sample-efficient model evaluation. In Proceedings of the International Conference on Machine Learning. PMLR, Brookline, MA, USA, 5753–5763.
Jannik Kossen, Sebastian Farquhar, Yarin Gal, and Thomas Rainforth. 2022. Active surrogate estimators: an active learning approach to label-efficient model evaluation. Advances in Neural Information Processing Systems 35 (2022), 24557–24570.
Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. 2018. Training confidence-calibrated classifiers for detecting out-of-distribution samples. arXiv:1711.09325. Retrieved from https://arxiv.org/pdf/1711.09325
Young-Woo Lee and Heung-Seok Chae. 2023. Selection of test samples to improve DNN test efficiency based on neuron clusters. Retrieved from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4399496 Accessed on January 23rd, 2024.
Yu Li, Muxi Chen, Yannan Liu, Daojing He, and Qiang Xu. 2022. An empirical study on the efficacy of deep active learning for image classification. arXiv:2212.03088. Retrieved from https://arxiv.org/pdf/2212.03088
Yu Li, Muxi Chen, and Qiang Xu. 2022. HybridRepair: towards annotation-efficient repair for deep learning models. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, New York, NY, USA, 227–238. DOI:https://doi.org/10.1145/3533767.3534408
Yu Li, Min Li, Qiuxia Lai, Yannan Liu, and Qiang Xu. 2021. TestRank: bringing order into unlabeled test instances for deep learning tasks. In Proceedings of the Advances in Neural Information Processing Systems - Volume 34. 20874–20886. Retrieved from https://proceedings.neurips.cc/paper/2021/hash/ae78510109d46b0a6eef9820a4ca95d6-Abstract.html
Yuechen Li, Hanyu Pei, Linzhi Huang, and Beibei Yin. 2022. A distance-based dynamic random testing strategy for natural language processing DNN models. In Proceedings of the 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security. IEEE, Piscataway, NJ, USA, 842–853. DOI:https://doi.org/10.1109/QRS57517.2022.00089
Zeju Li, Konstantinos Kamnitsas, Mobarakol Islam, Chen Chen, and Ben Glocker. 2022. Estimating model performance under domain shifts with class-specific confidence scores. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Berlin, Germany, 693–703.
Zenan Li, Xiaoxing Ma, Chang Xu, Chun Cao, Jingwei Xu, and Jian Lü. 2019. Boosting operational DNN testing efficiency through conditioning. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 499–509. DOI:https://doi.org/10.1145/3338906.3338930
Zongjie Li, Chaozheng Wang, Zhibo Liu, Haoxuan Wang, Dong Chen, Shuai Wang, and Cuiyun Gao. 2023. CCTest: testing and repairing code completion systems. In Proceedings of the 45th International Conference on Software Engineering. IEEE Press, Piscataway, NJ, USA, 1238–1250. DOI:https://doi.org/10.1109/ICSE48619.2023.00110
Zixi Liu, Yang Feng, Yining Yin, and Zhenyu Chen. 2022. DeepState: selecting test suites to enhance the robustness of recurrent neural networks. In Proceedings of the 44th International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, 598–609. DOI:https://doi.org/10.1145/3510003.3510231
Yuzhe Lu, Zhenlin Wang, Runtian Zhai, Soheil Kolouri, Joseph Campbell, and Katia P. Sycara. 2023. Predicting out-of-distribution error with confidence optimal transport. In ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for Trustworthy ML (Kigali, Rwanda). OpenReview.net, Online, 1–8.
Julia Lust and Alexandru P. Condurache. 2022. Efficient detection of adversarial, out-of-distribution and other misclassified samples. Neurocomputing 470 (2022), 335–343. DOI:https://doi.org/10.1007/978-3-031-16449-1_66
Lei Ma, Felix Juefei-Xu, Minhui Xue, Qiang Hu, Sen Chen, Bo Li, Yang Liu, Jianjun Zhao, Jianxiong Yin, and Simon See. 2018. Secure deep learning engineering: A software quality assurance perspective. arXiv:1810.04538. Retrieved from https://arxiv.org/pdf/1810.04538
Shiqing Ma, Yingqi Liu, Wen-Chuan Lee, Xiangyu Zhang, and Ananth Grama. 2018. MODE: automated neural network model debugging via state differential analysis and input selection. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 175–186. DOI:https://doi.org/10.1145/3236024.3236082
Wei Ma, Mike Papadakis, Anestis Tsakmalis, Maxime Cordy, and Yves Le Traon. 2021. Test selection for deep learning systems. ACM Transactions on Software Engineering and Methodology 30, 2 (2021), 1–22. DOI:https://doi.org/10.1145/3417330
Yu-Seung Ma, Shin Yoo, and Taeho Kim. 2021. Selecting test inputs for DNNs using differential testing with subspecialized model instances. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, 1467–1470. DOI:https://doi.org/10.1145/3468264.3473131
Omid Madani, David Pennock, and Gary Flake. 2004. Co-validation: using model disagreement on unlabeled data to validate classification algorithms. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 17. MIT Press, Vancouver, British Columbia, Canada, 1–8.
Andrey Malinin and Mark Gales. 2018. Predictive uncertainty estimation via prior networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, 7047–7058.
Ana I. Maqueda, Antonio Loquercio, Guillermo Gallego, Narciso García, and Davide Scaramuzza. 2018. Event-based vision meets deep learning on steering prediction for self-driving cars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, Los Alamitos, CA, USA, 5419–5427. DOI:https://doi.org/10.1109/CVPR.2018.00568
Charles H. Martin, Tongsu Peng, and Michael W. Mahoney. 2021. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications 12, 1 (2021), 4122. DOI:https://doi.org/10.1038/s41467-021-24025-8
Satoshi Masuda, Kohichi Ono, Toshiaki Yasue, and Nobuhiro Hosokawa. 2018. A survey of software quality for machine learning applications. In Proceedings of the 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops. IEEE, Piscataway, NJ, USA, 279–284. DOI:https://doi.org/10.1109/ICSTW.2018.00061
Larry Medsker and Lakhmi C. Jain. 1999. Recurrent Neural Networks: Design and Applications (1st ed.). CRC Press, Inc., USA.
Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, and Baowen Xu. 2021. Measuring discrimination to boost comparative testing for multiple deep learning models. In Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering. IEEE, Piscataway, NJ, USA, 385–396. DOI:https://doi.org/10.1109/ICSE43902.2021.00045
Vasilii Mosin, Miroslaw Staron, Darko Durisic, Francisco Gomes de Oliveira Neto, Sushant Kumar Pandey, and Ashok Chaitanya Koppisetty. 2022. Comparing input prioritization techniques for testing deep learning algorithms. In Proceedings of the 48th Euromicro Conference on Software Engineering and Advanced Applications. IEEE Computer Society, Los Alamitos, CA, USA, 76–83. DOI:https://doi.org/10.1109/SEAA56994.2022.00020
Zhonghao Pan, Shan Zhou, Jianmin Wang, Jinbo Wang, Jiao Jia, and Yang Feng. 2022. Test case prioritization for deep neural networks. In Proceedings of the 9th International Conference on Dependable Systems and Their Applications. IEEE, Piscataway, NJ, USA, 624–628. DOI:https://doi.org/10.1109/DSA56465.2022.00089
Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2019. DeepXplore: automated whitebox testing of deep learning systems. Communications of the ACM 62, 11 (2019), 137–145. DOI:https://doi.org/10.1145/3361566
Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, et al. 2021. CodeNet: a large-scale AI for code dataset for learning a diversity of coding tasks. arXiv:2105.12655. Retrieved from https://arxiv.org/pdf/2105.12655
Xin Qiu and Risto Miikkulainen. 2022. Detecting misclassification errors in neural networks with a Gaussian process model. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, Palo Alto, CA, USA, 8017–8027.
Vincenzo Riccio, Gunel Jahangirova, Andrea Stocco, Nargiz Humbatova, Michael Weiss, and Paolo Tonella. 2020. Testing machine learning based systems: a systematic mapping. Empirical Software Engineering 25 (2020), 5193–5254. DOI:https://doi.org/10.1007/s10664-020-09881-0
Gregg Rothermel and Mary Jean Harrold. 1997. A safe, efficient regression test selection technique. ACM Transactions on Software Engineering and Methodology 6, 2 (1997), 173–210. DOI:https://doi.org/10.1145/248233.248262
Peter J. Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1987), 53–65. DOI:https://doi.org/10.1016/0377-0427(87)90125-7
Murat Sensoy, Maryam Saleki, Simon Julier, Reyhan Aydogan, and John Reid. 2021. Misclassification risk and uncertainty quantification in deep classifiers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE Computer Society, Los Alamitos, CA, USA, 2483–2491. DOI:https://doi.org/10.1109/WACV48630.2021.00253
Weijun Shen, Yanhui Li, Lin Chen, Yuanlei Han, Yuming Zhou, and Baowen Xu. 2021. Multiple-boundary clustering and prioritization to promote neural network retraining. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. Association for Computing Machinery, New York, NY, USA, 410–422. DOI:https://doi.org/10.1145/3324884.3416621
Ying Shi, Beibei Yin, Zheng Zheng, and Tiancheng Li. 2021. An empirical study on test case prioritization metrics for deep neural networks. In Proceedings of the 2021 IEEE 21st International Conference on Software Quality, Reliability and Security. IEEE, Piscataway, NJ, USA, 157–166. DOI:https://doi.org/10.1109/QRS54544.2021.00027
Thibault Simonetto, Salijona Dyrmishi, Salah Ghamizi, Maxime Cordy, and Yves Le Traon. 2022. A unified framework for adversarial attack and defense in constrained feature space. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (Mess Wien, Vienna, Austria). International Joint Conferences on Artificial Intelligence Organization, 1313–1319. DOI:https://doi.org/10.24963/ijcai.2022/183
Andrea Stocco, Michael Weiss, Marco Calzana, and Paolo Tonella. 2020. Misbehaviour prediction for autonomous driving systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, 359–371. DOI:https://doi.org/10.1145/3377811.3380353
Xiaoxiao Sun, Yunzhong Hou, Weijian Deng, Hongdong Li, and Liang Zheng. 2021. Ranking models in unlabeled new environments. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE Computer Society, Los Alamitos, CA, USA, 11741–11751. DOI:https://doi.org/10.1109/ICCV48922.2021.01155
Xiaoxiao Sun, Yunzhong Hou, Hongdong Li, and Liang Zheng. 2021. Label-free model evaluation with semi-structured dataset representations. arXiv:2112.00694. Retrieved from https://arxiv.org/pdf/2112.00694
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Piscataway, NJ, USA, 2818–2826. DOI:https://doi.org/10.1109/CVPR.2016.308
Yali Tao, Chuanqi Tao, Hongjing Guo, and Bohan Li. 2022. TPFL: test input prioritization for deep neural networks based on fault localization. In Proceedings of the International Conference on Advanced Data Mining and Applications. Springer, Berlin, Germany, 368–383. DOI:https://doi.org/10.1007/978-3-031-22064-7_27
Thomas Unterthiner, Daniel Keysers, Sylvain Gelly, Olivier Bousquet, and Ilya Tolstikhin. 2021. Predicting neural network accuracy from weights. arXiv:2002.11448. Retrieved from https://arxiv.org/pdf/2002.11448
Artem Vazhentsev, Gleb Kuzmin, Artem Shelmanov, Akim Tsvigun, Evgenii Tsymbalov, Kirill Fedyanin, Maxim Panov, Alexander Panchenko, Gleb Gusev, Mikhail Burtsev, Manvel Avetisian, and Leonid Zhukov. 2022. Uncertainty estimation of transformer predictions for misclassification detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Dublin, Ireland, 8237–8252. DOI:https://doi.org/10.18653/v1/2022.acl-long.566
Huiyan Wang, Jingwei Xu, Chang Xu, Xiaoxing Ma, and Jian Lu. 2020. Dissector: input validation for deep learning applications by crossing-layer dissection. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, 727–738. DOI:https://doi.org/10.1145/3377811.3380379
Jingyi Wang, Jialuo Chen, Youcheng Sun, Xingjun Ma, Dongxia Wang, Jun Sun, and Peng Cheng. 2021. RobOT: robustness-oriented testing for deep learning systems. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering. IEEE, Piscataway, NJ, USA, 300–311. DOI:https://doi.org/10.1109/ICSE43902.2021.00038
Zhiyu Wang, Sihan Xu, Xiangrui Cai, and Hua Ji. 2020. Test input selection for deep neural networks. Journal of Physics: Conference Series 1693, 1 (2020), 012017.
Zan Wang, Hanmo You, Junjie Chen, Yingyi Zhang, Xuyuan Dong, and Wenbin Zhang. 2021. Prioritizing test inputs for deep neural networks via mutation analysis. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering. IEEE, Piscataway, NJ, USA, 397–409. DOI:https://doi.org/10.1109/ICSE43902.2021.00046
Zhengyuan Wei, Haipeng Wang, Imran Ashraf, and W. K. Chan. 2022. Predictive mutation analysis of test case prioritization for deep neural networks. In Proceedings of the IEEE 22nd International Conference on Software Quality, Reliability and Security. IEEE, Piscataway, NJ, USA, 682–693. DOI:https://doi.org/10.1109/QRS57517.2022.00074
Sanford Weisberg. 2005. Applied Linear Regression. Vol. 528. John Wiley & Sons, Hoboken, NJ, USA. DOI:https://doi.org/10.1002/0471704091
Michael Weiss, Rwiddhi Chakraborty, and Paolo Tonella. 2021. A review and refinement of surprise adequacy. In Proceedings of the 2021 IEEE/ACM 3rd International Workshop on Deep Learning for Testing and Testing for Deep Learning. IEEE, Piscataway, NJ, USA, 17–24. DOI:https://doi.org/10.1109/DeepTest52559.2021.00009
Michael Weiss and Paolo Tonella. 2022. Simple techniques work surprisingly well for neural network test prioritization and active learning (replicability study). In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, New York, NY, USA, 139–150. DOI:https://doi.org/10.1145/3533767.3534375
Xiaoxue Wu, Jinjin Shen, Wei Zheng, Lidan Lin, Yulei Sui, and Abubakar Omari Abdallah Semasaba. 2024. RNNtcs: a test case selection method for recurrent neural networks. Knowledge-Based Systems 279, C (2024), 15 pages. DOI:https://doi.org/10.1016/j.knosys.2023.110955
Renchunzi Xie, Hongxin Wei, Yuzhou Cao, Lei Feng, and Bo An. 2023. On the importance of feature separability in predicting out-of-distribution error. In Proceedings of the 37th Conference on Neural Information Processing Systems. OpenReview.net, Online, 1–18.
Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Minhui Xue, Hongxu Chen, Yang Liu, Jianjun Zhao, Bo Li, Jianxiong Yin, and Simon See. 2019. DeepHunter: a coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. Association for Computing Machinery, New York, NY, USA, 146–157. DOI:https://doi.org/10.1145/3293882.3330579
Xiaoyuan Xie, Pengbo Yin, and Songqiang Chen. 2022. Boosting the revealing of detected violations in deep learning testing: a diversity-guided method. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. Association for Computing Machinery, New York, NY, USA, Article 17, 13 pages. DOI:https://doi.org/10.1145/3551349.3556919
Rongjie Yan, Yuhang Chen, Hongyu Gao, and Jun Yan. 2022. Test case prioritization with neuron valuation based pattern. Science of Computer Programming 215, C (March 2022), 102761. DOI:https://doi.org/10.1016/j.scico.2021.102761
Zhou Yang, Jieke Shi, Muhammad Hilmi Asyrofi, Bowen Xu, Xin Zhou, DongGyun Han, and David Lo. 2023. Prioritizing speech test cases. arXiv:2302.00330. Retrieved from https://arxiv.org/pdf/2302.00330
Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, and Jacob Steinhardt. 2022. Predicting out-of-distribution error with the projection norm. In Proceedings of the International Conference on Machine Learning. PMLR, Brookline, MA, USA, 25721–25746.
Zhenlong Yuan, Yongqiang Lu, Zhaoguo Wang, and Yibo Xue. 2014. Droid-Sec: deep learning in Android malware detection. In Proceedings of the 2014 ACM Conference on SIGCOMM. Association for Computing Machinery, New York, NY, USA, 371–372. DOI:https://doi.org/10.1145/2740070.2631434
Jie M. Zhang, Mark Harman, Lei Ma, and Yang Liu. 2022. Machine learning testing: survey, landscapes and horizons. IEEE Transactions on Software Engineering 48, 1 (2022), 1–36. DOI:https://doi.org/10.1109/TSE.2019.2962027
Chunyu Zhao, Yanzhou Mu, Xiang Chen, Jingke Zhao, Xiaolin Ju, and Gan Wang. 2022. Can test input selection methods for deep neural network guarantee test diversity? A large-scale empirical study. Information and Software Technology 150, C (2022), 12 pages. DOI:https://doi.org/10.1016/j.infsof.2022.106982
Haibin Zheng, Jinyin Chen, and Haibo Jin. 2023. CertPri: certifiable prioritization for deep neural networks via movement cost in feature space. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering. IEEE Computer Society, Los Alamitos, CA, USA, 1–13. DOI:https://doi.org/10.1109/ASE56229.2023.00126
Yan Zheng, Xiaofei Xie, Ting Su, Lei Ma, Jianye Hao, Zhaopeng Meng, Yang Liu, Ruimin Shen, Yingfeng Chen, and Changjie Fan. 2019. Wuji: automatic online combat game testing using evolutionary deep reinforcement learning. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering. IEEE, Piscataway, NJ, USA, 772–784. DOI:https://doi.org/10.1109/ASE.2019.00077
Jianyi Zhou, Feng Li, Jinhao Dong, Hongyu Zhang, and Dan Hao. 2020. Cost-effective testing of a deep learning model through input reduction. In Proceedings of the IEEE 31st International Symposium on Software Reliability Engineering. IEEE Computer Society, Los Alamitos, CA, USA, 289–300. DOI:https://doi.org/10.1109/ISSRE5003.2020.00035
Fei Zhu, Zhen Cheng, Xu-Yao Zhang, and Cheng-Lin Liu. 2022. Rethinking confidence calibration for failure prediction. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXV. Springer, Berlin, Germany, 518–536. Retrieved from https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136850512.pdf
Fei Zhu, Zhen Cheng, Xu-Yao Zhang, and Cheng-Lin Liu. 2023. OpenMix: exploring outlier samples for misclassification detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, Los Alamitos, CA, USA, 12074–12083. DOI:https://doi.org/10.1109/CVPR52729.2023.01162
Fangzhe Zhu, Ye Zhao, Zhengqiong Liu, and Xueliang Liu. 2023. Label-free model evaluation with out-of-distribution detection. Applied Sciences 13, 8 (2023), 5056. DOI:https://doi.org/10.3390/app13085056