Data augmentation; Data quality; ML; NIDS; Benchmark networks; High detection rate; Machine-learning; Network intrusion detection systems; Set assessment; System test; Test sets; Training sets; Theoretical Computer Science; Computer Science (all)
Abstract :
[en] Research on Network Intrusion Detection Systems (NIDSs) using Machine Learning (ML) usually reports very high detection rates, often well above 90%. However, these results typically stem from overly simplistic NIDS datasets, where the test set, often just a subset of the overall dataset, mirrors the training set distribution and therefore fails to rigorously assess the NIDS's robustness under more varied conditions. To address this shortcoming, we propose a method for Test sets Assessment and Targeted Augmentation (TATA). TATA is a model-agnostic approach that assesses and augments the quality of benchmark ML-based NIDS test sets. First, TATA encodes both training and test sets in a structured latent space via a contrastive autoencoder and defines three quality metrics (diversity, proximity, and scarcity) to identify test-set gaps where ML-based classification is harder. Next, TATA employs a reinforcement learning (RL) approach guided by these metrics to configure a testbed that produces realistic data specifically targeting these gaps, yielding a more robust test set. Using CIC-IDS2017 and CSE-CIC-IDS2018, we observe a positive correlation between higher metric values and increased detection difficulty, confirming their utility as meaningful indicators of test-set robustness. On the same datasets, TATA's RL-based augmentation significantly raises detection difficulty for multiple NIDS models, revealing previously overlooked weaknesses.
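As a rough illustration of the kind of latent-space diversity metric the abstract mentions, the sketch below computes a Vendi-style diversity score over a set of embeddings (the exponential of the Shannon entropy of the eigenvalues of a normalized similarity kernel). This is a generic sketch, not TATA's actual implementation: the function name, the cosine kernel, and the input shape are assumptions for illustration only.

```python
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """Vendi-style diversity of a set of embeddings (rows).

    Computes exp(H(lambda)), where lambda are the eigenvalues of the
    normalized cosine-similarity kernel K / n. Ranges from 1 (all points
    identical) up to n (all points mutually orthogonal).
    """
    # Row-normalize so the Gram matrix is a cosine-similarity kernel.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    x = embeddings / np.clip(norms, 1e-12, None)
    k = (x @ x.T) / len(x)
    # Eigenvalues of a PSD kernel; clip tiny negatives from round-off.
    eig = np.clip(np.linalg.eigvalsh(k), 0.0, None)
    eig = eig / eig.sum()
    # Shannon entropy, treating 0 * log 0 as 0.
    entropy = -np.sum(eig * np.log(eig, where=eig > 0, out=np.zeros_like(eig)))
    return float(np.exp(entropy))
```

For example, three orthogonal embeddings give a score of 3.0 (maximally diverse), while four identical embeddings give 1.0; a low score over a region of the test set would signal redundant, easy-to-classify samples.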
Disciplines :
Computer science
Author, co-author :
Anser, Omar; Inria, Université de Lorraine, CNRS, LORIA, Nancy, France
Francois, Jérôme; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SEDAN; Inria, Université de Lorraine, CNRS, LORIA, Nancy, France
Chrisment, Isabelle; Inria, Université de Lorraine, CNRS, LORIA, Nancy, France
Kondo, Daishi; Information Technology Center, The University of Tokyo, Bunkyo, Japan
External co-authors :
yes
Language :
English
Title :
TATA: Benchmark NIDS Test Sets Assessment and Targeted Augmentation
Publication date :
2026
Event name :
ESORICS 2025
Event place :
Toulouse, France
Event date :
22-09-2025
Audience :
International
Main work title :
Computer Security – ESORICS 2025 - 30th European Symposium on Research in Computer Security, Proceedings
Editor :
Nicomette, Vincent
Publisher :
Springer Science and Business Media Deutschland GmbH