artificial intelligence; automated cyber defense; dataset; large language models; Locked Shields; Automated cyber defense; Cyber defense exercise; Cyber-defense; Dataset; Fully automated; Language model; Large language model; Locked shield; Network traffic; Power; Computer Networks and Communications
Abstract :
In 2021, driven by ongoing advances in artificial intelligence (AI) and automation, previous works [1], [2] introduced architectures for fully automated blue teams in cyber defense exercises such as Locked Shields (LS). Since then, technological and scientific progress has accelerated further. In particular, the rapid evolution of generative AI through large language models (LLMs) has significantly enhanced the capabilities of cybersecurity automation. This paper reviews how cyber blue team automation can benefit from these recent advances, focusing on how generative AI and LLMs are reshaping automation strategies for defending complex cyber infrastructure. Using the LS exercise as a case study, we discuss how generative AI-based automation can address the growing complexity of cyber threats. Our paper presents promising directions for enhancing fully automated blue teams with generative AI, and it addresses a major research gap: the lack of high-quality datasets for training and evaluation in this field. To address this challenge, we introduce a novel dataset containing labeled network traffic and end-host logs, collected during the "partners' run" preceding LS 2024. The dataset is derived from over 400 GB of captured network traffic and more than 6 million log entries. It captures real-world red team behavior and is publicly available to foster research and AI development in blue team automation. We conclude by outlining future research challenges in automated cyber defense.
Disciplines :
Computer science
Author, co-author :
Dijk, Allard; Netherlands Defence Academy, Den Helder, Netherlands
Melella, Cosimo; NATO Cooperative Cyber Defence Centre of Excellence, Tallinn, Estonia
Pihelgas, Mauno; Tallinn University of Technology, Tallinn, Estonia
Vaarandi, Risto; Tallinn University of Technology, Tallinn, Estonia
Lenders, Vincent; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Systems and Network Security Group (SNS); Cyber-Defence Campus armasuisse, Thun, Switzerland
External co-authors :
yes
Language :
English
Title :
Next Steps in Cyber Blue Team Automation - Leveraging the Power of LLMs
Publication date :
May 2025
Event name :
2025 17th International Conference on Cyber Conflict: The Next Step (CyCon)
Event place :
Tallinn, Estonia
Event date :
27-05-2025 to 30-05-2025
Audience :
International
Main work title :
2025 17th International Conference on Cyber Conflict: The Next Step, CyCon 2025
Bibliography :
R. Meier, A. Lavrenovs, K. Heinäaro, L. Gambazzi, and V. Lenders, “Towards an AI-powered player in cyber defence exercises,” in Proc. 13th Int. Conf. Cyber Conflict (CyCon), Tallinn, Estonia, May 2021.
A. Kott, Ed., Autonomous Intelligent Cyber Defense Agent (AICA): A Comprehensive Guide, Advances in Information Security, vol. 87. Cham: Springer, 2023.
I. H. Sarker, M. H. Furhad, and R. Nowrozy, “AI-driven cybersecurity: An overview, security intelligence modeling and research directions,” SN Comput. Sci., vol. 2, Mar. 2021.
S. Lysenko, “The role of artificial intelligence in cybersecurity: Automation of protection and detection of threats,” Econ. Aff., vol. 69, Feb. 2024.
L. Alevizos, “Automated cybersecurity compliance and threat response using AI, blockchain and smart contracts,” Int. J. Inf. Technol., Dec. 2024.
L. Gehri, R. Meier, D. Hulliger, and V. Lenders, “Towards generalizing machine learning models to detect command and control attack traffic,” in Proc. 15th Int. Conf. Cyber Conflict: Meeting Reality (CyCon), Tallinn, Estonia, May 2023.
Z. Zhang et al., “Artificial intelligence in cyber security: Research advances, challenges, and opportunities,” Artif. Intell. Rev., vol. 55, Feb. 2022.
NATO Cooperative Cyber Defence Centre of Excellence, “NATO Locked Shields.” Accessed: Jan. 4, 2025. [Online]. Available: https://ccdcoe.org/exercises/locked-shields/
A. Dijk, E. Halisdemir, C. Melella, A. Schu, M. Pihelgas, and R. Meier, “LSPR23: A novel IDS dataset from the largest live-fire cybersecurity exercise,” J. Inf. Secur. Appl., vol. 85, Sep. 2024.
S. S. Sengar, A. B. Hasan, S. Kumar, and F. Carroll, “Generative artificial intelligence: A systematic review and applications,” Multimed. Tools Appl., Aug. 2024.
Z. B. Akhtar, “Unveiling the evolution of generative AI (GAI): A comprehensive and investigative analysis toward LLM models (2021–2024) and beyond,” J. Electr. Syst. Inf. Technol., vol. 11, Jun. 2024.
E. Karlsen, X. Luo, N. Zincir-Heywood, and M. Heywood, “Benchmarking large language models for log analysis, security, and interpretation,” J. Netw. Syst. Manag., vol. 32, Jul. 2024.
A. Bessey et al., “A few billion lines of code later: Using static analysis to find bugs in the real world,” Commun. ACM, vol. 53, Feb. 2010.
J. Newsome and D. X. Song, “Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software,” in Proc. Netw. Distrib. Syst. Secur. Symp. (NDSS), 2005.
Z. Sheng, F. Wu, X. Zuo, C. Li, Y. Qiao, and L. Hang, “LProtector: An LLM-driven vulnerability detection system,” Nov. 14, 2024, arXiv:2411.06493.
M. Siavvas, I. Kalouptsoglou, E. Gelenbe, D. Kehagias, and D. Tzovaras, “Transforming the field of vulnerability prediction: Are large language models the key?,” in Proc. 32nd Int. Conf. Modeling, Anal. Simul. Comput. Telecommun. Syst. (MASCOTS), Krakow, Poland, 2024, pp. 1–6, doi: 10.1109/MASCOTS64422.2024.10786575.
J. Haurogné, N. Basheer, and S. Islam, “Vulnerability detection using BERT based LLM model with transparency obligation practice towards trustworthy AI,” Mach. Learn. Appl., vol. 18, Dec. 2024.
H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions,” presented at the 2022 IEEE Symp. Secur. Priv. (SP), May 2022.
C. Asuai and G. Nwalozie, “Investigating and addressing security policy misconfigurations,” IOSR J. Eng., vol. 14, Apr. 2024.
C. Benzaïd and T. Taleb, “AI for beyond 5G networks: A cyber-security defense or offense enabler?,” IEEE Netw., vol. 34, Nov. 2020.
U. Mandal, S. Shukla, A. Rastogi, S. Bhattacharya, and D. Mukhopadhyay, “µLAM: A LLM-powered assistant for real-time micro-architectural attack detection and mitigation,” Cryptol. ePrint Arch., Paper 2024/1978, 2024. [Online]. Available: https://eprint.iacr.org/2024/1978
A. Dijk, “Detection of advanced persistent threats using artificial intelligence for deep packet inspection,” in Proc. IEEE Int. Conf. Big Data, Dec. 2021.
M. A. Ferrag, F. Alwahedi, A. Battah, B. Cherif, A. Mechri, and N. Tihanyi, “Generative AI and large language models for cyber security: All insights you need,” May 21, 2024, arXiv:2405.12750.
C.-N. Hang, P.-D. Yu, R. Morabito, and C.-W. Tan, “Large language models meet next-generation networking technologies: A review,” Future Internet, vol. 16, Oct. 2024.
K. Stein, A. A. Mahyari, G. F. III, and E. El-Sheikh, “Towards novel malicious packet recognition: A few-shot learning approach,” Sep. 17, 2024, arXiv:2409.11254.
Z. Wu, H. Zhang, P. Wang, and Z. Sun, “RTIDS: A robust transformer-based approach for intrusion detection system,” IEEE Access, vol. 10, 2022.
Z. A. Khan, D. Shin, D. Bianculli, and L. Briand, “Guidelines for assessing the accuracy of log message template identification techniques,” in Proc. 44th Int. Conf. Softw. Eng. (ICSE), Jul. 2022.
Z. Jiang et al., “LILAC: Log parsing using LLMs with adaptive parsing cache,” Proc. ACM Softw. Eng., vol. 1, Jul. 2024.
Z. Ma, A. R. Chen, D. J. Kim, T.-H. P. Chen, and S. Wang, “LLMParser: An exploratory study on using large language models for log parsing,” presented at the 2024 IEEE/ACM 46th Int. Conf. Softw. Eng. (ICSE), Apr. 2024.
J. Xu, R. Yang, Y. Huo, C. Zhang, and P. He, “DivLog: Log parsing with prompt enhanced in-context learning,” in Proc. IEEE/ACM 46th Int. Conf. Softw. Eng. (ICSE), Apr. 2024.
V.-H. Le and H. Zhang, “Log parsing with prompt-based few-shot learning,” in Proc. 45th Int. Conf. Softw. Eng. (ICSE), Jul. 2023.
J. Huang, Z. Jiang, Z. Chen, and M. R. Lyu, “LUNAR: Unsupervised LLM-based log parsing,” Aug. 2024, arXiv:2406.07174.
R. Vaarandi and H. Bahsi, “Using large language models for template detection from security event logs,” Int. J. Inf. Security, vol. 24, Mar. 2025.
W. Zhou et al., “Star: A system for ticket analysis and resolution,” in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., 2017.
E. Aghaei, X. Niu, W. Shadid, and E. Al-Shaer, “SecureBERT: A domain-specific language model for cybersecurity,” Oct. 20, 2022, arXiv:2204.02685.
N. Arici, L. Putelli, L. Sigalini, I. Serina et al., “LLM-based approaches for automatic ticket assignment: A real-world Italian application,” in CEUR Workshop Proc., vol. 3551, Nov. 2023.
F. Y. Loumachi and M. C. Ghanem, “Advancing cyber incident timeline analysis through rule-based AI and large language models,” Sep. 2024, arXiv:2409.02572.
P. Balasubramanian, J. Seby, and P. Kostakos, “CYGENT: A cybersecurity conversational agent with log summarization powered by GPT-3,” Mar. 25, 2024, arXiv:2403.17160.
F. Li, H. Lang, J. Zhang, J. Shen, and X. Wang, “PreConfig: A pretrained model for automating network configuration,” Mar. 14, 2024, arXiv:2403.09369.
O. G. Lira, O. M. Caicedo, and N. L. S. da Fonseca, “Large language models for zero touch network configuration management,” Aug. 23, 2024, arXiv:2408.13298.
Y. Mikami, A. Melnik, J. Miura, and V. Hautamäki, “Natural language as policies: Reasoning for coordinate-level embodied control with LLMs,” Apr. 6, 2024, arXiv:2403.13801.
K. Dzeparoska, J. Lin, A. Tizghadam, and A. Leon-Garcia, “LLM-based policy generation for intent-based management of applications,” in Proc. 19th Int. Conf. Netw. Serv. Manag. (CNSM), Oct. 2023.
S. Hays and J. White, “Employing LLMs for incident response planning and review,” Mar. 2, 2024, arXiv:2403.01271.
R. Kaur, T. Klobučar, and D. Gabrijelčič, “Harnessing the power of language models in cybersecurity: A comprehensive review,” Int. J. Inf. Manag. Data Insights, vol. 5, Jun. 2025.
OWASP, “OWASP Top 10 for LLM Applications 2025,” OWASP Top 10 for LLM & Generative AI Security. Accessed: Jan. 4, 2025. [Online]. Available: https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
P. Rafiey and A. Namadchian, “Using LLMs as AI agents to identify false positive alerts in security operation center,” Research Square, Nov. 2024, doi: 10.21203/rs.3.rs-5420741/v1.
M. Kaheh, D. K. Kholgh, and P. Kostakos, “Cyber Sentinel: Exploring conversational agents in streamlining security tasks with GPT-4,” Sep. 28, 2023, arXiv:2309.16422.
T. Ali and P. Kostakos, “HuntGPT: Integrating machine learning-based anomaly detection and explainable AI with large language models (LLMs),” Sep. 27, 2023, arXiv:2309.16021.
O. Oniagbi, “Evaluation of LLM agents for the SOC Tier 1 analyst triage process,” M.S. thesis, University of Turku, 2024.
A. Baig, “Accessing the role of artificial intelligence in information security risk management,” M.S. thesis, University of Jyväskylä, 2024.
L. Wang et al., “From sands to mansions: Enabling automatic full-life-cycle cyberattack construction with LLM,” Jul. 24, 2024, arXiv:2407.16928.
E. Pleshakova, A. Osipov, S. Gataullin, T. Gataullin, and A. Vasilakos, “Next gen cybersecurity paradigm towards artificial general intelligence: Russian market challenges and future global technological trends,” J. Comput. Virol. Hacking Tech., vol. 20, Sep. 2024.
A. Dijk, R. Meier, C. Melella, and M. Pihelgas, “Locked Shields Partners Run 24 (LSPR24): A Next-Generation Cybersecurity Dataset for Blue Team Automation,” Zenodo, Mar. 1, 2025, doi: 10.5281/zenodo.14900873.
M. Cadrouil, “LLM agents in cybersecurity: A double-edged sword,” I-TRACING. Accessed: Jan. 4, 2025. [Online]. Available: https://i-tracing.com/blog/llm-agents-cybersecurity/