Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction

With the increasing complexity and scope of software systems, their dependability is crucial. The analysis of log data recorded during system execution can enable engineers to automatically predict failures at run time. Several Machine Learning (ML) techniques, including traditional ML and Deep Learning (DL), have been proposed to automate such tasks. However, current empirical studies are limited: they do not cover all main DL types -- Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and transformer -- and they do not examine them on a wide range of diverse datasets. In this paper, we aim to address these issues by systematically investigating the combination of log data embedding strategies and DL types for failure prediction. To that end, we propose a modular architecture to accommodate various configurations of embedding strategies and DL-based encoders. To further investigate how dataset characteristics such as dataset size and failure percentage affect model accuracy, we synthesised 360 datasets, with varying characteristics, for three distinct system behavioral models, based on a systematic and automated generation approach. Using the F1 score metric, our results show that the best overall performing configuration is a CNN-based encoder with Logkey2vec. Additionally, we provide specific dataset conditions, namely a dataset size >350 or a failure percentage >7.5%, under which this configuration demonstrates high accuracy for failure prediction.
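As a reminder of the metric used above, the F1 score is the harmonic mean of precision and recall for the positive ("failure") class. The sketch below is only an illustration and is not taken from the paper's replication package; the function name `f1_score` and the example counts are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score for the positive ("failure") class.

    tp: failures correctly predicted
    fp: normal executions wrongly flagged as failures
    fn: failures that were missed
    """
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined,
        # so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
example = f1_score(tp=6, fp=2, fn=4)
```

Because true negatives do not enter the formula, F1 is not inflated by the majority "no failure" class, which makes it a reasonable choice when the failure percentage is low.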

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation

Disciplines :

Computer science

Author, co-author :

Hadadi, Fatemeh

DAWES, Joshua ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV

Shin, Donghwan

BIANCULLI, Domenico ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV

BRIAND, Lionel

External co-authors :

yes

Language :

English

Title :

Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction

Publication date :

20 June 2024

Journal title :

Empirical Software Engineering

ISSN :

1382-3256

eISSN :

1573-7616

Volume :

Pages :

105:1-105:53

Peer reviewed :

Peer Reviewed verified by ORBi

Focus Area :

Security, Reliability and Trust

European Projects :

H2020 - 957254 - COSMOS - DevOps for Complex Cyber-physical Systems

Funders :

Union Européenne

Data Set :

https://doi.org/10.6084/m9.figshare.22219111.v2

Available on ORBilu :

since 22 November 2023
