Abstract:
[en] With the increasing complexity and scope of software systems, their
dependability is crucial. The analysis of log data recorded during system
execution can enable engineers to automatically predict failures at run time.
Several Machine Learning (ML) techniques, including traditional ML and Deep
Learning (DL), have been proposed to automate such tasks. However, current
empirical studies are limited in terms of covering all main DL types --
Recurrent Neural Network (RNN), Convolutional Neural Network (CNN), and
transformer -- as well as examining them on a wide range of diverse datasets.
In this paper, we aim to address these issues by systematically investigating
the combination of log data embedding strategies and DL types for failure
prediction. To that end, we propose a modular architecture to accommodate
various configurations of embedding strategies and DL-based encoders. To
further investigate how dataset characteristics such as dataset size and
failure percentage affect model accuracy, we synthesised 360 datasets, with
varying characteristics, for three distinct system behavioral models, based on
a systematic and automated generation approach. Using the F1 score metric, our
results show that the best overall performing configuration is a CNN-based
encoder with Logkey2vec. Additionally, we provide specific dataset conditions,
namely a dataset size >350 or a failure percentage >7.5%, under which this
configuration demonstrates high accuracy for failure prediction.
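
As a minimal sketch of the two quantitative notions mentioned above, the snippet below computes the F1 score from confusion-matrix counts for the failure class and checks the reported dataset conditions. The function names, parameter names, and example counts are hypothetical and not taken from the paper; the abstract does not state the unit of dataset size, so `dataset_size` is left unitless here.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall.

    tp, fp, fn are true positives, false positives, and false negatives
    for the "failure" class (hypothetical inputs for illustration).
    """
    if tp == 0:
        # No correctly predicted failures: F1 is 0 by convention here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def meets_reported_conditions(dataset_size: float, failure_pct: float) -> bool:
    """Check the conditions quoted in the abstract:
    dataset size > 350 OR failure percentage > 7.5%.
    The unit of dataset_size is an assumption left open (not given in the abstract).
    """
    return dataset_size > 350 or failure_pct > 7.5


if __name__ == "__main__":
    # Example with made-up counts: 8 failures caught, 2 false alarms, 2 missed.
    print(round(f1_score(8, 2, 2), 3))
    print(meets_reported_conditions(400, 5.0))
    print(meets_reported_conditions(300, 5.0))
```

Because the two conditions are joined by "or", a dataset that satisfies either one alone falls within the range where the abstract reports high accuracy for the CNN-based encoder with Logkey2vec.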