Abstract:
[en] We address the problem of anomaly detection in log files that
consist of a huge number of records. To this end, we apply
label propagation, a semi-supervised
learning technique. The strength of this approach lies in the
small amount of labelled data that is needed to label the
remaining data. This is an advantage because labelling data
requires human expertise, which is costly and becomes
infeasible for large datasets. Even though our approach
is generally applicable, we focus on the detection of anomalous
records in firewall log files. This requires a separation
of records into windows which are compared using different
distance functions to determine their similarity. Afterwards,
we apply label propagation to label a complete dataset in
only a limited number of iterations. We demonstrate our
approach on a realistic dataset from an ISP.
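
The pipeline described above (windows compared by a distance function, then labels propagated from a few labelled windows) can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not name the distance functions or the window features, so the Euclidean distance, the Gaussian kernel with width `sigma`, the numeric feature vectors, and the function name `propagate_labels` are all assumptions made for illustration.

```python
import numpy as np


def propagate_labels(features, labels, sigma=1.0, max_iter=100, tol=1e-6):
    """Propagate labels over windows (hypothetical helper, not from the paper).

    features: (n, d) array, one numeric feature vector per window (assumed).
    labels:   length-n integer array, -1 marks an unlabelled window.
    Returns a length-n array with a class assigned to every window.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    labelled = labels >= 0
    classes = np.unique(labels[labelled])

    # Pairwise squared Euclidean distances between windows (assumed distance).
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)

    # Turn distances into similarities, then into a row-stochastic matrix.
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    transition = weights / weights.sum(axis=1, keepdims=True)

    # One-hot scores for labelled windows, uniform scores for the rest.
    clamp = (labels[labelled][:, None] == classes[None, :]).astype(float)
    scores = np.full((len(labels), len(classes)), 1.0 / len(classes))
    scores[labelled] = clamp

    for _ in range(max_iter):
        updated = transition @ scores   # each window averages its neighbours
        updated[labelled] = clamp       # known labels stay fixed
        converged = np.abs(updated - scores).max() < tol
        scores = updated
        if converged:
            break

    return classes[scores.argmax(axis=1)]
```

With one labelled window per group, for example `propagate_labels([[0, 0], [0.1, 0], [5, 5], [5.1, 5]], [0, -1, 1, -1])`, the unlabelled windows take the label of the nearby labelled one. The dense pairwise matrix grows quadratically with the number of windows, so a sketch like this only suits small samples.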