![]() Khan, Zanis Ali ![]() ![]() ![]() in Proceedings of the 44th International Conference on Software Engineering (ICSE ’22) (2022, July) Log message template identification aims to convert raw logs containing free-formed log messages into structured logs to be processed by automated log-based analysis, such as anomaly detection and model ... [more ▼] Log message template identification aims to convert raw logs containing free-formed log messages into structured logs to be processed by automated log-based analysis, such as anomaly detection and model inference. While many techniques have been proposed in the literature, only two recent studies provide a comprehensive evaluation and comparison of the techniques using an established benchmark composed of real-world logs. Nevertheless, we argue that both studies have the following issues: (1) they used different accuracy metrics without comparison between them, (2) some ground-truth (oracle) templates are incorrect, and (3) the accuracy evaluation results do not provide any information regarding incorrectly identified templates. In this paper, we address the above issues by providing three guidelines for assessing the accuracy of log template identification techniques: (1) use appropriate accuracy metrics, (2) perform oracle template correction, and (3) perform analysis of incorrect templates. We then assess the application of such guidelines through a comprehensive evaluation of 14 existing template identification techniques on the established benchmark logs. Results show very different insights than existing studies and in particular a much less optimistic outlook on existing techniques. [less ▲] Detailed reference viewed: 392 (41 UL)![]() Shin, Donghwan ![]() ![]() ![]() in Proceedings of the 21st International Conference on Runtime Verification (2021, October) Log-based anomaly detection identifies systems' anomalous behaviors by analyzing system runtime information recorded in logs. While many approaches have been proposed, all of them have in common an ... [more ▼] Log-based anomaly detection identifies systems' anomalous behaviors by analyzing system runtime information recorded in logs. While many approaches have been proposed, all of them have in common an essential pre-processing step called log parsing. This step is needed because automated log analysis requires structured input logs, whereas original logs contain semi-structured text printed by logging statements. Log parsing bridges this gap by converting the original logs into structured input logs fit for anomaly detection. Despite the intrinsic dependency between log parsing and anomaly detection, no existing work has investigated the impact of the "quality" of log parsing results on anomaly detection. In particular, the concept of "ideal" log parsing results with respect to anomaly detection has not been formalized yet. This makes it difficult to determine, upon obtaining inaccurate results from anomaly detection, if (and why) the root cause for such results lies in the log parsing step. In this short paper, we lay the theoretical foundations for defining the concept of "ideal" log parsing results for anomaly detection. Based on these foundations, we discuss practical implications regarding the identification and localization of root causes, when dealing with inaccurate anomaly detection, and the identification of irrelevant log messages. [less ▲] Detailed reference viewed: 186 (31 UL) |
||