Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Towards Universal Segmentation for Log Parsing
LE, Van Hoang; BIANCULLI, Domenico; Nguyen, Huy-Trung
In press. In Proceedings of the 34th IEEE/ACM International Conference on Program Comprehension
Peer reviewed Dataset
 

Files


Full Text
icpc26-research-97.pdf
Author postprint (1.01 MB) Creative Commons License - Attribution
Details



Keywords :
Log Parsing; Segmentation; Syntactic Analysis; Structural Patterns
Abstract :
Log parsing is a crucial step in log analysis and supports program comprehension throughout the software maintenance and engineering life cycle. It transforms unstructured log messages into the structured data required by downstream analysis tasks. The sheer volume of log data generated by modern software systems has motivated the development of numerous log parsing techniques. However, existing log parsers still suffer from unsatisfactory accuracy, which may significantly affect follow-up analyses such as log-based anomaly detection. We have identified two main limitations that hinder the effectiveness of existing log parsing methods: (1) under-segmentation: most log parsers use a fixed, predefined set of delimiters to split a log message into tokens, which may fail to split log messages correctly because logging formats are heterogeneous; (2) over-segmentation: using too many delimiters fragments meaningful units in log messages and makes it difficult to identify templates and parameters accurately. To address these limitations, we propose SCLog, a novel syntax- and context-aware segmentation approach for log parsing. SCLog leverages a comprehensive set of syntax-based heuristics to segment log messages into coarse-grained tokens. To further split these into fine-grained tokens, SCLog mines the structural patterns of tokens based on their surrounding contexts and dynamically identifies the optimal delimiters for each token. We evaluate SCLog on the widely used, large-scale Loghub-2.0 datasets. The results demonstrate that SCLog significantly improves the parsing accuracy of four representative log parsers.
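The two limitations named in the abstract can be illustrated with a minimal sketch. This is not SCLog and does not reproduce its heuristics; it only shows how a fixed delimiter set behaves on one log line. The `tokenize` helper, the two delimiter sets, and the log message are all hypothetical, invented for this illustration.

```python
import re


def tokenize(message: str, delimiters: str) -> list[str]:
    """Split a message on any character in `delimiters`, dropping empty tokens.

    Hypothetical helper: a stand-in for the fixed-delimiter tokenization
    the abstract attributes to most existing log parsers.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [tok for tok in re.split(pattern, message) if tok]


# Invented log message, not taken from Loghub-2.0.
message = "Connection from 10.0.0.5:8080 closed, user=alice"

# Small delimiter set (whitespace only): under-segmentation.
# "user=alice" stays fused, so the key and the value cannot be separated.
coarse = tokenize(message, " ")

# Large delimiter set: over-segmentation.
# The IP address is broken into four meaningless fragments.
fine = tokenize(message, " .:,=")

print(coarse)
print(fine)
```

Neither fixed set works for every token in the line: splitting on `=` helps with `user=alice`, while splitting on `.` destroys the IP address. This is the tension that motivates choosing delimiters per token rather than per parser.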
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation
Disciplines :
Computer science
Author, co-author :
LE, Van Hoang; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
BIANCULLI, Domenico; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV
Nguyen, Huy-Trung; Posts and Telecommunications Institute of Technology
External co-authors :
yes
Language :
English
Title :
Towards Universal Segmentation for Log Parsing
Publication date :
In press
Event name :
34th IEEE/ACM International Conference on Program Comprehension (ICPC ’26)
Event date :
April 12-13, 2026
Audience :
International
Main work title :
Proceedings of the 34th IEEE/ACM International Conference on Program Comprehension
Publisher :
ACM
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
FnR Project :
FNR17373407 - LOGODOR - Automated Log Smell Detection And Removal, 2022 (01/09/2023-31/08/2026) - Domenico Bianculli
Funders :
FNR - Luxembourg National Research Fund
Funding number :
C22/IS/17373407/LOGODOR
Available on ORBilu :
since 19 February 2026

