Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models
LE, Van Hoang; Xiao, Yi; Zhang, Hongyu
2025. In Proceedings - 2025 IEEE/ACM 47th International Conference on Software Engineering, ICSE 2025
Peer reviewed
 

Files


Full Text
Unleashing_the_True_Potential_of_Semantic-Based_Log_Parsing_with_Pre-Trained_Language_Models-2.pdf
Author postprint (989.25 kB)

Details



Keywords :
log analytics; log parsing; pre-trained LMs; In contexts; Language model; Performance; Semantics Information; Software intensive systems; State of the art; True potentials; Software
Abstract :
Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many semantic-based log parsers have been proposed. These log parsers fine-tune a small pre-trained language model (PLM) such as RoBERTa on a few labelled log samples. With the increasing popularity of large language models (LLMs), some recent studies also leverage LLMs such as ChatGPT through in-context learning for automated log parsing and obtain better results than previous semantic-based log parsers with small PLMs. In this paper, we show that semantic-based log parsers with small PLMs can achieve performance better than or comparable to state-of-the-art LLM-based log parsing models while being more efficient and cost-effective. We propose Unleash, a novel semantic-based log parsing approach, which incorporates three enhancement methods to boost the performance of PLMs for log parsing: (1) an entropy-based ranking method to select the most informative log samples; (2) a contrastive learning method to enhance the fine-tuning process; and (3) an inference optimization method to improve the log parsing performance. We evaluate Unleash on a set of large-scale, public log datasets, and the experimental results show that Unleash is effective and efficient compared to state-of-the-art log parsers.
Disciplines :
Computer science
Author, co-author :
LE, Van Hoang; University of Newcastle, Australia
Xiao, Yi; Chongqing University, China
Zhang, Hongyu; Chongqing University, China
External co-authors :
yes
Language :
English
Title :
Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models
Publication date :
2025
Event name :
2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)
Event place :
Ottawa, Canada
Event date :
27-04-2025 to 03-05-2025
Main work title :
Proceedings - 2025 IEEE/ACM 47th International Conference on Software Engineering, ICSE 2025
Publisher :
IEEE Computer Society
ISBN/EAN :
9798331505691
Peer reviewed :
Peer reviewed
Funders :
ACM SIGSOFT
Carleton University
IBM
IEEE Computer Society (and TCSE)
University of Ottawa
et al.
Funding text :
This work is supported by the Australian Research Council (ARC) Discovery Projects (DP200102940, DP220103044). We also thank the anonymous reviewers for their insightful and constructive comments, which significantly improved this paper.
Available on ORBilu :
since 16 January 2026
