Many software engineering activities process the events contained in log files. However, before performing any processing activity, it is necessary to parse the entries in a log file, to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. The formats of log messages, in complex and evolving systems, have numerous variations, are typically not entirely known, and change on a frequent basis; therefore, they need to be identified automatically.
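To make the fixed/variable distinction concrete, here is a minimal sketch of matching log messages against a template. It assumes whitespace tokenization and a `*` wildcard standing for one variable token; the wildcard symbol, the helper names, and the sample messages are illustrative assumptions, not taken from the paper.

```python
import re

WILDCARD = "*"  # assumed placeholder for one variable token


def template_to_regex(template: str) -> re.Pattern:
    """Compile a whitespace-tokenized template into a regex.

    Fixed tokens are matched literally; each wildcard matches exactly
    one non-whitespace token.
    """
    parts = [r"\S+" if tok == WILDCARD else re.escape(tok)
             for tok in template.split()]
    return re.compile(r"\s+".join(parts))


def matches(template: str, message: str) -> bool:
    """Return True if the whole message is an occurrence of the template."""
    return template_to_regex(template).fullmatch(message.strip()) is not None


# Hypothetical log messages: two occurrences of one event type, one of another.
messages = [
    "Connection from 10.0.0.1 closed",
    "Connection from 192.168.1.7 closed",
    "Disk usage at 91%",
]
template = "Connection from * closed"
matched = [m for m in messages if matches(template, m)]
```

Here the two "Connection" messages share the fixed part and differ only in the variable token, so both match the same template, while the third message does not.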
The log message format identification problem deals with the identification of the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals: they must not be too general, so that different events can be distinguished, and they must not be too specific, so that different occurrences of the same event are not treated as following different templates. These two goals, however, are conflicting.
In this paper, we present the MoLFI approach, which recasts the log message format identification problem as a multi-objective problem. MoLFI uses an evolutionary approach to solve this problem, by tailoring the NSGA-II algorithm to search the space of solutions for a Pareto optimal set of message templates. We have implemented MoLFI in a tool, which we have evaluated on six real-world datasets, containing log files with a number of entries ranging from 2K to 300K. The experimental results show that MoLFI extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets.
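The notion of a Pareto optimal set can be illustrated with a small sketch. It only shows the dominance relation and a naive non-dominated filter over candidate solutions scored on two objectives to be maximized; it is not NSGA-II itself and not the MoLFI implementation, and the scores and function names are hypothetical.

```python
from typing import List, Tuple

# Each candidate solution is summarized by two scores, both to be maximized.
# Which quantities the two objectives measure is left abstract here.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every objective and better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical scores for five candidate template sets.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
```

With conflicting objectives, no single candidate is best on both; the front retains the trade-offs (here the first three candidates) and discards those that another candidate beats on both scores.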
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Software Verification and Validation Lab (SVV Lab)
Disciplines :
Computer science
Author, co-author :
Messaoudi, Salma ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Panichella, Annibale ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Bianculli, Domenico ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Briand, Lionel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Sasnauskas, Raimondas ; Société Européenne des Satellites - SES
External co-authors :
no
Language :
English
Title :
A Search-based Approach for Accurate Identification of Log Message Formats
Publication date :
2018
Event name :
26th IEEE/ACM International Conference on Program Comprehension (ICPC ’18)
Event organizer :
IEEE/ACM
Event place :
Gothenburg, Sweden
Event date :
from 27-05-2018 to 28-05-2018
Audience :
International
Main work title :
Proceedings of the 26th IEEE/ACM International Conference on Program Comprehension (ICPC ’18)
Publisher :
ACM
Pages :
167-177
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
European Projects :
H2020 - 694277 - TUNE - Testing the Untestable: Model Testing of Complex Software-Intensive Systems
FnR Project :
FNR11602677 - Log-driven, Search-based Test Generation For Ground Control Systems, 2017 (01/01/2018-30/06/2021) - Lionel Briand
Funders :
FNR - Fonds National de la Recherche [LU]
CE - Commission Européenne [BE]