Rigorous Assessment of Model Inference Accuracy using Language  Cardinality

[en] Models such as finite state automata are widely used to abstract the behavior of software systems by capturing the sequences of events observable during their execution. Nevertheless, models rarely exist in practice and, when they do, get easily outdated; moreover, manually building and maintaining models is costly and error-prone. As a result, a variety of model inference methods that automatically construct models from execution traces have been proposed to address these issues. However, performing a systematic and reliable accuracy assessment of inferred models remains an open problem. Even when a reference model is given, most existing model accuracy assessment methods may return misleading and biased results. This is mainly due to their reliance on statistical estimators over a finite number of randomly generated traces, introducing avoidable uncertainty about the estimation and being sensitive to the parameters of the random trace generative process. This paper addresses this problem by developing a systematic approach based on analytic combinatorics that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures. We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools against reference models from established specification mining benchmarks.

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SVV - Software Verification and Validation

Disciplines :

Computer science

Author, co-author :

Clun, Donato

Shin, Donghwan

Filieri, Antonio

BIANCULLI, Domenico ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SVV

External co-authors :

yes

Language :

English

Title :

Rigorous Assessment of Model Inference Accuracy using Language Cardinality

Publication date :

January 2024

Journal title :

ACM Transactions on Software Engineering and Methodology

ISSN :

1049-331X

Publisher :

Association for Computing Machinery (ACM), United States

Peer reviewed :

Peer Reviewed verified by ORBi

Data Set :

Replication Package and supplementary material for the paper "Rigorous Assessment of Model Inference Accuracy using Language Cardinality"

Available on ORBilu :

since 22 November 2023

Statistics

Number of views

120 (18 by Unilu)

Number of downloads

71 (7 by Unilu)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

OpenAlex citations

Bibliography

Abdulbaki Aydin, Lucas Bang, and Tevfik Bultan. 2015. Automata-based model counting for string constraints. In Computer Aided Verification, Daniel Kroening and Corina S. Păsăreanu (Eds.). Springer International Publishing, Cham, 255–272. DOI:https://doi.org/10.1007/978-3-319-21690-4_15
Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu. 2018. Parameterized model counting for string and numeric constraints. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2018, Daniel Kroening and Corina S. Pasareanu (Eds.). ACM Press, New York, New York, USA, 400–410. DOI:https: //doi.org/10.1145/3236024.3236064
Ivan Beschastnikh, Yuriy Brun, Sigurd Schneider, Michael Sloan, and Michael D. Ernst. 2011. Leveraging existing instrumentation to automatically infer invariant-constrained models. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. ACM, New York, NY, USA, 267–277. DOI:https://doi.org/10.1145/2025113.2025151
Alan W. Biermann and J. A. Feldman. 1972. On the synthesis of finite-state machines from samples of their behavior. IEEE Transactions on Computers C-21, 6 (June 1972), 592–597. DOI:https://doi.org/10.1109/TC.1972.5009015
Manuel Bodirsky, Tobias Gärtner, Timo von Oertzen, and Jan Schwinghammer. 2004. Efficiently computing the density of regular languages. In LATIN 2004: Theoretical Informatics, Martín Farach-Colton (Ed.). Springer Berlin, 262–270.
Adilson Luiz Bonifácio, Arnaldo Vieira Moura, and Adenilso da Silva Simão. 2008. A generalized model-based test generation method. In Proceedings of the 2008 6th IEEE International Conference on Software Engineering and Formal Methods. 139–148. DOI:https://doi.org/10.1109/SEFM.2008.17
Janusz A. Brzozowski and Edward J. McCluskey. 1963. Signal flow graph techniques for sequential circuit state diagrams. IEEE Transactions on Electronic Computers EC-12, 2 (1963), 67–76. DOI:https://doi.org/10.1109/PGEC.1963. 263416
Horst Bunke. 2000. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of the Vision Interface, Vol. 2000. 82–88.
Nimrod Busany, Shahar Maoz, and Yehonatan Yulazari. 2019. Size and accuracy in model inference. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering. 887–898. DOI:https://doi.org/10. 1109/ASE.2019.00087
Pierre-Antoine Champin and Christine Solnon. 2003. Measuring the similarity of labeled graphs. In Proceedings of the Case-Based Reasoning Research and Development, Kevin D. Ashley and Derek G. Bridge (Eds.). Springer Berlin, 80–95.
T. S. Chow. 1978. Testing software design modeled by finite-state machines. IEEE Transactions on Software Engineering SE-4, 3 (1978), 178–187. DOI:https://doi.org/10.1109/TSE.1978.231496
Edmund M. Clarke Jr, Orna Grumberg, Daniel Kroening, Doron Peled, and Helmut Veith. 2018. Model Checking. MIT press, Cambridge, MA, USA.
Jonathan E. Cook and Alexander L. Wolf. 1998. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology 7, 3 (July 1998), 215–249. DOI:https://doi.org/10.1145/287000. 287001
Cewei Cui, Zhe Dang, Thomas R. Fischer, and Oscar H. Ibarra. 2013. Similarity in languages and programs. Theoretical Computer Science 498 (2013), 58–75. DOI:https://doi.org/10.1016/j.tcs.2013.05.040
Lawrence C. Eggan. 1963. Transition graphs and the star-height of regular events. Michigan Mathematical Journal 10, 4 (1963), 385–397.
Seyedeh Sepideh Emam and James Miller. 2018. Inferring extended probabilistic finite-state automaton models from software executions. ACM Transactions on Software Engineering and Methodology 27, 1, Article 4 (June 2018), 39 pages. DOI:https://doi.org/10.1145/3196883
Bernd Finkbeiner and Hazem Torfah. 2014. Counting models of linear-time temporal logic. In Proceedings of the International Conference on Language and Automata Theory and Applications, Adrian-Horia Dediu, Carlos Martín-Vide, José-Luis Sierra-Rodríguez, and Bianca Truthe (Eds.). Springer International Publishing, Cham, 360–371.
Bernd Finkbeiner and Hazem Torfah. 2017. The density of linear-time properties. In Proceedings of the International Symposium on Automated Technology for Verification and Analysis, Deepak D’Souza and K. Narayan Kumar (Eds.). Springer International Publishing, Cham, 139–155.
Philippe Flajolet and Robert Sedgewick. 2009. Analytic Combinatorics. Cambridge University Press, Cambridge. DOI:https://doi.org/10.1017/CBO9780511801655
Gordon Fraser and Neil Walkinshaw. 2012. Behaviourally adequate software testing. In Proceedings of the 2012 IEEE 5th International Conference on Software Testing, Verification and Validation. IEEE Press, Piscataway, NJ, USA, 300–309. DOI:https://doi.org/10.1109/ICST.2012.110
S. Fujiwara, G. V. Bochmann, F. Khendek, M. Amalou, and A. Ghedamsi. 1991. Test selection based on finite state models. IEEE Transactions on Software Engineering 17, 6 (1991), 591–603. DOI:https://doi.org/10.1109/32.87284
Wolfram Research, Inc. 2022. Mathematica. Retrieved from https://www.wolfram.com/mathematica
Sampath Kannan, Z. Sweedyk, and Steve Mahaney. 1995. Counting and random generation of strings in regular languages. In Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms . Society for Industrial and Applied Mathematics, USA, 551–557.
Daniel Kirsten. 2005. Distance desert automata and the star height problem. RAIRO - Theoretical Informatics and Applications 39, 3 (2005), 455–509. DOI:https://doi.org/10.1051/ita:2005027
Ivo Krka, Yuriy Brun, and Nenad Medvidovic. 2014. Automatic mining of specifications from invocation traces and method invariants. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014. ACM Press, New York, New York, USA, 178–189. DOI:https://doi.org/10.1145/2635868.2635890
Tien-Duy B. Le, Xuan-Bach D. Le, David Lo, and Ivan Beschastnikh. 2015. Synergizing specification miners through model fissions and fusions. In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering. IEEE, 115–125. DOI:https://doi.org/10.1109/ASE.2015.83
David Lo and Siau-cheng Khoo. 2006. QUARK: Empirical assessment of automaton-based specification miners. In Proceedings of the 2006 13th Working Conference on Reverse Engineering. IEEE, 51–60. DOI:https://doi.org/10.1109/ WCRE.2006.47
David Lo and Siau-Cheng Khoo. 2006. SMArTIC: Towards building an accurate, robust and scalable specification miner. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering . Association for Computing Machinery, New York, NY, USA, 265–275. DOI:https://doi.org/10.1145/1181775.1181808
David Lo, Leonardo Mariani, and Mauro Pezzè. 2009. Automatic steering of behavioral model inference. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering on European Software Engineering Conference and Foundations of Software Engineering Symposium - E. ACM Press, New York, New York, USA, 345. DOI:https://doi.org/10.1145/1595696.1595761
David Lo, Leonardo Mariani, and Mauro Santoro. 2012. Learning extended FSA from software: An empirical assessment. Journal of Systems and Software 85, 9 (2012), 2063–2076. DOI:https://doi.org/10.1016/j.jss.2012.04.001
Loi Luu, Shweta Shinde, Prateek Saxena, and Brian Demsky. 2014. A model counter for constraints over unbounded strings. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (Edinburgh, United Kingdom) . Association for Computing Machinery, New York, NY, USA, 565–576. DOI:https://doi. org/10.1145/2594291.2594331
Leonardo Mariani, Mauro Pezzè, Oliviero Riganelli, and Mauro Santoro. 2010. SEIM: Static extraction of interaction models. In Proceedings of the 2nd International Workshop on Principles of Engineering Service-Oriented Systems - PESOS ’10. ACM Press, New York, New York, USA, 22. DOI:https://doi.org/10.1145/1808885.1808891
Leonardo Mariani, Mauro Pezzè, and Mauro Santoro. 2017. GK-Tail+ an efficient approach to learn software models. IEEE Transactions on Software Engineering 43, 8 (Aug 2017), 715–738. DOI:https://doi.org/10.1109/TSE.2016.2623623
Austin J. Parker, Kelly B. Yancey, and Matthew P. Yancey. 2017. Regular language distance and entropy. In Proceedings of the 42nd International Symposium on Mathematical Foundations of Computer Science (Leibniz International Proceedings in Informatics, Vol. 83), Kim G. Larsen, Hans L. Bodlaender, and Jean-Francois Raskin (Eds.). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 3:1–3:14. DOI:https://doi.org/10.4230/LIPIcs.MFCS.2017.3
Artem Polyvyanyy, Andreas Solti, Matthias Weidlich, Claudio Di Ciccio, and Jan Mendling. 2020. Monotone precision and recall measures for comparing executions and specifications of dynamic systems. ACM Transactions on Software Engineering and Methodology 29, 3 (2020). DOI:https://doi.org/10.1145/3387909
Michael Pradel, Philipp Bichsel, and Thomas R. Gross. 2010. A framework for the evaluation of specification miners based on finite state machines. In Proceedings of the IEEE International Conference on Software Maintenance, ICSM (2010), 1–10. DOI:https://doi.org/10.1109/ICSM.2010.5609576
Arto Salomaa and Matti Soittola. 1978. Automata-Theoretic Aspects of Formal Power Series. Springer New York. DOI:https://doi.org/10.1007/978-1-4612-6264-0
Richard P. Stanley. 2011. Enumerative Combinatorics (2 ed.). Cambridge Studies in Advanced Mathematics, Vol. 1. Cambridge University Press. DOI:https://doi.org/10.1017/CBO9781139058520
Alaa Tharwat. 2020. Classification assessment methods. Applied Computing and Informatics 17, 1 (2020), 168–192. DOI:https://doi.org/10.1016/j.aci.2018.08.003
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2016. Progressive reasoning over recursively-defined strings. In Proceedings of the International Conference on Computer Aided Verification, Swarat Chaudhuri and Azadeh Farzan (Eds.). Springer International Publishing, Cham, 218–240. DOI:https://doi.org/10.1007/978-3-319-41528-4_12
Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. 2017. Model counting for recursively-defined strings. In Proceedings of the International Conference on Computer Aided Verification, Rupak Majumdar and Viktor Kunčak (Eds.). Springer International Publishing, Cham, 399–418. DOI:https://doi.org/10.1007/978-3-319-63390-9_21
Neil Walkinshaw and Kirill Bogdanov. 2013. Automated comparison of state-based software models in terms of their language and structure. ACM Transactions on Software Engineering and Methodology 22, 2 (2013). DOI:https://doi.org/ 10.1145/2430545.2430549
Neil Walkinshaw, Kirill Bogdanov, and Ken Johnson. 2008. Evaluation and comparison of inferred regular grammars. In Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 5278 LNAI (2008), 252–265. DOI:https://doi.org/10.1007/978-3-540-88009-7_20
Neil Walkinshaw, Bernard Lambeau, Christophe Damas, Kirill Bogdanov, and Pierre Dupont. 2013. STAMINA: A Competition to Encourage the Development and Assessment of Software Model Inference Techniques. Vol. 18. 791–824 pages. DOI:https://doi.org/10.1007/s10664-012-9210-3
Neil Walkinshaw, Ramsay Taylor, and John Derrick. 2016. Inferring extended finite state machine models from software executions. Empirical Software Engineering 21, 3 (jun 2016), 811–853. DOI:https://doi.org/10.1007/s10664-015-9367-7
Waterloo Maple, Inc.. 2022. Maple. Retrieved from https://www.maplesoft.com/products/maple/
Sheng Yu. 2001. State complexity of regular languages. Journal of Automata, Languages and Combinatorics 6, 2 (2001), 221–234. DOI:https://doi.org/10.25596/jalc-2001-221