Reference: PEELER: Learning to Effectively Predict Flakiness without Running Tests
Scientific congresses, symposiums and conference proceedings: Paper published in a book
Engineering, computing & technology: Computer science
Security, Reliability and Trust
http://hdl.handle.net/10993/52227
PEELER: Learning to Effectively Predict Flakiness without Running Tests
English
Qin, Yihao [National University of Defense Technology, China]
Wang, Shangwen [National University of Defense Technology, China]
Liu, Kui [Huawei Software Engineering Application Technology Lab, China]
Lin, Bo [National University of Defense Technology, China]
Wu, Hongjun [National University of Defense Technology, China]
Li, Li [Monash University, Australia]
Mao, Xiaoguang [National University of Defense Technology, China]
Bissyandé, Tegawendé François D'Assise [University of Luxembourg]
Oct-2022
Proceedings of the 38th IEEE International Conference on Software Maintenance and Evolution
1-12
Yes
No
International
38th IEEE International Conference on Software Maintenance and Evolution
from 02-10-2022 to 07-10-2022
IEEE
Limassol
Cyprus
[en] Flaky tests ; Deep learning ; Program dependency
[en] Regression testing is a widely adopted approach to expose change-induced bugs as well as to verify the correctness/robustness of code in modern software development settings. Unfortunately, the occurrence of flaky tests leads to a significant increase in the cost of regression testing and eventually reduces the productivity of developers (i.e., their ability to find and fix real problems). State-of-the-art approaches leverage dynamic test information obtained through expensive re-execution of test cases to effectively identify flaky tests. To account for scalability constraints, some recent approaches have built on static test case features, but fall short on effectiveness. In this paper, we introduce PEELER, a new fully static approach for predicting flaky tests by exploring a representation of test cases based on data dependency relations. The predictor is then trained as a neural-network-based model, which simultaneously achieves scalability (because it does not require any test execution), effectiveness (because it exploits relevant test dependency features), and practicality (because it can be applied in the wild to find new flaky tests). Experimental validation on 17,532 test cases from 21 Java projects shows that PEELER outperforms the state-of-the-art FlakeFlagger by around 20 percentage points: we catch 22% more flaky tests while yielding 51% fewer false positives. Finally, in a live study with projects in the wild, we reported 21 flakiness cases to developers, among which 12 have already been confirmed by developers as being indeed flaky.
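The pipeline sketched in the abstract — extract static features from a test case, then train a classifier to flag it as flaky or stable — can be illustrated with a deliberately simplified sketch. This is not PEELER's actual model (which builds a representation from data dependency relations and trains a neural network); the snippet below substitutes a crude bag-of-tokens featurizer and a plain logistic-regression classifier, and the example test bodies and labels are entirely hypothetical.

```python
import math

def tokenize(source):
    # Crude static feature extraction: split a test body into tokens without
    # running it. PEELER instead derives features from data dependency
    # relations; this bag-of-tokens stand-in is purely illustrative.
    for ch in "();,":
        source = source.replace(ch, " ")
    return source.split()

def featurize(tests, vocab):
    # Map each test to a binary bag-of-words vector over the vocabulary.
    return [[1.0 if tok in tokenize(src) else 0.0 for tok in vocab]
            for src in tests]

def train_logreg(X, y, epochs=500, lr=0.5):
    # Minimal SGD logistic regression, standing in for the neural model.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # gradient of log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Hypothetical training data: token patterns loosely associated with
# flakiness (timing/network calls) vs. deterministic assertions.
tests = [
    'Thread.sleep(100); assertTrue(server.isUp());',
    'assertEquals(4, add(2, 2));',
    'socket.connect(remote); Thread.sleep(50);',
    'assertEquals("ab", concat("a", "b"));',
]
labels = [1, 0, 1, 0]  # 1 = flaky, 0 = stable
vocab = sorted({tok for src in tests for tok in tokenize(src)})
X = featurize(tests, vocab)
w, b = train_logreg(X, labels)
preds = [predict(w, b, x) for x in X]
```

The key property the abstract emphasizes carries over even to this toy version: no test is ever executed — features come from the test's source alone, so the predictor scales to large suites.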
https://www.researchgate.net/publication/361324187_Peeler_Learning_to_Effectively_Predict_Flakiness_without_Running_Tests/link/62aabeb123f3283e3aeadab7/download
H2020 ; 949014 - NATURAL - Natural Program Repair