Asynchronous waits are a common root cause of flaky tests and a major contributor to the execution time of web application tests. We build a dataset of 49 reproducible asynchronous wait flaky tests and their fixes from 26 open-source projects to study their characteristics in web testing. Our study reveals that developers adjusted the wait time to address asynchronous wait flakiness in about 63% of cases (31 out of 49), even when the underlying cause lay elsewhere. Based on this finding, we introduce TRaf, an automated time-based repair approach for asynchronous wait flakiness in web applications.
TRaf determines appropriate wait times for asynchronous calls in web applications by analyzing code similarity and past change history. Its key insight is that efficient wait times can be inferred from the current or past codebase, since developers tend to repeat similar mistakes. Our analysis shows that TRaf can statically suggest a shorter wait time that alleviates asynchronous wait flakiness immediately upon detection, reducing test execution time by 11.1% compared to the timeout values initially chosen by developers. With optional dynamic tuning, TRaf can reduce execution time by 16.8% in its initial refinement compared to developer-written patches, and by 6.2% compared to the developers' subsequent refinements of those patches. We sent 16 pull requests to the developers, each fixing one test from our dataset. So far, three have been accepted.
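The idea of inferring a wait time from similar code can be illustrated with a minimal sketch. This is not TRaf's actual algorithm: the token-based Jaccard similarity, the function names (`tokens`, `jaccard`, `suggest_wait_ms`), and the sample snippets and timeout values are all assumptions made purely for illustration. The sketch ranks previously seen wait calls by their similarity to the flaky call and proposes the timeout of the closest match.

```python
from __future__ import annotations

import re


def tokens(code: str) -> set[str]:
    """Split a code snippet into a set of identifier and number tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+", code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_wait_ms(flaky_call: str, known_waits: list[tuple[str, int]]) -> int | None:
    """Return the timeout of the most similar known wait, or None without candidates."""
    if not known_waits:
        return None
    target = tokens(flaky_call)
    # Pick the (snippet, timeout) pair whose snippet is most similar to the flaky call.
    _, best_ms = max(known_waits, key=lambda kw: jaccard(target, tokens(kw[0])))
    return best_ms


# Hypothetical wait calls mined from a codebase or its history, with their timeouts (ms).
# "T" stands in for the literal timeout so that the value itself does not affect similarity.
known_waits = [
    ("await page.waitForSelector('#login', { timeout: T })", 3000),
    ("await sleep(T) // wait for chart animation", 500),
    ("await page.waitForResponse('/api/items', { timeout: T })", 8000),
]
flaky = "await page.waitForSelector('#submit', { timeout: T })"
suggestion = suggest_wait_ms(flaky, known_waits)
```

Here the flaky call shares most of its tokens with the first known wait, so the sketch proposes that wait's timeout. A real approach would also need to validate the suggested value by rerunning the test, which corresponds to the optional dynamic tuning step mentioned above.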
Disciplines:
Computer science
Author, co-author:
PEI, Yu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
SOHN, Jeongju ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Mike PAPADAKIS ; Kyungpook National University, Korea
HABCHI, Sarra ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Ubisoft, Canada
PAPADAKIS, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
External co-authors:
yes
Document language:
English
Title:
Non-Flaky and Nearly-Optimal Time-based Treatment of Asynchronous Wait Web Tests
Publication date:
13 September 2024
Journal title:
ACM Transactions on Software Engineering and Methodology