Rather than tediously writing unit tests manually,
tools can be used to generate them automatically, sometimes
even resulting in higher code coverage than manual testing. But
how good are these tests at actually finding faults? To answer
this question, we applied three state-of-the-art unit test generation
tools for Java (Randoop, EvoSuite, and Agitar) to the 357 faults
in the Defects4J dataset and investigated how well the generated
test suites perform at detecting faults. Although 55.7% of the
faults were found by automatically generated tests overall, only
19.9% of the test suites generated in our experiments actually
detected a fault. By studying the performance and the problems of
the individual tools and their tests, we derive insights to support
the development of automated unit test generators, in order to
increase the fault detection rate in the future. These include
1) improving coverage obtained so that defective statements
are actually executed in the first instance, 2) techniques for
propagating faults to the output, coupled with the generation
of more sensitive assertions for detecting them, and 3) better
simulation of the execution environment to detect faults that
are dependent on external factors, for example the date and time.
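
To illustrate the third point, the following is a minimal sketch of how a date-dependent behaviour can be made testable by controlling the clock. It is an assumption-laden illustration, not the mechanism used by any of the studied tools: the class `LicenseChecker` and the helper `frozenAt` are hypothetical names introduced here, and only the standard `java.time` API is used.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical class under test: its result depends on the current date. */
final class LicenseChecker {
    private final Clock clock;

    // The clock is injected so that a test can replace the system time.
    LicenseChecker(Clock clock) {
        this.clock = clock;
    }

    /** A license is valid up to and including its expiry date. */
    boolean isValid(LocalDate expiry) {
        LocalDate today = LocalDate.now(clock);
        return !today.isAfter(expiry);
    }
}

public class FixedClockExample {
    /** Hypothetical helper: a clock frozen at midnight UTC on the given date. */
    static Clock frozenAt(LocalDate date) {
        Instant instant = date.atStartOfDay(ZoneOffset.UTC).toInstant();
        return Clock.fixed(instant, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDate expiry = LocalDate.of(2015, 11, 13);

        // Same code, two simulated "todays": the boundary behaviour becomes
        // observable regardless of when the test actually runs.
        LicenseChecker onExpiryDay = new LicenseChecker(frozenAt(expiry));
        LicenseChecker dayAfter = new LicenseChecker(frozenAt(expiry.plusDays(1)));

        System.out.println(onExpiryDay.isValid(expiry)); // prints "true"
        System.out.println(dayAfter.isValid(expiry));    // prints "false"
    }
}
```

With the system clock, a test of `isValid` would pass or fail depending on the day it is executed; freezing the clock makes the outcome deterministic, so an off-by-one fault at the expiry boundary would be detected by a simple assertion.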
Disciplines:
Computer science
Author, co-author:
Shamshiri, Sina
Just, Rene
Rojas, Jose Miguel
Fraser, Gordon
McMinn, Phil
ARCURI, Andrea; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
External co-authors:
yes
Document language:
English
Title:
Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges
Publication/release date:
2015
Event name:
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)
Event date:
9-13 November 2015
Main work title:
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)