Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges

Shamshiri, Sina; Just, Rene; Rojas, Jose Miguel; Fraser, Gordon; McMinn, Phil; Arcuri, Andrea


Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2015 • In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Peer reviewed

Permalink
https://hdl.handle.net/10993/21589

Files

Full Text

CR-Submission-PID3864821.pdf

Author preprint (492.74 kB)

All documents in ORBilu are protected by a user license.

Details

Abstract :

Rather than tediously writing unit tests manually, tools can be used to generate them automatically, sometimes even achieving higher code coverage than manual testing. But how good are these tests at actually finding faults? To answer this question, we applied three state-of-the-art unit test generation tools for Java (Randoop, EvoSuite, and Agitar) to the 357 faults in the Defects4J dataset and investigated how well the generated test suites perform at detecting faults. Although 55.7% of the faults were found by automatically generated tests overall, only 19.9% of the test suites generated in our experiments actually detected a fault. By studying the performance and the problems of the individual tools and their tests, we derive insights to support the development of automated unit test generators and to increase their fault detection rate in the future. These include 1) improving the coverage obtained, so that defective statements are actually executed in the first place, 2) techniques for propagating faults to the output, coupled with the generation of more sensitive assertions for detecting them, and 3) better simulation of the execution environment to detect faults that depend on external factors, such as the date and time.
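To illustrate insight 3, the sketch below shows why date-dependent behaviour is hard to test unless the environment is controlled. It is a minimal, hypothetical example: the classes `Subscription` and `SubscriptionCheck` are invented for illustration, and passing a `java.time.Clock` into the class under test is a general dependency-injection technique. It is not a description of how Randoop, EvoSuite, or Agitar handle the environment. A test that reads the real system clock gives different results depending on the day it runs. A test that fixes the clock with `Clock.fixed` can pin down the exact boundary where a date-related fault would show up.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical class under test: whether a subscription has expired depends on
// "today", so its observable behaviour changes with the machine's clock.
class Subscription {
    private final LocalDate expiryDate;
    private final Clock clock; // injected so that a test can control the current date

    Subscription(LocalDate expiryDate, Clock clock) {
        this.expiryDate = expiryDate;
        this.clock = clock;
    }

    // Convenience constructor that reads the real system clock (non-deterministic in tests).
    Subscription(LocalDate expiryDate) {
        this(expiryDate, Clock.systemDefaultZone());
    }

    boolean isExpired() {
        // Expired only once today's date is strictly after the expiry date.
        return LocalDate.now(clock).isAfter(expiryDate);
    }
}

// Hypothetical test helper that simulates the environment by freezing time.
class SubscriptionCheck {
    // Returns a clock frozen at midnight UTC on the given day.
    static Clock fixedAt(LocalDate day) {
        Instant instant = day.atStartOfDay(ZoneOffset.UTC).toInstant();
        return Clock.fixed(instant, ZoneOffset.UTC);
    }
}
```

With a fixed clock, a test can check the day before, the expiry day itself, and the day after. A hypothetical off-by-one fault, such as writing `!LocalDate.now(clock).isBefore(expiryDate)`, would fail the expiry-day check every time. Against the real clock, the same fault would only be exposed on one particular calendar day.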

Disciplines :

Computer science

Author, co-author :

Shamshiri, Sina

Just, Rene

Rojas, Jose Miguel

Fraser, Gordon

McMinn, Phil

Arcuri, Andrea; University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust (SNT)

External co-authors :

yes

Language :

English

Title :

Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges

Publication date :

2015

Event name :

Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Event date :

9-13 November 2015

Main work title :

Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Publisher :

ACM

Peer reviewed :

Peer reviewed

Available on ORBilu :

since 25 July 2015

Statistics

Number of views

166 (4 by Unilu)

Number of downloads

4 (3 by Unilu)

Scopus citations®

176

Scopus citations® without self-citations

152