Although white-box regression test prioritization has been well-studied, the more recently introduced black-box prioritization approaches have neither been compared against each other nor against more well-established white-box techniques. We present a comprehensive experimental comparison of several test prioritization techniques, including well-established white-box strategies and more recently introduced black-box approaches. We found that Combinatorial Interaction Testing and diversity-based techniques (Input Model Diversity and Input Test Set Diameter) perform best among the black-box approaches. Perhaps surprisingly, we found little difference between black-box and white-box performance (at most 4% fault detection rate difference). We also found the overlap between black- and white-box faults to be high: the first 10% of the prioritized test suites already agree on at least 60% of the faults found. These are positive findings for practicing regression testers who may not have source code available, thereby making white-box techniques inapplicable. We also found evidence that both black-box and white-box prioritization remain robust over multiple system releases.
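The diversity-based black-box techniques compared in the abstract (Input Model Diversity and Input Test Set Diameter) share the idea of running dissimilar test inputs early. The following Python fragment is a purely illustrative sketch, not the authors' implementation, of one such greedy, Jaccard-distance-based ordering; the function names and the toy test suite are hypothetical.

    # Illustrative sketch: greedy diversity-based prioritization of black-box
    # test inputs using Jaccard distance. Each test case is modelled as a set
    # of input features (e.g., command-line options); names are hypothetical.

    def jaccard_distance(a: set, b: set) -> float:
        """1 - |a & b| / |a | b|; returns 0.0 if both sets are empty."""
        union = a | b
        if not union:
            return 0.0
        return 1.0 - len(a & b) / len(union)

    def prioritize_by_diversity(tests: dict[str, set]) -> list[str]:
        """Greedy farthest-point ordering: each next test maximizes its minimum
        Jaccard distance to the tests already selected."""
        remaining = dict(tests)
        ordered = []
        # Start from the test with the largest feature set (deterministic tie-break).
        first = max(remaining, key=lambda t: (len(remaining[t]), t))
        ordered.append(first)
        del remaining[first]
        while remaining:
            nxt = max(
                remaining,
                key=lambda t: (min(jaccard_distance(remaining[t], tests[s]) for s in ordered), t),
            )
            ordered.append(nxt)
            del remaining[nxt]
        return ordered

    if __name__ == "__main__":
        # Hypothetical test inputs described as sets of command-line options.
        suite = {
            "t1": {"-c", "-v", "-k"},
            "t2": {"-c", "-v"},
            "t3": {"-d", "-s"},
            "t4": {"-d", "-v", "-q"},
        }
        print(prioritize_by_diversity(suite))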
Disciplines:
Computer science
Author, co-author:
HENARD, Christopher; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
PAPADAKIS, Mike; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Harman, Mark; University College London - UCL
Jia, Yue; University College London - UCL
LE TRAON, Yves; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
External co-authors:
Yes
Document language:
English
Title:
Comparing White-box and Black-box Test Prioritization
Publication date:
2016
Event name:
38th International Conference on Software Engineering (ICSE'16)
Event location:
Austin, TX, United States
Event dates:
14-05-2016 to 22-05-2016
Main work title:
38th International Conference on Software Engineering (ICSE'16)