Abstract:
Regression testing is an essential activity to ensure that code changes do not
adversely affect existing functionality. With the wide adoption of Continuous Integration (CI) in
software projects, builds run far more frequently, and running all tests on every build can be
time-consuming and resource-intensive. To alleviate this problem, Test case Selection and Prioritization
(TSP) techniques have been proposed to improve regression testing by selecting and prioritizing
test cases in order to provide early feedback to developers. In recent years, researchers have relied on
Machine Learning (ML) techniques to achieve effective TSP (ML-based TSP). Such techniques help
combine information about test cases, from partial and imperfect sources, into accurate prediction
models. This work conducts a systematic literature review of ML-based TSP techniques,
aiming to perform an in-depth analysis of the state of the art and thereby gain insights into
future avenues of research. To that end, we analyze 29 primary studies published from 2006 to 2020,
which were identified through a systematic and documented process. This paper addresses five
research questions covering variations in ML-based TSP techniques, the feature sets used for training
and testing ML models, the alternative metrics used to evaluate the techniques, the performance of
the techniques, and the reproducibility of the published studies. We condense the answers to
our research questions into a high-level summary that can serve as a taxonomy for classifying future
TSP studies.