Abstract:
[en] This tutorial covers the basics of using statistical tests to
evaluate and compare search algorithms, in particular when applied
to software engineering problems. Search algorithms such as
Hill Climbing and Genetic Algorithms are randomised. Running
such randomised algorithms twice on the same problem can give
different results. It is therefore important to run such algorithms multiple
times to collect average results, and so avoid publishing wrong
conclusions that were based on luck alone. However, this raises the
question of how many times such runs should be repeated. Given a set
of n repeated experiments, is n large enough to draw sound
conclusions? Or should more experiments have been run? Statistical
tests like the Wilcoxon-Mann-Whitney U-test can be used to
answer these important questions.
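
As a rough illustration of the kind of comparison described above, the following Python sketch applies a two-sided Wilcoxon-Mann-Whitney U-test to the results of n repeated runs of two randomised algorithms. It is not taken from the tutorial itself: the function names (`average_ranks`, `mann_whitney_u`), the choice of the normal approximation with tie correction, and the synthetic "fitness" values drawn from two Gaussians are all assumptions made for this example.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided U-test via the normal approximation with tie correction.

    Returns (U statistic of sample a, approximate p-value). The
    approximation is only reasonable for moderately large samples.
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Mean and tie-corrected standard deviation of U under the null hypothesis.
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    mean_u = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # all observations identical: no evidence of a difference

    z = (u1 - mean_u) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p_value


if __name__ == "__main__":
    # Synthetic data standing in for 30 runs of each algorithm (hypothetical).
    rng = random.Random(42)
    runs_a = [rng.gauss(100.0, 10.0) for _ in range(30)]
    runs_b = [rng.gauss(105.0, 10.0) for _ in range(30)]
    u, p = mann_whitney_u(runs_a, runs_b)
    print(f"U = {u:.1f}, p = {p:.4f}")
```

A small p-value suggests that the observed difference between the two sets of runs is unlikely to be due to chance alone; a large one may simply mean that n is too small to tell the algorithms apart. In practice an established statistics library would normally be preferred over a hand-written test.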