; ; Papadakis, Mike. In Software Testing, Verification and Reliability (2019), 29(1-2).

Cordy, Maxime. In ACM SIGSOFT International Symposium on Software Testing and Analysis (2019).

Rwemalika, Renaud. In 12th IEEE International Conference on Software Testing, Verification and Validation (2019).

Many companies rely on software testing to verify that their software products meet their requirements. However, test quality and, in particular, the quality of end-to-end testing is relatively hard to achieve. The problem becomes challenging when software evolves, as end-to-end test suites need to adapt and conform to the evolved software. Unfortunately, end-to-end tests are particularly fragile, as any change in the application interface, e.g., the application flow or the location or name of graphical user interface elements, necessitates a change in the tests. This paper presents an industrial case study on the evolution of Keyword-Driven test suites, also known as Keyword-Driven Testing (KDT). Our aim is to demonstrate the problem of test maintenance, identify the benefits of Keyword-Driven Testing and overall improve the understanding of test code evolution (at the acceptance testing level). This information will support the development of automatic techniques, such as test refactoring and repair, and will motivate future research. To this end, we identify, collect and analyze test code changes across the evolution of industrial KDT test suites over a period of eight months. We show that the problem of test maintenance is largely due to test fragility (the most commonly performed changes are due to locator and synchronization issues) and test clones (over 30% of keywords are duplicated). We also show that the better test design of KDT test suites has the potential to drastically reduce (by approximately 70%) the number of test code changes required to support software evolution. To further validate our results, we interview testers from BGL BNP Paribas and report their perceptions of the advantages and challenges of keyword-driven testing.
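The maintenance argument above is easier to see in code. The following is a minimal, runnable Python sketch of the keyword-driven idea, with a stub UI standing in for a real driver; every identifier is invented for illustration and is not taken from the study's industrial test suites. The point is the indirection: tests call named keywords, so a fragile locator lives in exactly one place and a UI rename is repaired once, inside the keyword, rather than in every test that uses it.

```python
# Minimal sketch of keyword-driven testing (KDT). The stub UI below stands in
# for a real driver (e.g., a browser automation API); all names are
# illustrative, not the study's actual framework.

class StubUi:
    """Fake application: a 'page' is just a dict from element ids to values."""
    def __init__(self):
        self.elements = {"username": "", "password": "", "welcome-banner": None}

    def type_into(self, element_id, text):
        # Locator lookups are the fragile part: if the UI renames an id,
        # this raises, and the fix happens once, inside one keyword.
        if element_id not in self.elements:
            raise KeyError(f"unknown locator: {element_id}")
        self.elements[element_id] = text

    def click(self, element_id):
        # Simplified: a click "logs in" when credentials are filled.
        if self.elements["username"] and self.elements["password"]:
            self.elements["welcome-banner"] = "Welcome!"

# Keywords: named, reusable steps that hide locators and synchronization.
def enter_credentials(ui, user, password):
    ui.type_into("username", user)
    ui.type_into("password", password)

def submit_login(ui):
    ui.click("login-button")

# A test case reads as a sequence of keywords, not raw UI calls.
def test_valid_login():
    ui = StubUi()
    enter_credentials(ui, "alice", "s3cret")
    submit_login(ui)
    assert ui.elements["welcome-banner"] == "Welcome!"

test_valid_login()
```

The paper's second finding, test clones (over 30% duplicated keywords), would correspond here to several near-identical copies of enter_credentials scattered across suites, each needing the same repair.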
; ; Papadakis, Mike. In International Conference on Software Engineering (ICSE) (2019).

; ; Robert, Jérémy. In International Conference on Social Networks Analysis, Management and Security (2019).

Ghamizi, Salah. Automated Search for Configurations of Convolutional Neural Network Architectures (2019).

Deep Neural Networks (DNNs) are intensively used to solve a wide variety of complex problems. Although powerful, such systems require manual configuration and tuning. To this end, we view DNNs as configurable systems and propose an end-to-end framework that allows the configuration, evaluation and automated search for DNN architectures. Therefore, our contribution is threefold. First, we model the variability of DNN architectures with a Feature Model (FM) that generalizes over existing architectures. Each valid configuration of the FM corresponds to a valid DNN model that can be built and trained. Second, we implement, on top of Tensorflow, an automated procedure to deploy, train and evaluate the performance of a configured model. Third, we propose a method to search for configurations and demonstrate that it leads to good DNN models. We evaluate our method by applying it to image classification tasks (MNIST, CIFAR-10) and show that, with a limited amount of computation and training, it can identify high-performing architectures (with high accuracy). We also demonstrate that we outperform existing state-of-the-art architectures handcrafted by ML researchers. Our FM and framework have been released to support replication and future research.
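To make the "DNNs as configurable systems" idea concrete, here is a small sketch in the paper's spirit, using plain TensorFlow/Keras. A flat dictionary stands in for a feature-model configuration; the feature names, the validity constraint and the architecture template are all invented for illustration and are not the paper's actual feature model.

```python
# Sketch: a configuration (a stand-in for one feature-model product) is
# validated and then built into a trainable Keras model.
import tensorflow as tf
from tensorflow.keras import layers

def is_valid(cfg):
    # Example cross-tree constraint: dropout, if selected, needs a rate in (0, 1).
    if cfg["dropout"] and not (0.0 < cfg["dropout_rate"] < 1.0):
        return False
    return 1 <= cfg["conv_blocks"] <= 3

def build_model(cfg, input_shape=(28, 28, 1), classes=10):
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for i in range(cfg["conv_blocks"]):
        model.add(layers.Conv2D(cfg["filters"] * (2 ** i), cfg["kernel_size"],
                                padding="same", activation="relu"))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    if cfg["dropout"]:
        model.add(layers.Dropout(cfg["dropout_rate"]))
    model.add(layers.Dense(classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One "product" of the hypothetical feature model:
cfg = {"conv_blocks": 2, "filters": 32, "kernel_size": 3,
       "dropout": True, "dropout_rate": 0.5}
assert is_valid(cfg)
model = build_model(cfg)  # ready for model.fit(...) on e.g. MNIST
```

The paper's third contribution, the search, would then sit on top of such a builder: sample or evolve valid configurations, train each briefly, and keep the best-performing ones.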
; ; et al. In ACM SIGSOFT International Symposium on Software Testing and Analysis (2019).

; Rwemalika, Renaud. In Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (2019).

Rwemalika, Renaud. Scientific Conference (2018, December 11).

Agile methodologies enable companies to drastically increase software release pace and reduce time-to-market. In a rapidly changing environment, testing becomes a cornerstone of the software development process, guarding the system code base against the insertion of faults. To cater for this, many companies are migrating manual end-to-end tests to automated ones. This migration introduces several challenges for practitioners, relating to difficulties in the creation of the automated tests, their maintenance and the evolution of the test code base. In this position paper, we discuss our preliminary results on such challenges and present two potential solutions, focusing on keyword-driven end-to-end tests. Our solutions leverage existing software artifacts, namely the test suite and an automatically created model of the system under test, to support the evolution of keyword-driven test suites.

Jimenez, Matthieu. In Proceedings of the 12th International Symposium on Empirical Software Engineering and Measurement (ESEM'18) (2018, October 11).

Background: Code is repetitive and predictable in a way that is similar to natural language. This means that code is "natural" and this "naturalness" can be captured by natural language modelling techniques. Such models promise to capture the program semantics and identify source code parts that "smell", i.e., that are strange, badly written and generally error-prone (likely to be defective). Aims: We investigate the use of natural language modelling techniques in mutation testing (a testing technique that uses artificial faults). We thus seek to identify how well artificial faults simulate real ones and, ultimately, to understand how natural artificial faults can be. Our intuition is that natural mutants, i.e., mutants that are predictable (follow the implicit coding norms of developers), are semantically useful and generally valuable to testers. We also expect mutants located on unnatural code locations (which are generally linked with error-proneness) to be of higher value than those located on natural code locations. Method: Based on this idea, we propose mutant selection strategies that rank mutants according to a) their naturalness (naturalness of the mutated code), b) the naturalness of their locations (naturalness of the original program statements) and c) their impact on the naturalness of the code that they apply to (naturalness differences between original and mutated statements). We empirically evaluate these strategies on a benchmark set of 5 open-source projects, involving more than 100k mutants and 230 real faults. Based on the fault set, we estimate the utility (i.e., the capability to reveal faults) of mutants selected on the basis of their naturalness, and compare it against the utility of randomly selected mutants. Results: Our analysis shows that there is no link between naturalness and the fault-revelation utility of mutants. We also demonstrate that naturalness-based mutant selection performs similarly to (slightly worse than) random mutant selection. Conclusions: Our findings are negative, but we consider them interesting as they refute a strong intuition, showing that fault revelation is independent of the mutants' naturalness.
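The following toy sketch illustrates the paper's strategy a), ranking mutants by the naturalness of their mutated code. For self-containment it uses a tiny add-one-smoothed bigram model rather than the Kneser-Ney n-gram models the authors employ, and the "corpus" and mutants are invented; only the ranking idea carries over.

```python
# Toy naturalness-based mutant ranking: score each mutant by the
# cross-entropy of its mutated statement under a language model.
import math
from collections import Counter

def bigram_cross_entropy(corpus_tokens):
    """Train a toy add-one-smoothed bigram model; return a scoring function."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(unigrams)

    def cross_entropy(tokens):
        # Average negative log2 probability per bigram: lower = more "natural".
        bits = 0.0
        for a, b in zip(tokens, tokens[1:]):
            bits -= math.log2((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        return bits / max(len(tokens) - 1, 1)

    return cross_entropy

# Toy "project corpus" and two candidate mutants of one statement.
corpus = "if x > 0 : return x else : return - x".split()
score = bigram_cross_entropy(corpus)
mutants = {"x > 0 -> x >= 0": "if x >= 0 : return x".split(),
           "return x -> return 0": "if x > 0 : return 0".split()}

# Natural-first selection: most predictable (lowest entropy) mutants first.
for name, tokens in sorted(mutants.items(), key=lambda kv: score(kv[1])):
    print(f"{score(tokens):5.2f} bits/token  {name}")
```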
Jimenez, Matthieu. In 34th IEEE International Conference on Software Maintenance and Evolution, Madrid, Spain, 26-28 September 2018 (2018, September 26).

Natural language processing techniques, in particular n-gram models, have been applied successfully to facilitate a number of software engineering tasks. However, in our related ICSME '18 paper, we have shown that the conclusions of a study can drastically change with respect to how the code is tokenized and how the n-gram model used is parameterized. These choices are thus of utmost importance, and one must make them carefully. To show this and allow the community to benefit from our work, we have developed TUNA (TUning Naturalness-based Analysis), a Java software artifact to perform naturalness-based analyses of source code. To the best of our knowledge, TUNA is the first open-source, end-to-end toolchain to carry out source code analyses based on naturalness.

Jimenez, Matthieu. Scientific Conference (2018, September).

Recent research shows that language models, such as n-gram models, are useful for a wide variety of software engineering tasks, e.g., code completion, bug identification, code summarisation, etc. However, such models require numerous parameters to be set appropriately. Moreover, the different ways one can read code essentially yield different models (based on different sequences of tokens). In this paper, we focus on n-gram models and evaluate how the choice of tokenizer, smoothing technique, unknown-word threshold and n value impacts the predictive ability of these models. Thus, we compare multiple tokenizers and sets of parameters (smoothing, unknown threshold and n values) with the aim of identifying the most appropriate combinations. Our results show that the Modified Kneser-Ney smoothing technique performs best, while the best n values depend on the choice of tokenizer, with values of 4 or 5 offering a good trade-off between entropy and computation time. Interestingly, we find that tokenizers treating the code as simple text are the most robust. Finally, we demonstrate that the differences between tokenizers are of practical importance and have the potential to change the conclusions of a given experiment.
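The kind of experiment described above can be reproduced in miniature with NLTK's language-model package, which offers an interpolated Kneser-Ney smoother (close in spirit to the Modified Kneser-Ney the paper identifies as best). The toy corpus and the whitespace split below are illustrative stand-ins for real source code and real code tokenizers; this is a sketch of the methodology, not the paper's tooling.

```python
# Compare n-gram orders by the cross-entropy they assign to held code.
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import ngrams

# Toy "source code" corpus; line.split() plays the role of a code tokenizer.
corpus = ["for i in range ( n ) :",
          "if x == 0 : return y",
          "return x + y"]
tokenized = [line.split() for line in corpus]

test = "return x + y".split()  # held-in on purpose, to keep entropies finite

for n in (2, 3, 4, 5):  # the n values the study compares
    train, vocab = padded_everygram_pipeline(n, tokenized)
    lm = KneserNeyInterpolated(n)  # interpolated Kneser-Ney smoothing
    lm.fit(train, vocab)
    test_ngrams = list(ngrams(pad_both_ends(test, n=n), n))
    print(n, lm.entropy(test_ngrams))  # lower entropy = more predictable
```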
; ; et al. In 40th International Conference on Software Engineering, 27 May - 3 June 2018, Gothenburg, Sweden (2018, May).

; ; Papadakis, Mike. In IEEE Congress on Evolutionary Computation (2018).

Papadakis, Mike. In 40th International Conference on Software Engineering, 27 May - 3 June 2018, Gothenburg, Sweden (2018).

Titcheu Chekam, Thierry. In 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May - 3 June 2018 (2018).

; ; Papadakis, Mike. In Journal of Systems and Software (2018).

Kintis, Marinos. In Empirical Software Engineering (2018).

Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects. The defects help in performing testing activities by measuring the ratio of defects that the candidate tests reveal. Unfortunately, applying mutation to real-world programs requires automated tools due to the vast number of defects involved. In such a case, the effectiveness of the method strongly depends on the peculiarities of the employed tools. Thus, when using automated tools, their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major and the research version of PIT, PITRV, with respect to their fault-detection capabilities. We investigate the strengths of the tools based on a) a set of real faults and b) manual analysis of the mutants they introduce. We find that there are large differences in the tools' effectiveness and demonstrate that no tool is able to subsume the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results. In particular, PITRV outperforms the other tools by finding 6% more faults than the other tools combined.
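For readers unfamiliar with what tools like PIT, muJava and Major automate, the following self-contained toy shows the core loop of mutation analysis: seed an artificial defect, run the test suite, and count killed mutants. The single mutation operator (replacing + with -) and the one-assertion "suite" are deliberately minimal and purely illustrative.

```python
# Toy mutation analysis: generate mutants of Python source via the ast module
# and measure the mutation score of a test suite.
import ast

SRC = "def add(a, b):\n    return a + b\n"

class SwapAddSub(ast.NodeTransformer):
    """Mutation operator: replace the target-th '+' with '-'."""
    def __init__(self, target):
        self.target, self.seen = target, -1

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            self.seen += 1
            if self.seen == self.target:
                node.op = ast.Sub()
        return node

def make_mutant(src, i):
    tree = SwapAddSub(i).visit(ast.parse(src))
    return compile(ast.fix_missing_locations(tree), "<mutant>", "exec")

def suite_passes(add):
    # The test suite whose strength mutation analysis measures. Note that a
    # '+' -> '*' mutant would survive this suite (2 * 2 == 4), which is
    # exactly how surviving mutants expose weak test data.
    return add(2, 2) == 4

n_mutants = 1  # SRC contains a single '+', hence one mutation point
killed = 0
for i in range(n_mutants):
    namespace = {}
    exec(make_mutant(SRC, i), namespace)
    if not suite_passes(namespace["add"]):
        killed += 1  # the suite detects ("kills") this mutant
print(f"mutation score: {killed}/{n_mutants}")
```

Real tools differ mainly in which operators they apply and how efficiently they execute the mutant-versus-suite loop, which is precisely the implementation variation the study cross-evaluates.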
Jimenez, Matthieu. In IEEE International Working Conference on Source Code Analysis and Manipulation (2018).

Papadakis, Mike. In 13th International Workshop on Mutation Analysis (MUTATION'18) (2018).