References of "Empirical Software Engineering"
Search-based Multi-Vulnerability Testing of XML Injections in Web Applications
Jan, Sadeeq UL; Panichella, Annibale UL; Arcuri, Andrea UL et al

in Empirical Software Engineering (in press)

An Empirical Study on the Potential Usefulness of Domain Models for Completeness Checking of Requirements
Arora, Chetan UL; Sabetzadeh, Mehrdad UL; Briand, Lionel UL

in Empirical Software Engineering (in press)

[Context] Domain modeling is a common strategy for mitigating incompleteness in requirements. While the benefits of domain models for checking the completeness of requirements are anecdotally known, these benefits have never been evaluated systematically. [Objective] We empirically examine the potential usefulness of domain models for detecting incompleteness in natural-language requirements. We focus on requirements written as “shall”-style statements and domain models captured using UML class diagrams. [Methods] Through a randomized simulation process, we analyze the sensitivity of domain models to omissions in requirements. Sensitivity is a measure of whether a domain model contains information that can lead to the discovery of requirements omissions. Our empirical research method is case study research in an industrial setting. [Results and Conclusions] We have experts construct domain models in three distinct industry domains. We then report on how sensitive the resulting models are to simulated omissions in requirements. We observe that domain models exhibit near-linear sensitivity to both unspecified (i.e., missing) and under-specified requirements (i.e., requirements whose details are incomplete). The level of sensitivity is more than four times higher for unspecified requirements than for under-specified ones. These results provide empirical evidence that domain models provide useful cues for checking the completeness of natural-language requirements. Further studies remain necessary to ascertain whether analysts are able to effectively exploit these cues for incompleteness detection.
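
As a rough illustration of the randomized omission simulation described in this abstract, the sketch below removes random requirements from a toy traceability map and measures how often the domain model still carries a cue about a missing requirement. The names, the traceability data, and the simplistic cue criterion are assumptions made for illustration, not the paper's actual procedure or metric.

```java
import java.util.*;

/** Toy simulation of domain-model sensitivity to requirement omissions.
 *  Purely illustrative: the traceability data and the "cue" criterion are
 *  made-up assumptions, not the procedure or metric used in the paper. */
public class SensitivitySimulation {

    public static void main(String[] args) {
        // Hypothetical traceability: domain-model element -> requirements that mention it.
        Map<String, Set<String>> elementToReqs = Map.of(
                "Account",  Set.of("R1", "R2"),
                "Payment",  Set.of("R2", "R3"),
                "Invoice",  Set.of("R4"),
                "Customer", Set.of("R1", "R4", "R5"));

        List<String> allReqs = List.of("R1", "R2", "R3", "R4", "R5", "R6");
        Random rnd = new Random(42);
        int trials = 1000, omissionsPerTrial = 2;
        double detectableSum = 0;

        for (int t = 0; t < trials; t++) {
            // Simulate an incomplete specification by omitting random requirements.
            List<String> shuffled = new ArrayList<>(allReqs);
            Collections.shuffle(shuffled, rnd);
            Set<String> omitted = new HashSet<>(shuffled.subList(0, omissionsPerTrial));

            // An omission counts as detectable if some model element still refers to it,
            // i.e. the domain model carries a cue pointing at the missing requirement.
            long detectable = omitted.stream()
                    .filter(r -> elementToReqs.values().stream()
                                              .anyMatch(reqs -> reqs.contains(r)))
                    .count();
            detectableSum += (double) detectable / omissionsPerTrial;
        }
        System.out.printf("Average fraction of omissions with a model cue: %.2f%n",
                detectableSum / trials);
    }
}
```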

Evolutionary Fuzzing of Android OS Vendor System Services
Iannillo, Antonio Ken UL; Natella, Roberto; Cotroneo, Domenico

in Empirical Software Engineering (2019)

Android devices are shipped in several flavors by more than 100 manufacturer partners, which extend the Android “vanilla” OS with new system services and modify the existing ones. These proprietary extensions expose Android devices to reliability and security issues. In this paper, we propose a coverage-guided fuzzing platform (Chizpurfle) based on evolutionary algorithms to test proprietary Android system services. A key feature of this platform is its ability to profile coverage on the actual, unmodified Android device by taking advantage of dynamic binary rewriting techniques. We applied this solution to three high-end commercial Android smartphones. The results confirmed that evolutionary fuzzing is able to test Android OS system services more efficiently than blind fuzzing. Furthermore, we evaluated the impact of different choices for the fitness function and selection algorithm.
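
For readers unfamiliar with coverage-guided evolutionary fuzzing, here is a minimal sketch of the kind of loop the abstract describes: coverage is the fitness signal, fitter inputs survive and are mutated. The input encoding, selection scheme, parameters, and the fake coverage function are all simplified assumptions; this is not Chizpurfle's implementation.

```java
import java.util.*;

/** Toy coverage-guided evolutionary fuzzer. The coverage() function is a
 *  stand-in for real instrumentation (the paper profiles coverage on the
 *  device via dynamic binary rewriting); everything else is simplified. */
public class EvoFuzzSketch {

    static final Random RND = new Random(1);
    static final int POP_SIZE = 20, GENERATIONS = 50;

    public static void main(String[] args) {
        List<byte[]> population = new ArrayList<>();
        for (int i = 0; i < POP_SIZE; i++) population.add(randomInput(8));

        for (int gen = 0; gen < GENERATIONS; gen++) {
            // Fitness = coverage obtained when the input is executed; rank descending.
            population.sort((a, b) -> Integer.compare(coverage(b), coverage(a)));
            // Truncation selection: keep the best half, refill with mutated copies.
            List<byte[]> next = new ArrayList<>(population.subList(0, POP_SIZE / 2));
            while (next.size() < POP_SIZE) next.add(mutate(next.get(RND.nextInt(POP_SIZE / 2))));
            population = next;
        }

        int best = population.stream().mapToInt(EvoFuzzSketch::coverage).max().orElse(0);
        System.out.println("Best coverage reached: " + best);
    }

    static byte[] randomInput(int len) {
        byte[] b = new byte[len];
        RND.nextBytes(b);
        return b;
    }

    static byte[] mutate(byte[] parent) {
        byte[] child = parent.clone();
        child[RND.nextInt(child.length)] ^= (byte) (1 << RND.nextInt(8)); // flip one bit
        return child;
    }

    /** Placeholder fitness: a real fuzzer would invoke the target system
     *  service with the input and count the basic blocks it covered. */
    static int coverage(byte[] input) {
        int score = 0;
        for (byte b : input) if ((b & 0x0F) == 0x0A) score++; // arbitrary toy objective
        return score;
    }
}
```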

Effective Fault Localization of Automotive Simulink Models: Achieving the Trade-Off between Test Oracle Effort and Fault Localization Accuracy
Liu, Bing; Nejati, Shiva UL; Lucia, Lucia et al

in Empirical Software Engineering (2019), 24(1), 444-490

One promising way to improve the accuracy of fault localization based on statistical debugging is to increase diversity among test cases in the underlying test suite. In many practical situations, adding test cases is not a cost-free option, because test oracles are developed manually or running test cases is expensive. Hence, to improve debugging, we need test suites that are both diverse and small. In this paper, we focus on improving fault localization of Simulink models by generating test cases. We identify four test objectives that aim to increase test suite diversity and use them in a search-based algorithm to generate diversified but small test suites. To further minimize test suite sizes, we develop a prediction model that stops test generation when adding test cases is unlikely to improve fault localization. We evaluate our approach using three industrial subjects. Our results show that (1) expanding the test suites used for fault localization with any of our four test objectives, even when the expansion is small, can significantly improve the accuracy of fault localization; (2) varying the test objectives used to generate the initial test suites for fault localization does not have a significant impact on the fault localization results obtained from those test suites; and (3) we identify an optimal configuration of the prediction model that helps stop test generation when it is unlikely to be beneficial. We further show that our optimal prediction model is able to maintain almost the same fault localization accuracy while reducing the average number of newly generated test cases by more than half.
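
The abstract builds on statistical debugging (spectrum-based fault localization). As background, the sketch below computes one classic suspiciousness metric, Tarantula, from per-statement pass/fail execution counts; the paper targets Simulink models, and its exact ranking formula is not reproduced here.

```java
/** Spectrum-based fault localization: Tarantula suspiciousness.
 *  Illustrative only; the paper works on Simulink models and may use a
 *  different ranking formula. */
public class Tarantula {

    /**
     * @param ef number of failing tests that execute the statement
     * @param ep number of passing tests that execute the statement
     * @param totalFail total number of failing tests in the suite
     * @param totalPass total number of passing tests in the suite
     */
    static double suspiciousness(int ef, int ep, int totalFail, int totalPass) {
        if (totalFail == 0) return 0.0;                // nothing failed: nothing to localize
        double failRatio = (double) ef / totalFail;
        double passRatio = totalPass == 0 ? 0.0 : (double) ep / totalPass;
        double denom = failRatio + passRatio;
        return denom == 0.0 ? 0.0 : failRatio / denom; // 1.0 = executed only by failing tests
    }

    public static void main(String[] args) {
        // A statement covered by 4 of 5 failing tests and 1 of 20 passing tests
        // gets a high score, which is why diverse test suites sharpen the ranking.
        System.out.printf("suspiciousness = %.3f%n", suspiciousness(4, 1, 5, 20));
    }
}
```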

Augmenting and Structuring User Queries to Support Efficient Free-Form Code Search
Sirres, Raphael; Bissyande, Tegawendé François D Assise UL; Kim, Dongsun et al

in Empirical Software Engineering (2018), 90

How effective are mutation testing tools? An empirical analysis of Java mutation testing tools with manual analysis and real faults
Kintis, Marinos UL; Papadakis, Mike UL; Papadopoulos, Andreas et al

in Empirical Software Engineering (2018)

Mutation analysis is a well-studied, fault-based testing technique. It requires testers to design tests based on a set of artificial defects; these defects support the testing activity by measuring the ratio of defects that the candidate tests reveal. Unfortunately, applying mutation to real-world programs requires automated tools due to the vast number of defects involved. In such a case, the effectiveness of the method strongly depends on the peculiarities of the employed tools, and their implementation inadequacies can lead to inaccurate results. To deal with this issue, we cross-evaluate four mutation testing tools for Java, namely PIT, muJava, Major and the research version of PIT, PITRV, with respect to their fault-detection capabilities. We investigate the strengths of the tools based on: a) a set of real faults and b) manual analysis of the mutants they introduce. We find that there are large differences between the tools’ effectiveness and demonstrate that no tool is able to subsume the others. We also provide results indicating the application cost of the method. Overall, we find that PITRV achieves the best results. In particular, PITRV outperforms the other tools by finding 6% more faults than the other tools combined.
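
The ratio mentioned above is usually called the mutation score: the fraction of mutants that the test suite detects ("kills"). The snippet below shows this generic computation; it is illustrative only and unrelated to how PIT, muJava, Major or PITRV report their results.

```java
import java.util.List;
import java.util.function.Predicate;

/** Generic mutation-score computation: the fraction of mutants "killed"
 *  (detected) by a test suite. Illustrative only; each of the tools in the
 *  study has its own mutant generation and execution machinery. */
public class MutationScore {

    /** A mutant is killed if at least one test distinguishes it from the original program. */
    static <M> double score(List<M> mutants, Predicate<M> killedBySuite) {
        if (mutants.isEmpty()) return 0.0;
        long killed = mutants.stream().filter(killedBySuite).count();
        return (double) killed / mutants.size();
    }

    public static void main(String[] args) {
        // Hypothetical mutants encoded as ids; pretend the suite kills all but "m3".
        List<String> mutants = List.of("m1", "m2", "m3", "m4");
        double ratio = score(mutants, m -> !m.equals("m3"));
        System.out.printf("Mutation score: %.0f%%%n", ratio * 100); // 75%
    }
}
```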

An Experience Report On Applying Software Testing Academic Results In Industry: We Need Usable Automated Test Generation
Arcuri, Andrea UL

in Empirical Software Engineering (2018), 23(4)

What is the impact of software engineering research on current practices in industry? In this paper, I report on my direct experience as a PhD/post-doc working in software engineering research projects, and then spending the following five years as an engineer in two different companies (the first one being the same company I collaborated with during my post-doc). Given a background in software engineering research, what cutting-edge techniques and tools from academia did I use in my daily work when developing and testing the systems of these companies? Regarding validation and verification (my main area of research), the answer is rather short: as far as I can tell, only FindBugs. In this paper, I report on why this was the case and discuss the challenging, complex open problems we face in industry which are somehow "neglected" in academic circles. In particular, I first discuss which tools I could actually use in my daily work, such as JaCoCo and Selenium. Then, I discuss the main open problems I faced, particularly related to environment simulators, unit testing and web testing. After that, popular topics in academia are presented, such as UML, regression testing and mutation testing, and their lack of impact on the type of projects I worked on in industry is discussed. Finally, based on this industrial experience, I provide my opinions about how this situation can be improved, in particular with respect to how academics are evaluated, and advocate for greater involvement in open-source projects.

Automatic Identifier Inconsistency Detection Using Code Dictionary
Kim, Suntae; Kim, Dongsun UL

in Empirical Software Engineering (2016), 21(2), 565-604

Inconsistent identifiers make it difficult for developers to understand source code. In particular, large software systems written by several developers can be vulnerable to identifier inconsistency. Unfortunately, it is not easy to detect inconsistent identifiers that are already used in source code. Although several techniques have been proposed to address this issue, many of them can result in false alarms because they do not accept domain words and idiom identifiers that are widely used in programming practice. This paper proposes an approach to detecting inconsistent identifiers based on a custom code dictionary. It first automatically builds a Code Dictionary from the existing API documents of popular Java projects by using a Natural Language Processing (NLP) parser. This dictionary records domain words with their dominant part of speech (POS) as well as idiom identifiers. This set of domain words and idioms improves the accuracy of inconsistency detection by reducing false alarms. The approach then takes a target program and detects its inconsistent identifiers by leveraging the Code Dictionary. We provide CodeAmigo, a GUI-based tool that supports our approach. We evaluated the approach on seven Java-based open-source and proprietary projects. The results show that it can detect inconsistent identifiers with 85.4% precision and 83.59% recall. In addition, we conducted an interview with developers who used our approach, which confirmed that inconsistent identifiers frequently and inevitably occur in most software projects. The interviewees also stated that our approach helps to detect inconsistent identifiers that would have been missed through manual detection.
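
A toy illustration of the dictionary idea described above: look up each term of an identifier in a dominant-POS dictionary and flag naming that clashes with a simple convention. The dictionary contents, the camelCase splitter, and the single rule are invented for illustration; they are not CodeAmigo's implementation.

```java
import java.util.*;

/** Toy code-dictionary lookup: flag method names whose first term is not a verb
 *  according to a dominant part-of-speech dictionary. The dictionary content,
 *  splitter and rule are simplified assumptions, not CodeAmigo's implementation. */
public class IdentifierChecker {

    // Dominant POS per term, as it might be mined from API documentation.
    static final Map<String, String> DOMINANT_POS = Map.of(
            "get", "VERB", "name", "NOUN", "user", "NOUN", "list", "NOUN", "empty", "ADJ");

    /** Very naive camelCase splitter: getUserName -> [get, user, name]. */
    static List<String> split(String identifier) {
        List<String> terms = new ArrayList<>();
        for (String t : identifier.split("(?<=[a-z])(?=[A-Z])")) terms.add(t.toLowerCase());
        return terms;
    }

    /** Toy rule: a method name should start with a term whose dominant POS is VERB. */
    static boolean looksInconsistent(String methodName) {
        String firstTerm = split(methodName).get(0);
        return !"VERB".equals(DOMINANT_POS.getOrDefault(firstTerm, "UNKNOWN"));
    }

    public static void main(String[] args) {
        System.out.println(looksInconsistent("getUserName")); // false: verb-first, consistent
        System.out.println(looksInconsistent("nameList"));    // true: noun-first method name
    }
}
```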

A Detailed Investigation of the Effectiveness of Whole Test Suite Generation
Rojas, José Miguel; Vivanti, Mattia; Arcuri, Andrea UL et al

in Empirical Software Engineering (2016)

A common application of search-based software testing is to generate test cases for all goals defined by a coverage criterion (e.g., lines, branches, mutants). Rather than generating one test case at a time for each of these goals individually, whole test suite generation optimizes entire test suites towards satisfying all goals at the same time. There is evidence that the overall coverage achieved with this approach is superior to that of targeting individual coverage goals. Nevertheless, some uncertainty remains on (a) whether the results generalize beyond branch coverage, (b) whether the whole test suite approach might be inferior to a more focused search for some particular coverage goals, and (c) whether generating whole test suites could be optimized by only targeting coverage goals not already covered. In this paper, we perform an in-depth analysis to study these questions. An empirical study on 100 Java classes using three different coverage criteria reveals that there are indeed some testing goals that are only covered by the traditional approach, although their number is very small in comparison with those covered exclusively by the whole test suite approach. We find that keeping an archive of already covered goals, together with the corresponding tests, and focusing the search on uncovered goals overcomes this small drawback on larger classes, leading to an improved overall effectiveness of whole test suite generation.
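
A minimal sketch of the archive mechanism mentioned in the last sentence: once a coverage goal is reached, the covering test is stored and the goal is dropped from the search's targets, so the remaining budget concentrates on uncovered goals. The types, the random stand-in for test generation, and the coverage check are assumptions for illustration, not the actual tool's code.

```java
import java.util.*;

/** Sketch of an archive for whole test suite generation: covered goals are
 *  frozen together with the test that covers them, and the search keeps
 *  targeting only the uncovered goals. Placeholder types and logic only. */
public class ArchiveSketch {

    record Goal(String id) {}
    record TestCase(String code) {}

    public static void main(String[] args) {
        Set<Goal> uncovered = new HashSet<>(List.of(
                new Goal("branch1"), new Goal("branch2"), new Goal("line42")));
        Map<Goal, TestCase> archive = new HashMap<>();
        Random rnd = new Random(7);

        for (int iteration = 0; iteration < 100 && !uncovered.isEmpty(); iteration++) {
            TestCase candidate = new TestCase("test_" + iteration); // stand-in for a GA offspring
            // Stand-in for executing the candidate and analyzing its coverage.
            for (Iterator<Goal> it = uncovered.iterator(); it.hasNext(); ) {
                Goal g = it.next();
                if (rnd.nextDouble() < 0.05) { // pretend the candidate covers g
                    archive.put(g, candidate); // remember the test, freeze the goal
                    it.remove();               // the search no longer targets g
                }
            }
        }
        System.out.println("Final suite size: " + new HashSet<>(archive.values()).size()
                + ", goals still uncovered: " + uncovered.size());
    }
}
```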

Improving the Performance of OCL Constraint Solving with Novel Heuristics for Logical Operations: A Search-Based Approach
Ali, Shaukat; Iqbal, Zohaib; Khalid, Maham et al

in Empirical Software Engineering (2015)

A common practice to specify constraints on Unified Modeling Language (UML) models is to use the Object Constraint Language (OCL). Such constraints serve various purposes, ranging from simply giving the models a precise meaning to supporting complex verification and validation activities. In many applications, these constraints have to be solved to obtain values that satisfy them, for example in model-based testing (MBT), where test data must be generated to produce executable test cases. In our previous work, we proposed novel heuristics for various OCL constructs to solve them efficiently using search algorithms. These heuristics are enhanced in this paper to further improve the performance of OCL constraint solving. We performed an empirical evaluation comprising three case studies and using three search algorithms: the Alternating Variable Method (AVM), (1+1) Evolutionary Algorithm (EA), and a Genetic Algorithm (GA); in addition, Random Search (RS) was used as a comparison baseline. In the first case study, we evaluated each heuristic using carefully designed artificial problems. In the second case study, we evaluated the heuristics on various constraints of Cisco’s Video Conferencing Systems defined to support MBT. Finally, the third case study concerns the EURent Car Rental specification and is obtained from the literature. The results of the empirical evaluation show that (1+1) EA and AVM with the improved heuristics significantly outperform the rest of the algorithms.
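
To make concrete what a heuristic for solving a constraint looks like, the sketch below shows the classic branch-distance functions used in search-based constraint solving: zero means the expression is satisfied, larger values tell the search how far a candidate value is from satisfying it. These are the textbook definitions for relational and logical operators, not the improved OCL-specific heuristics proposed in the paper.

```java
/** Classic search-based distance heuristics for relational and logical expressions:
 *  0 means the (sub)constraint is satisfied, larger values guide the search.
 *  Textbook definitions only; not the paper's improved OCL heuristics. */
public class ConstraintDistance {

    static final double K = 1.0; // small constant added when a strict comparison fails

    static double equalsTo(double a, double b)    { return Math.abs(a - b); }
    static double lessOrEqual(double a, double b) { return a <= b ? 0.0 : a - b; }
    static double lessThan(double a, double b)    { return a <  b ? 0.0 : a - b + K; }

    // x and y are both needed for "and", so the distances are summed;
    // "or" only needs the closer of the two to be satisfied.
    static double and(double dx, double dy) { return dx + dy; }
    static double or(double dx, double dy)  { return Math.min(dx, dy); }

    public static void main(String[] args) {
        // Distance for the constraint: age >= 18 and age < 65, with age = 10.
        double age = 10;
        double d = and(lessOrEqual(18, age), lessThan(age, 65));
        System.out.println("distance = " + d); // 8.0: the search should increase age
    }
}
```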

Empirical assessment of machine learning-based malware detectors for Android: Measuring the Gap between In-the-Lab and In-the-Wild Validation Scenarios
Allix, Kevin UL; Bissyande, Tegawendé François D Assise UL; Jerome, Quentin UL et al

in Empirical Software Engineering (2014)

To address the issue of malware detection through large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. So far, several promising results have been recorded in the literature, many approaches being assessed with what we call in-the-lab validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in-the-lab validation scenarios provide reliable indications of the performance of malware detectors in real-world settings, a.k.a. in the wild. To this end, we have devised several machine-learning classifiers that rely on a set of features built from applications’ control-flow graphs (CFGs). We use a sizeable dataset of over 50 000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that, in the lab, our approach outperforms existing machine learning-based approaches. However, this high performance does not translate into high performance in the wild. The performance gap we observed, with F-measures dropping from over 0.9 in the lab to below 0.1 in the wild, raises one important question: how do state-of-the-art approaches perform in the wild?
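
As a reminder of the metric behind the reported gap, the F-measure is the harmonic mean of precision and recall, so it collapses as soon as precision drops, which is what happens on realistic, heavily imbalanced in-the-wild data. The sketch below shows the standard computation; the example values are made up to mimic the reported ranges and are not the paper's measurements.

```java
/** Standard (balanced) F-measure, as reported in the abstract above.
 *  The numbers in main() are illustrative, not the paper's results. */
public class FMeasure {

    static double f1(double precision, double recall) {
        if (precision + recall == 0) return 0.0;
        return 2 * precision * recall / (precision + recall); // harmonic mean
    }

    public static void main(String[] args) {
        // In the lab: balanced dataset, high precision and recall.
        System.out.printf("lab  F1 = %.2f%n", f1(0.92, 0.90));
        // In the wild: malware is rare, so false positives crush precision.
        System.out.printf("wild F1 = %.2f%n", f1(0.05, 0.60));
    }
}
```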
