References of "IEEE Transactions on Software Engineering"
     in
Bookmark and Share    
Full Text
Peer Reviewed
See detailData-driven Mutation Analysis for Cyber-Physical Systems
Vigano, Enrico UL; Cornejo, Oscar; Pastore, Fabrizio UL et al

in IEEE Transactions on Software Engineering (in press)

Cyber-physical systems (CPSs) typically consist of a wide set of integrated, heterogeneous components; consequently, most of their critical failures relate to the interoperability of such components ... [more ▼]

Cyber-physical systems (CPSs) typically consist of a wide set of integrated, heterogeneous components; consequently, most of their critical failures relate to the interoperability of such components. Unfortunately, most CPS test automation techniques are preliminary and industry still heavily relies on manual testing. With potentially incomplete, manually-generated test suites, it is of paramount importance to assess their quality. Though mutation analysis has demonstrated to be an effective means to assess test suite quality in some specific contexts, we lack approaches for CPSs. Indeed, existing approaches do not target interoperability problems and cannot be executed in the presence of black-box or simulated components, a typical situation with CPSs. In this paper, we introduce data-driven mutation analysis, an approach that consists in assessing test suite quality by verifying if it detects interoperability faults simulated by mutating the data exchanged by software components. To this end, we describe a data-driven mutation analysis technique (DaMAT) that automatically alters the data exchanged through data buffers. Our technique is driven by fault models in tabular form where engineers specify how to mutate data items by selecting and configuring a set of mutation operators. We have evaluated DaMAT with CPSs in the space domain; specifically, the test suites for the software systems of a microsatellite and nanosatellites launched on orbit last year. Our results show that the approach effectively detects test suite shortcomings, is not affected by equivalent and redundant mutants, and entails acceptable costs. [less ▲]

Detailed reference viewed: 117 (11 UL)
Full Text
Peer Reviewed
See detailSmart Bound Selection for the Verification of UML/OCL Class Diagrams
Clarisó, Robert; Gonzalez Perez, Carlos Alberto UL; Cabot, Jordi

in IEEE Transactions on Software Engineering (in press)

Correctness of UML class diagrams annotated with OCL constraints can be checked using bounded verification techniques, e.g., SAT or constraint programming (CP) solvers. Bounded verification detects faults ... [more ▼]

Correctness of UML class diagrams annotated with OCL constraints can be checked using bounded verification techniques, e.g., SAT or constraint programming (CP) solvers. Bounded verification detects faults efficiently but, on the other hand, the absence of faults does not guarantee a correct behavior outside the bounded domain. Hence, choosing suitable bounds is a non-trivial process as there is a trade-off between the verification time (faster for smaller domains) and the confidence in the result (better for larger domains). Unfortunately, bounded verification tools provide little support in the bound selection process. In this paper, we present a technique that can be used to (i) automatically infer verification bounds whenever possible, (ii) tighten a set of bounds proposed by the user and (iii) guide the user in the bound selection process. This approach may increase the usability of UML/OCL bounded verification tools and improve the efficiency of the verification process. [less ▲]

Detailed reference viewed: 203 (33 UL)
Full Text
Peer Reviewed
See detailFlakify: A Black-Box, Language Model-Based Predictor for Flaky Tests
Fatima, Sakina; Ghaleb, Taher; Briand, Lionel UL

in IEEE Transactions on Software Engineering (in press)

Detailed reference viewed: 55 (2 UL)
Full Text
Peer Reviewed
See detailScalable and Accurate Test Case Prioritization in Continuous Integration Contexts
Yaraghi, Ahmadreza; Bagherzadeh, Mojtaba; Kahani, Nafiseh et al

in IEEE Transactions on Software Engineering (in press)

Detailed reference viewed: 36 (1 UL)
Full Text
Peer Reviewed
See detailBlack-Box Testing of Deep Neural Networks through Test Case Diversity
Aghababaeyan, Zohreh; Abdellatif, Manel; Briand, Lionel UL et al

in IEEE Transactions on Software Engineering (in press)

Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to ... [more ▼]

Deep Neural Networks (DNNs) have been extensively used in many areas including image processing, medical diagnostics and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNNs. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box as they require access to the internals or training data of DNNs, which is often not feasible or convenient. Measuring such coverage requires executing DNNs with candidate inputs to guide testing, which is not an option in many practical contexts. In this paper, we investigate diversity metrics as an alternative to white-box coverage criteria. For the previously mentioned reasons, we require such metrics to be black-box and not rely on the execution and outputs of DNNs under test. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyze their statistical association with fault detection using four datasets and five DNNs. We further compare diversity with state-of-the-art white-box coverage criteria. As a mechanism to enable such analysis, we also propose a novel way to estimate fault detection in DNNs. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria to effectively guide DNN testing. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. Results also confirm the suspicions that state-of-the-art coverage criteria are not adequate to guide the construction of test input sets to detect as many faults as possible using natural inputs. [less ▲]

Detailed reference viewed: 89 (8 UL)
Full Text
Peer Reviewed
See detailMetamorphic Testing for Web System Security
Bayati Chaleshtari, Nazanin; Pastore, Fabrizio UL; Goknil, Arda et al

in IEEE Transactions on Software Engineering (in press)

Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system ... [more ▼]

Security testing aims at verifying that the software meets its security properties. In modern Web systems, however, this often entails the verification of the outputs generated when exercising the system with a very large set of inputs. Full automation is thus required to lower costs and increase the effectiveness of security testing. Unfortunately, to achieve such automation, in addition to strategies for automatically deriving test inputs, we need to address the oracle problem, which refers to the challenge, given an input for a system, of distinguishing correct from incorrect behavior (e.g., the response to be received after a specific HTTP GET request). In this paper, we propose Metamorphic Security Testing for Web-interactions (MST-wi), a metamorphic testing approach that integrates test input generation strategies inspired by mutational fuzzing and alleviates the oracle problem in security testing. It enables engineers to specify metamorphic relations (MRs) that capture many security properties of Web systems. To facilitate the specification of such MRs, we provide a domain-specific language accompanied by an Eclipse editor. MST-wi automatically collects the input data and transforms the MRs into executable Java code to automatically perform security testing. It automatically tests Web systems to detect vulnerabilities based on the relations and collected data. We provide a catalog of 76 system-agnostic MRs to automate security testing in Web systems. It covers 39% of the OWASP security testing activities not automated by state-of-the-art techniques; further, our MRs can automatically discover 102 different types of vulnerabilities, which correspond to 45% of the vulnerabilities due to violations of security design principles according to the MITRE CWE database. We also define guidelines that enable test engineers to improve the testability of the system under test with respect to our approach. We evaluated MST-wi effectiveness and scalability with two well-known Web systems (i.e., Jenkins and Joomla). It automatically detected 85% of their vulnerabilities and showed a high specificity (99.81% of the generated inputs do not lead to a false positive); our findings include a new security vulnerability detected in Jenkins. Finally, our results demonstrate that the approach scale, thus enabling automated security testing overnight. [less ▲]

Detailed reference viewed: 96 (8 UL)
Full Text
Peer Reviewed
See detailTrace Diagnostics for Signal-based Temporal Properties
Boufaied, Chaima; Menghi, Claudio; Bianculli, Domenico UL et al

in IEEE Transactions on Software Engineering (2023), 49(5), 3131-3154

Trace checking is a verification technique widely used in Cyber-physical system (CPS) development, to verify whether execution traces satisfy or violate properties expressing system requirements. Often ... [more ▼]

Trace checking is a verification technique widely used in Cyber-physical system (CPS) development, to verify whether execution traces satisfy or violate properties expressing system requirements. Often these properties characterize complex signal behaviors and are defined using domain-specific languages, such as SB-TemPsy-DSL, a pattern-based specification language for signal-based temporal properties. Most of the trace-checking tools only yield a Boolean verdict. However, when a property is violated by a trace, engineers usually inspect the trace to understand the cause of the violation; such manual diagnostic is time-consuming and error-prone. Existing approaches that complement trace-checking tools with diagnostic capabilities either produce low-level explanations that are hardly comprehensible by engineers or do not support complex signal-based temporal properties. In this paper, we propose TD-SB-TemPsy, a trace-diagnostic approach for properties expressed using SB-TemPsy-DSL. Given a property and a trace that violates the property, TD-SB-TemPsy determines the root cause of the property violation. TD-SB-TemPsy relies on the concepts of violation cause, which characterizes one of the behaviors of the system that may lead to a property violation, and diagnoses, which are associated with violation causes and provide additional information to help engineers understand the violation cause. As part of TD-SB-TemPsy, we propose a language-agnostic methodology to define violation causes and diagnoses. In our context, its application resulted in a catalog of 34 violation causes, each associated with one diagnosis, tailored to properties expressed in SB-TemPsy-DSL. We assessed the applicability of TD-SB-TemPsy on two datasets, including one based on a complex industrial case study. The results show that TD-SB-TemPsy could finish within a timeout of 1 min for ≈ 83.66% of the trace-property combinations in the industrial dataset, yielding a diagnosis in ≈ 99.84% of these cases; moreover, it also yielded a diagnosis for all the trace-property combinations in the other dataset. These results suggest that our tool is applicable and efficient in most cases. [less ▲]

Detailed reference viewed: 54 (9 UL)
Full Text
Peer Reviewed
See detailMutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results in the Space Domain
Cornejo Olivares, Oscar Eduardo UL; Pastore, Fabrizio UL; Briand, Lionel UL

in IEEE Transactions on Software Engineering (2022), 48(10), 39133939

On-board embedded software developed for spaceflight systems (space software) must adhere to stringent software quality assurance procedures. For example, verification and validation activities are ... [more ▼]

On-board embedded software developed for spaceflight systems (space software) must adhere to stringent software quality assurance procedures. For example, verification and validation activities are typically performed and assessed by third party organizations. To further minimize the risk of human mistakes, space agencies, such as the European Space Agency (ESA), are looking for automated solutions for the assessment of software testing activities, which play a crucial role in this context. Though space software is our focus here, it should be noted that such software shares the above considerations, to a large extent, with embedded software in many other types of cyber-physical systems. Over the years, mutation analysis has shown to be a promising solution for the automated assessment of test suites; it consists of measuring the quality of a test suite in terms of the percentage of injected faults leading to a test failure. A number of optimization techniques, addressing scalability and accuracy problems, have been proposed to facilitate the industrial adoption of mutation analysis. However, to date, two major problems prevent space agencies from enforcing mutation analysis in space software development. First, there is uncertainty regarding the feasibility of applying mutation analysis optimization techniques in their context. Second, most of the existing techniques either can break the real-time requirements common in embedded software or cannot be applied when the software is tested in Software Validation Facilities, including CPU emulators and sensor simulators. In this paper, we enhance mutation analysis optimization techniques to enable their applicability to embedded software and propose a pipeline that successfully integrates them to address scalability and accuracy issues in this context, as described above. Further, we report on the largest study involving embedded software systems in the mutation analysis literature. Our research is part of a research project funded by ESA ESTEC involving private companies (GomSpace Luxembourg and LuxSpace) in the space sector. These industry partners provided the case studies reported in this paper; they include an on-board software system managing a microsatellite currently on-orbit, a set of libraries used in deployed cubesats, and a mathematical library certified by ESA. [less ▲]

Detailed reference viewed: 568 (45 UL)
Full Text
Peer Reviewed
See detailCombining Genetic Programming and Model Checking to Generate Environment Assumptions
Gaaloul, Khouloud UL; Menghi, Claudio UL; Nejati, Shiva UL et al

in IEEE Transactions on Software Engineering (2022)

Software verification may yield spurious failures when environment assumptions are not accounted for. Environment assumptions are the expectations that a system or a component makes about its operational ... [more ▼]

Software verification may yield spurious failures when environment assumptions are not accounted for. Environment assumptions are the expectations that a system or a component makes about its operational environment and are often specified in terms of conditions over the inputs of that system or component. In this article, we propose an approach to automatically infer environment assumptions for Cyber-Physical Systems (CPS). Our approach improves the state-of-the-art in three different ways: First, we learn assumptions for complex CPS models involving signal and numeric variables; second, the learned assumptions include arithmetic expressions defined over multiple variables; third, we identify the trade-off between soundness and coverage of environment assumptions and demonstrate the flexibility of our approach in prioritizing either of these criteria. We evaluate our approach using a public domain benchmark of CPS models from Lockheed Martin and a component of a satellite control system from LuxSpace, a satellite system provider. The results show that our approach outperforms state-of-the-art techniques on learning assumptions for CPS models, and further, when applied to our industrial CPS model, our approach is able to learn assumptions that are sufficiently close to the assumptions manually developed by engineers to be of practical value. [less ▲]

Detailed reference viewed: 231 (51 UL)
Full Text
Peer Reviewed
See detailInputs from Hell: Learning Input Distributions for Grammar-Based Test Generation
Soremekun, Ezekiel UL; Pavese, Esteban; Havrikov, Nikolas et al

in IEEE Transactions on Software Engineering (2022), 48(4), 1138-1153

Grammars can serve as producers for structured test inputs that are syntactically correct by construction. A probabilistic grammar assigns probabilities to individual productions, thus controlling the ... [more ▼]

Grammars can serve as producers for structured test inputs that are syntactically correct by construction. A probabilistic grammar assigns probabilities to individual productions, thus controlling the distribution of input elements. Using the grammars as input parsers, we show how to learn input distributions from input samples, allowing to create inputs that are similar to the sample; by inverting the probabilities, we can create inputs that are dissimilar to the sample. This allows for three test generation strategies: 1) “Common inputs” – by learning from common inputs, we can create inputs that are similar to the sample; this is useful for regression testing. 2) “Uncommon inputs” – learning from common inputs and inverting probabilities yields inputs that are strongly dissimilar to the sample; this is useful for completing a test suite with “inputs from hell” that test uncommon features, yet are syntactically valid. 3) “Failure-inducing inputs” – learning from inputs that caused failures in the past gives us inputs that share similar features and thus also have a high chance of triggering bugs; this is useful for testing the completeness of fixes. Our evaluation on three common input formats (JSON, JavaScript, CSS) shows the effectiveness of these approaches. Results show that “common inputs” reproduced 96% of the methods induced by the samples. In contrast, for almost all subjects (95%), the “uncommon inputs” covered significantly different methods from the samples. Learning from failure-inducing samples reproduced all exceptions (100%) triggered by the failure-inducing samples and discovered new exceptions not found in any of the samples learned from. [less ▲]

Detailed reference viewed: 116 (11 UL)
Full Text
Peer Reviewed
See detailAutomatic Generation of Acceptance Test Cases from Use Case Specifications: an NLP-based Approach
Wang, Chunhui UL; Pastore, Fabrizio UL; Göknil, Arda UL et al

in IEEE Transactions on Software Engineering (2022), 48(2), 585-616

Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since ... [more ▼]

Acceptance testing is a validation activity performed to ensure the conformance of software systems with respect to their functional requirements. In safety critical systems, it plays a crucial role since it is enforced by software standards, which mandate that each requirement be validated by such testing in a clearly traceable manner. Test engineers need to identify all the representative test execution scenarios from requirements, determine the runtime conditions that trigger these scenarios, and finally provide the input data that satisfy these conditions. Given that requirements specifications are typically large and often provided in natural language (e.g., use case specifications), the generation of acceptance test cases tends to be expensive and error-prone. In this paper, we present Use Case Modeling for System-level, Acceptance Tests Generation (UMTG), an approach that supports the generation of executable, system-level, acceptance test cases from requirements specifications in natural language, with the goal of reducing the manual effort required to generate test cases and ensuring requirements coverage. More specifically, UMTG automates the generation of acceptance test cases based on use case specifications and a domain model for the system under test, which are commonly produced in many development environments. Unlike existing approaches, it does not impose strong restrictions on the expressiveness of use case specifications. We rely on recent advances in natural language processing to automatically identify test scenarios and to generate formal constraints that capture conditions triggering the execution of the scenarios, thus enabling the generation of test data. In two industrial case studies, UMTG automatically and correctly translated 95% of the use case specification steps into formal constraints required for test data generation; furthermore, it generated test cases that exercise not only all the test scenarios manually implemented by experts, but also some critical scenarios not previously considered. [less ▲]

Detailed reference viewed: 226 (32 UL)
Full Text
Peer Reviewed
See detailTkT: Automatic Inference of Timed and Extended Pushdown Automata
Pastore, Fabrizio UL; Micucci, Daniela; Guzman, Michell et al

in IEEE Transactions on Software Engineering (2022), 48(2), 617-636

To mitigate the cost of manually producing and maintaining models capturing software specifications, specification mining techniques can be exploited to automatically derive up-to-date models that ... [more ▼]

To mitigate the cost of manually producing and maintaining models capturing software specifications, specification mining techniques can be exploited to automatically derive up-to-date models that faithfully represent the behavior of software systems. So far, specification mining solutions focused on extracting information about the functional behavior of the system, especially in the form of models that represent the ordering of the operations. Well-known examples are finite state models capturing the usage protocol of software interfaces and temporal rules specifying relations among system events. Although the functional behavior of a software system is a primary aspect of concern, there are several other non-functional characteristics that must be typically addressed jointly with the functional behavior of a software system. Efficiency is one of the most relevant characteristics. In fact, an application delivering the right functionalities inefficiently has a big chance to not satisfy the expectation of its users. Interestingly, the timing behavior is strongly dependent on the functional behavior of a software system. For instance, the timing of an operation depends on the functional complexity and size of the computation that is performed. Consequently, models that combine the functional and timing behaviors, as well as their dependencies, are extremely important to precisely reason on the behavior of software systems. In this paper, we address the challenge of generating models that capture both the functional and timing behavior of a software system from execution traces. The result is the Timed k-Tail (TkT) specification mining technique, which can mine finite state models that capture such an interplay: the functional behavior is represented by the possible order of the events accepted by the transitions, while the timing behavior is represented through clocks and clock constraints of different nature associated with transitions. Our empirical evaluation with several libraries and applications show that TkT can generate accurate models, capable of supporting the identification of timing anomalies due to overloaded environment and performance faults. Furthermore, our study shows that TkT outperforms state-of-the-art techniques in terms of scalability and accuracy of the mined models. [less ▲]

Detailed reference viewed: 126 (13 UL)
Full Text
Peer Reviewed
See detailCerebro: Static Subsuming Mutant Selection
Garg, Aayush UL; Ojdanic, Milos UL; Degiovanni, Renzo Gaston UL et al

in IEEE Transactions on Software Engineering (2022)

Detailed reference viewed: 152 (36 UL)
Full Text
Peer Reviewed
See detailAstraea: Grammar-based fairness testing
Soremekun, Ezekiel UL; Udeshi, Sakshi Sunil; Chattopadhyay, Sudipta

in IEEE Transactions on Software Engineering (2022)

Software often produces biased outputs. In particular, machine learning (ML) based software is known to produce erroneous predictions when processing discriminatory inputs. Such unfair program behavior ... [more ▼]

Software often produces biased outputs. In particular, machine learning (ML) based software is known to produce erroneous predictions when processing discriminatory inputs. Such unfair program behavior can be caused by societal bias. In the last few years, Amazon, Microsoft and Google have provided software services that produce unfair outputs, mostly due to societal bias (e.g. gender or race). In such events, developers are saddled with the task of conducting fairness testing. Fairness testing is challenging; developers are tasked with generating discriminatory inputs that reveal and explain biases. We propose a grammar-based fairness testing approach (called ASTRAEA) which leverages context-free grammars to generate discriminatory inputs that reveal fairness violations in software systems. Using probabilistic grammars, ASTRAEA also provides fault diagnosis by isolating the cause of observed software bias. ASTRAEA’s diagnoses facilitate the improvement of ML fairness. ASTRAEA was evaluated on 18 software systems that provide three major natural language processing (NLP) services. In our evaluation, ASTRAEA generated fairness violations at a rate of about 18%. ASTRAEA generated over 573K discriminatory test cases and found over 102K fairness violations. Furthermore, ASTRAEA improves software fairness by about 76% via model retraining, on average. [less ▲]

Detailed reference viewed: 31 (0 UL)
Full Text
Peer Reviewed
See detailAI-enabled Automation for Completeness Checking of Privacy Policies
Amaral Cejas, Orlando UL; Abualhaija, Sallam UL; Torre, Damiano et al

in IEEE Transactions on Software Engineering (2021)

Technological advances in information sharing have raised concerns about data protection. Privacy policies containprivacy-related requirements about how the personal data of individuals will be handled by ... [more ▼]

Technological advances in information sharing have raised concerns about data protection. Privacy policies containprivacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g.,a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). Aprerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisionsof GDPR. Incomplete privacy policies might result in large fines on violating organization as well as incomplete privacy-related softwarespecifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automationfor the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterizethe privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop anautomated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machinelearning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check them against thecompleteness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseenprivacy policies, our approach detected 300 of the total of 334 violations of some completeness criteria correctly, while producing 23false positives. The approach thus has a precision of 92.9% and recall of 89.8%. Compared to a baseline that applies keyword searchonly, our approach results in an improvement of 24.5% in precision and 38% in recall. [less ▲]

Detailed reference viewed: 268 (57 UL)
Full Text
Peer Reviewed
See detailReinforcement Learning for Test Case Prioritization
Bagherzadeh, Mojtaba; Kahani, Nafiseh; Briand, Lionel UL

in IEEE Transactions on Software Engineering (2021), 48(8), 2836-2856

Continuous Integration (CI) context significantly reduces integration problems, speeds up development time, and shortens release time. However, it also introduces new challenges for quality assurance ... [more ▼]

Continuous Integration (CI) context significantly reduces integration problems, speeds up development time, and shortens release time. However, it also introduces new challenges for quality assurance activities, including regression testing, which is the focus of this work. Though various approaches for test case prioritization have shown to be very promising in the context of regression testing, specific techniques must be designed to deal with the dynamic nature and timing constraints of CI. Recently, Reinforcement Learning (RL) has shown great potential in various challenging scenarios that require continuous adaptation, such as game playing, real-time ads bidding, and recommender systems. Inspired by this line of work and building on initial efforts in supporting test case prioritization with RL techniques, we perform here a comprehensive investigation of RL-based test case prioritization in a CI context. To this end, taking test case prioritization as a ranking problem, we model the sequential interactions between the CI environment and a test case prioritization agent as an RL problem, using three alternative ranking models. We then rely on carefully selected and tailored state-of-the-art RL techniques to automatically and continuously learn a test case prioritization strategy, whose objective is to be as close as possible to the optimal one. Our extensive experimental analysis shows that the best RL solutions provide a significant accuracy improvement over previous RL-based work, with prioritization strategies getting close to being optimal, thus paving the way for using RL to prioritize test cases in a CI context. [less ▲]

Detailed reference viewed: 210 (24 UL)
Full Text
Peer Reviewed
See detailAn Integrated Approach for Effective Injection Vulnerability Analysis of Web Applications through Security Slicing and Hybrid Constraint Solving
Thome, Julian UL; Shar, Lwin Khin UL; Bianculli, Domenico UL et al

in IEEE Transactions on Software Engineering (2020), 46(2), 163--195

Malicious users can attack Web applications by exploiting injection vulnerabilities in the source code. This work addresses the challenge of detecting injection vulnerabilities in the server-side code of ... [more ▼]

Malicious users can attack Web applications by exploiting injection vulnerabilities in the source code. This work addresses the challenge of detecting injection vulnerabilities in the server-side code of Java Web applications in a scalable and effective way. We propose an integrated approach that seamlessly combines security slicing with hybrid constraint solving; the latter orchestrates automata-based solving with meta-heuristic search. We use static analysis to extract minimal program slices relevant to security from Web programs and to generate attack conditions. We then apply hybrid constraint solving to determine the satisfiability of attack conditions and thus detect vulnerabilities. The experimental results, using a benchmark comprising a set of diverse and representative Web applications/services as well as security benchmark applications, show that our approach (implemented in the JOACO tool) is significantly more effective at detecting injection vulnerabilities than state-of-the-art approaches, achieving 98% recall, without producing any false alarm. We also compared the constraint solving module of our approach with state-of-the-art constraint solvers, using six different benchmark suites; our approach correctly solved the highest number of constraints (665 out of 672), without producing any incorrect result, and was the one with the least number of time-out/failing cases. In both scenarios, the execution time was practically acceptable, given the offline nature of vulnerability detection. [less ▲]

Detailed reference viewed: 678 (144 UL)
Full Text
Peer Reviewed
See detailSpecification Patterns for Robotic Missions
Menghi, Claudio UL; Tsigkanos, Christos; Pelliccione, Pelliccione et al

in IEEE Transactions on Software Engineering (2019)

Mobile and general-purpose robots increasingly support our everyday life, requiring dependable robotics control software. Creating such software mainly amounts to implementing their complex behaviors ... [more ▼]

Mobile and general-purpose robots increasingly support our everyday life, requiring dependable robotics control software. Creating such software mainly amounts to implementing their complex behaviors known as missions. Recognizing this need, a large number of domain-specific specification languages has been proposed. These, in addition to traditional logical languages, allow the use of formally specified missions for synthesis, verification, simulation or guiding implementation. For instance, the logical language LTL is commonly used by experts to specify missions as an input for planners, which synthesize the behavior a robot should have. Unfortunately, domain-specific languages are usually tied to specific robot models, while logical languages such as LTL are difficult to use by non-experts. We present a catalog of 22 mission specification patterns for mobile robots, together with tooling for instantiating, composing, and compiling the patterns to create mission specifications. The patterns provide solutions for recurrent specification problems, each of which detailing the usage intent, known uses, relationships to other patterns, and-most importantly-a template mission specification in temporal logic. Our tooling produces specifications expressed in the temporal logics LTL and CTL to be used by planners, simulators or model checkers. The patterns originate from 245 realistic textual mission requirements extracted from the robotics literature, and they are evaluated upon a total of 441 real-world mission requirements and 1251 mission specifications. Five of these reflect scenarios we defined with two well-known industrial partners developing human-size robots. We validated our patterns' correctness with simulators and two different types of real robots. [less ▲]

Detailed reference viewed: 135 (20 UL)
Full Text
Peer Reviewed
See detailTest Generation and Test Prioritization for Simulink Models with Dynamic Behavior
Matinnejad, Reza; Nejati, Shiva UL; Briand, Lionel UL et al

in IEEE Transactions on Software Engineering (2019), 45(9), 919-944

All engineering disciplines are founded and rely on models, although they may differ on purposes and usages of modeling. Among the different disciplines, the engineering of Cyber Physical Systems (CPSs ... [more ▼]

All engineering disciplines are founded and rely on models, although they may differ on purposes and usages of modeling. Among the different disciplines, the engineering of Cyber Physical Systems (CPSs) particularly relies on models with dynamic behaviors (i.e., models that exhibit time-varying changes). The Simulink modeling platform greatly appeals to CPS engineers since it captures dynamic behavior models. It further provides seamless support for two indispensable engineering activities: (1) automated verification of abstract system models via model simulation, and (2) automated generation of system implementation via code generation. We identify three main challenges in the verification and testing of Simulink models with dynamic behavior, namely incompatibility, oracle and scalability challenges. We propose a Simulink testing approach that attempts to address these challenges. Specifically, we propose a black-box test generation approach, implemented based on meta-heuristic search, that aims to maximize diversity in test output signals generated by Simulink models. We argue that in the CPS domain test oracles are likely to be manual and therefore the main cost driver of testing. In order to lower the cost of manual test oracles, we propose a test prioritization algorithm to automatically rank test cases generated by our test generation algorithm according to their likelihood to reveal a fault. Engineers can then select, according to their test budget, a subset of the most highly ranked test cases. To demonstrate scalability, we evaluate our testing approach using industrial Simulink models. Our evaluation shows that our test generation and test prioritization approaches outperform baseline techniques that rely on random testing and structural coverage. [less ▲]

Detailed reference viewed: 430 (99 UL)
Full Text
Peer Reviewed
See detailAutomatic Generation of Tests to Exploit XML Injection Vulnerabilities in Web Applications
Jan, Sadeeq UL; Panichella, Annibale UL; Arcuri, Andrea UL et al

in IEEE Transactions on Software Engineering (2019), 45(4), 335-362

Modern enterprise systems can be composed of many web services (e.g., SOAP and RESTful). Users of such systems might not have direct access to those services, and rather interact with them through a ... [more ▼]

Modern enterprise systems can be composed of many web services (e.g., SOAP and RESTful). Users of such systems might not have direct access to those services, and rather interact with them through a single-entry point which provides a GUI (e.g., a web page or a mobile app). Although the interactions with such entry point might be secure, a hacker could trick such systems to send malicious inputs to those internal web services. A typical example is XML injection targeting SOAP communications. Previous work has shown that it is possible to automatically generate such kind of attacks using search-based techniques. In this paper, we improve upon previous results by providing more efficient techniques to generate such attacks. In particular, we investigate four different algorithms and two different fitness functions. A large empirical study, involving also two industrial systems, shows that our technique is effective at automatically generating XML injection attacks. [less ▲]

Detailed reference viewed: 562 (105 UL)