References of "Briand, Lionel 50001049"
Full Text
Peer Reviewed
Scalable and Accurate Test Case Prioritization in Continuous Integration Contexts
Yaraghi, Ahmadreza; Bagherzadeh, Mojtaba; Kahani, Nafiseh et al

in IEEE Transactions on Software Engineering (in press)

Full Text
Peer Reviewed
Flakify: A Black-Box, Language Model-Based Predictor for Flaky Tests
Fatima, Sakina; Ghaleb, Taher; Briand, Lionel UL

in IEEE Transactions on Software Engineering (in press)

Full Text
Peer Reviewed
Data-driven Mutation Analysis for Cyber-Physical Systems
Vigano, Enrico UL; Cornejo, Oscar; Pastore, Fabrizio UL et al

in IEEE Transactions on Software Engineering (in press)

Cyber-physical systems (CPSs) typically consist of a wide set of integrated, heterogeneous components; consequently, most of their critical failures relate to the interoperability of such components. Unfortunately, most CPS test automation techniques are preliminary and industry still heavily relies on manual testing. With potentially incomplete, manually-generated test suites, it is of paramount importance to assess their quality. Though mutation analysis has been demonstrated to be an effective means of assessing test suite quality in some specific contexts, we lack approaches for CPSs. Indeed, existing approaches do not target interoperability problems and cannot be executed in the presence of black-box or simulated components, a typical situation with CPSs. In this paper, we introduce data-driven mutation analysis, an approach that assesses test suite quality by verifying whether it detects interoperability faults simulated by mutating the data exchanged by software components. To this end, we describe a data-driven mutation analysis technique (DaMAT) that automatically alters the data exchanged through data buffers. Our technique is driven by fault models in tabular form in which engineers specify how to mutate data items by selecting and configuring a set of mutation operators. We have evaluated DaMAT with CPSs in the space domain; specifically, the test suites for the software systems of a microsatellite and nanosatellites launched into orbit last year. Our results show that the approach effectively detects test suite shortcomings, is not affected by equivalent and redundant mutants, and entails acceptable costs.
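
For illustration only, the sketch below shows how a tabular fault model of the kind described above might drive the mutation of values exchanged through a data buffer. It is not the DaMAT implementation; the field names, byte layout, operators, and parameters are all hypothetical.

```python
# Illustrative sketch only (not DaMAT): mutate values exchanged through a
# data buffer according to a tabular fault model. Field names, byte layout,
# operators, and parameters are hypothetical.
import random
import struct

# Hypothetical fault model: one row per buffer field; each row selects a
# mutation operator and its configuration.
FAULT_MODEL = [
    # (field name, byte offset, struct format, operator, parameter)
    ("attitude_rate", 0, "<f", "ADD_OFFSET", 5.0),  # inject a constant bias
    ("sensor_status", 4, "<B", "BIT_FLIP", 0),      # flip the least significant bit
    ("temperature",   5, "<f", "FREEZE", 21.5),     # stuck-at value
]

def apply_operator(value, operator, param):
    if operator == "ADD_OFFSET":
        return value + param
    if operator == "BIT_FLIP":
        return value ^ (1 << param)
    if operator == "FREEZE":
        return param
    raise ValueError(f"unknown operator: {operator}")

def mutate_buffer(buffer: bytes) -> bytearray:
    """Return a mutated copy of the buffer with one randomly chosen
    fault-model row applied (i.e., one data-driven mutant)."""
    mutated = bytearray(buffer)
    _name, offset, fmt, operator, param = random.choice(FAULT_MODEL)
    (value,) = struct.unpack_from(fmt, mutated, offset)
    struct.pack_into(fmt, mutated, offset, apply_operator(value, operator, param))
    return mutated

if __name__ == "__main__":
    original = struct.pack("<fBf", 0.1, 0b0011, 21.5)
    print(mutate_buffer(original).hex())
```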

Full Text
Peer Reviewed
Combining Genetic Programming and Model Checking to Generate Environment Assumptions
Gaaloul, Khouloud UL; Menghi, Claudio UL; Nejati, Shiva UL et al

in IEEE Transactions on Software Engineering (in press)

Software verification may yield spurious failures when environment assumptions are not accounted for. Environment assumptions are the expectations that a system or a component makes about its operational environment and are often specified in terms of conditions over the inputs of that system or component. In this article, we propose an approach to automatically infer environment assumptions for Cyber-Physical Systems (CPS). Our approach improves the state-of-the-art in three different ways: First, we learn assumptions for complex CPS models involving signal and numeric variables; second, the learned assumptions include arithmetic expressions defined over multiple variables; third, we identify the trade-off between soundness and coverage of environment assumptions and demonstrate the flexibility of our approach in prioritizing either of these criteria. We evaluate our approach using a public domain benchmark of CPS models from Lockheed Martin and a component of a satellite control system from LuxSpace, a satellite system provider. The results show that our approach outperforms state-of-the-art techniques on learning assumptions for CPS models, and further, when applied to our industrial CPS model, our approach is able to learn assumptions that are sufficiently close to the assumptions manually developed by engineers to be of practical value.

Full Text
Peer Reviewed
Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems
Fahmy, Hazem UL; Pastore, Fabrizio UL; Briand, Lionel UL et al

in ACM Transactions on Software Engineering and Methodology (in press)

When Deep Neural Networks (DNNs) are used in safety-critical systems, engineers should determine the safety risks associated with failures (i.e., erroneous outputs) observed during testing. For DNNs processing images, engineers visually inspect all failure-inducing images to determine common characteristics among them. Such characteristics correspond to hazard-triggering events (e.g., low illumination) that are essential inputs for safety analysis. Though informative, such activity is expensive and error-prone. To support such safety analysis practices, we propose SEDE, a technique that generates readable descriptions for commonalities in failure-inducing, real-world images and improves the DNN through effective retraining. SEDE leverages the availability of simulators, which are commonly used for cyber-physical systems. It relies on genetic algorithms to drive simulators towards the generation of images that are similar to failure-inducing, real-world images in the test set; it then employs rule learning algorithms to derive expressions that capture commonalities in terms of simulator parameter values. The derived expressions are then used to generate additional images to retrain and improve the DNN. With DNNs performing in-car sensing tasks, SEDE successfully characterized hazard-triggering events leading to a DNN accuracy drop. Also, SEDE enabled retraining leading to significant improvements in DNN accuracy, up to 18 percentage points.

Full Text
Peer Reviewed
A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms
Belgacem, Hichem UL; Li, Xiaochen; Bianculli, Domenico UL et al

in ACM Transactions on Software Engineering and Methodology (in press)

Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options. In this paper, we propose LAFF, a learning-based automated approach for filling categorical fields in data entry forms. LAFF first builds Bayesian Network models by learning field dependencies from a set of historical input instances, representing the values of the fields that have been filled in the past. To improve its learning ability, LAFF uses local modeling to effectively mine the local dependencies of fields in a cluster of input instances. During the form filling phase, LAFF uses such models to predict possible values of a target field, based on the values in the already-filled fields of the form and their dependencies; the predicted values (endorsed based on field dependencies and prediction confidence) are then provided to the end-user as a list of suggestions. We evaluated LAFF by assessing its effectiveness and efficiency in form filling on two datasets, one of them proprietary from the banking domain. Experimental results show that LAFF is able to provide accurate suggestions with a Mean Reciprocal Rank value above 0.73. Furthermore, LAFF is efficient, requiring at most 317 ms per suggestion.
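
As a rough illustration of the suggestion step only (not LAFF itself, which uses Bayesian Networks with local modeling), the sketch below ranks candidate values of a target categorical field by their conditional frequency given the already-filled fields; the field names and historical instances are made up.

```python
# Simplified stand-in for the suggestion step (not LAFF's Bayesian Networks):
# rank candidate values of a target categorical field by their conditional
# frequency given the already-filled fields. Field names and history are made up.
from collections import Counter

HISTORY = [  # historical input instances (previously submitted forms)
    {"country": "LU", "currency": "EUR", "account_type": "savings"},
    {"country": "LU", "currency": "EUR", "account_type": "current"},
    {"country": "US", "currency": "USD", "account_type": "current"},
    {"country": "LU", "currency": "EUR", "account_type": "savings"},
]

def suggest(target_field, filled_fields, history, top_k=3):
    """Return up to top_k candidate values for target_field, most likely first."""
    matching = [row for row in history
                if all(row.get(f) == v for f, v in filled_fields.items())]
    pool = matching or history  # fall back to the unconditional distribution
    counts = Counter(row[target_field] for row in pool if target_field in row)
    return [value for value, _ in counts.most_common(top_k)]

print(suggest("account_type", {"country": "LU"}, HISTORY))  # ['savings', 'current']
```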

Full Text
Peer Reviewed
Modeling Data Protection and Privacy: Application and Experience with GDPR
Torre, Damiano UL; Alferez, Mauricio UL; Soltana, Ghanem UL et al

in Software and Systems Modeling (in press)

In Europe and indeed worldwide, the General Data Protection Regulation (GDPR) provides protection to individuals regarding their personal data in the face of new technological developments. GDPR is widely viewed as the benchmark for data protection and privacy regulations that harmonizes data privacy laws across Europe. Although the GDPR is highly beneficial to individuals, it presents significant challenges for organizations monitoring or storing personal information. Since there is currently no automated solution with broad industrial applicability, organizations have no choice but to carry out expensive manual audits to ensure GDPR compliance. In this paper, we present a complete GDPR UML model as a first step towards designing automated methods for checking GDPR compliance. Given that the practical application of the GDPR is influenced by national laws of the EU Member States, we suggest a two-tiered description of the GDPR, generic and specialized. In this paper, we provide (1) the GDPR conceptual model we developed with complete traceability from its classes to the GDPR, (2) a glossary to help understand the model, (3) the plain-English description of 35 compliance rules derived from GDPR along with their encoding in OCL, and (4) the set of 20 variation points derived from GDPR to specialize the generic model. We further present the challenges we faced in our modeling endeavor, the lessons we learned from it, and future directions for research.

Full Text
Peer Reviewed
Test Case Selection and Prioritization Using Machine Learning: A Systematic Literature Review
Pan, Rongqi; Bagherzadeh, Mojtaba; Ghaleb, Taher et al

in Empirical Software Engineering (in press)

Regression testing is an essential activity to assure that software code changes do not adversely affect existing functionalities. With the wide adoption of Continuous Integration (CI) in software projects, which increases the frequency of running software builds, running all tests can be time-consuming and resource-intensive. To alleviate that problem, Test case Selection and Prioritization (TSP) techniques have been proposed to improve regression testing by selecting and prioritizing test cases in order to provide early feedback to developers. In recent years, researchers have relied on Machine Learning (ML) techniques to achieve effective TSP (ML-based TSP). Such techniques help combine information about test cases, from partial and imperfect sources, into accurate prediction models. This work conducts a systematic literature review focused on ML-based TSP techniques, aiming to perform an in-depth analysis of the state of the art, thus gaining insights regarding future avenues of research. To that end, we analyze 29 primary studies published from 2006 to 2020, which have been identified through a systematic and documented process. This paper addresses five research questions addressing variations in ML-based TSP techniques and feature sets for training and testing ML models, alternative metrics used for evaluating the techniques, the performance of techniques, and the reproducibility of the published studies. We summarize the results related to our research questions in a high-level summary that can be used as a taxonomy for classifying future TSP studies.

Full Text
Peer Reviewed
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolutionary Search
Pan, Rongqi; Ghaleb, Taher; Briand, Lionel UL

in IEEE/ACM International Conference on Software Engineering (2023)

Full Text
Peer Reviewed
Automated, Cost-effective, and Update-driven App Testing
Ngo, Chanh Duc UL; Pastore, Fabrizio UL; Briand, Lionel UL

in ACM Transactions on Software Engineering and Methodology (2022), 31(4), 61

Apps’ pervasive role in our society led to the definition of test automation approaches to ensure their dependability. However, state-of-the-art approaches tend to generate large numbers of test inputs and are unlikely to achieve more than 50% method coverage. In this paper, we propose a strategy to achieve significantly higher coverage of the code affected by updates with a much smaller number of test inputs, thus alleviating the test oracle problem. More specifically, we present ATUA, a model-based approach that synthesizes App models with static analysis, integrates a dynamically-refined state abstraction function, and combines complementary testing strategies, including (1) coverage of the model structure, (2) coverage of the App code, (3) random exploration, and (4) coverage of dependencies identified through information retrieval. Its model-based strategy enables ATUA to generate a small set of inputs that exercise only the code affected by the updates. In turn, this makes common test oracle solutions more cost-effective as they tend to involve human effort. A large empirical evaluation, conducted with 72 App versions belonging to nine popular Android Apps, has shown that ATUA is more effective and less effort-intensive than state-of-the-art approaches when testing App updates.

Full Text
Peer Reviewed
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results in the Space Domain
Cornejo Olivares, Oscar Eduardo UL; Pastore, Fabrizio UL; Briand, Lionel UL

in IEEE Transactions on Software Engineering (2022), 48(10), 3913-3939

On-board embedded software developed for spaceflight systems (space software) must adhere to stringent software quality assurance procedures. For example, verification and validation activities are typically performed and assessed by third party organizations. To further minimize the risk of human mistakes, space agencies, such as the European Space Agency (ESA), are looking for automated solutions for the assessment of software testing activities, which play a crucial role in this context. Though space software is our focus here, it should be noted that such software shares the above considerations, to a large extent, with embedded software in many other types of cyber-physical systems. Over the years, mutation analysis has been shown to be a promising solution for the automated assessment of test suites; it consists of measuring the quality of a test suite in terms of the percentage of injected faults leading to a test failure. A number of optimization techniques, addressing scalability and accuracy problems, have been proposed to facilitate the industrial adoption of mutation analysis. However, to date, two major problems prevent space agencies from enforcing mutation analysis in space software development. First, there is uncertainty regarding the feasibility of applying mutation analysis optimization techniques in their context. Second, most of the existing techniques either can break the real-time requirements common in embedded software or cannot be applied when the software is tested in Software Validation Facilities, including CPU emulators and sensor simulators. In this paper, we enhance mutation analysis optimization techniques to enable their applicability to embedded software and propose a pipeline that successfully integrates them to address scalability and accuracy issues in this context, as described above. Further, we report on the largest study involving embedded software systems in the mutation analysis literature. Our research is part of a research project funded by ESA ESTEC involving private companies (GomSpace Luxembourg and LuxSpace) in the space sector. These industry partners provided the case studies reported in this paper; they include an on-board software system managing a microsatellite currently on-orbit, a set of libraries used in deployed cubesats, and a mathematical library certified by ESA.
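
To make the underlying metric concrete, the minimal example below computes the mutation score exactly as described above: the percentage of injected faults (mutants) that cause at least one test to fail. The mutant names and outcomes are made up.

```python
# Minimal illustration of the mutation score described above: the percentage
# of injected faults (mutants) killed by the test suite. Mutant names and
# outcomes are made up.
outcomes = {
    "mutant_01": "killed",    # at least one test failed -> fault detected
    "mutant_02": "survived",  # all tests passed -> test suite shortcoming
    "mutant_03": "killed",
    "mutant_04": "survived",
}

killed = sum(1 for status in outcomes.values() if status == "killed")
mutation_score = 100.0 * killed / len(outcomes)
print(f"mutation score: {mutation_score:.1f}%")  # -> 50.0%
```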

Full Text
Peer Reviewed
Optimal Priority Assignment for Real-Time Systems: A Coevolution-Based Approach
Lee, Jaekwon UL; Shin, Seung Yeob UL; Nejati, Shiva et al

in Empirical Software Engineering (2022), 27

In real-time systems, priorities assigned to real-time tasks determine the order of task executions, by relying on an underlying task scheduling policy. Assigning optimal priority values to tasks is critical to allow the tasks to complete their executions while maximizing safety margins from their specified deadlines. This enables real-time systems to tolerate unexpected overheads in task executions and still meet their deadlines. In practice, priority assignments result from an interactive process between the development and testing teams. In this article, we propose an automated method that aims to identify the best possible priority assignments in real-time systems, accounting for multiple objectives regarding safety margins and engineering constraints. Our approach is based on a multi-objective, competitive coevolutionary algorithm mimicking the interactive priority assignment process between the development and testing teams. We evaluate our approach by applying it to six industrial systems from different domains and several synthetic systems. The results indicate that our approach significantly outperforms both our baselines, i.e., random search and sequential search, and solutions defined by practitioners. Our approach scales to complex industrial systems as an offline analysis method that attempts to find near-optimal solutions within acceptable time, i.e., less than 16 hours.

Full Text
Peer Reviewed
Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques
Khan, Zanis Ali UL; Shin, Donghwan UL; Bianculli, Domenico UL et al

in Proceedings of the 44th International Conference on Software Engineering (ICSE ’22) (2022, July)

Log message template identification aims to convert raw logs containing free-formed log messages into structured logs to be processed by automated log-based analysis, such as anomaly detection and model inference. While many techniques have been proposed in the literature, only two recent studies provide a comprehensive evaluation and comparison of the techniques using an established benchmark composed of real-world logs. Nevertheless, we argue that both studies have the following issues: (1) they used different accuracy metrics without comparison between them, (2) some ground-truth (oracle) templates are incorrect, and (3) the accuracy evaluation results do not provide any information regarding incorrectly identified templates. In this paper, we address the above issues by providing three guidelines for assessing the accuracy of log template identification techniques: (1) use appropriate accuracy metrics, (2) perform oracle template correction, and (3) perform analysis of incorrect templates. We then assess the application of such guidelines through a comprehensive evaluation of 14 existing template identification techniques on the established benchmark logs. Results show very different insights than existing studies and in particular a much less optimistic outlook on existing techniques.

Full Text
Peer Reviewed
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering
Attaoui, Mohammed Oualid UL; Fahmy, Hazem UL; Pastore, Fabrizio UL et al

in ACM Transactions on Software Engineering and Methodology (2022)

Deep neural networks (DNNs) have demonstrated superior performance over classical machine learning to support many features in safety-critical systems. Although DNNs are now widely used in such systems (e.g., self-driving cars), there is limited progress regarding automated support for functional safety analysis in DNN-based systems. For example, the identification of root causes of errors, to enable both risk analysis and DNN retraining, remains an open problem. In this paper, we propose SAFE, a black-box approach to automatically characterize the root causes of DNN errors. SAFE relies on a transfer learning model pre-trained on ImageNet to extract the features from error-inducing images. It then applies a density-based clustering algorithm to detect arbitrary shaped clusters of images modeling plausible causes of error. Last, clusters are used to effectively retrain and improve the DNN. The black-box nature of SAFE is motivated by our objective not to require changes or even access to the DNN internals to facilitate adoption. Experimental results show the superior ability of SAFE in identifying different root causes of DNN errors based on case studies in the automotive domain. It also yields significant improvements in DNN accuracy after retraining, while saving significant execution time and memory when compared to alternatives.
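
To illustrate the clustering step only (this is not the SAFE implementation), the sketch below assumes feature vectors have already been extracted from error-inducing images by an ImageNet-pretrained backbone and groups them with scikit-learn's DBSCAN; the synthetic data and the eps/min_samples values are placeholders.

```python
# Sketch of the density-based clustering step only (not SAFE itself). We assume
# feature vectors were already extracted from error-inducing images by an
# ImageNet-pretrained backbone; here they are synthetic. eps/min_samples are
# placeholders that would need tuning on real data.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.05, size=(30, 64)),   # images sharing root cause A
    rng.normal(1.0, 0.05, size=(30, 64)),   # images sharing root cause B
    rng.uniform(-3.0, 3.0, size=(5, 64)),   # unrelated outliers
])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(features)
for cluster_id in sorted(set(labels)):
    tag = "noise" if cluster_id == -1 else f"cluster {cluster_id}"
    print(tag, "->", int(np.sum(labels == cluster_id)), "images")
```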

Full Text
Peer Reviewed
ATUA: an update-driven app testing tool
Ngo, Chanh Duc UL; Pastore, Fabrizio UL; Briand, Lionel UL

in The 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (2022, July)

App testing tools tend to generate thousands of test inputs; they help engineers identify crashing conditions but not functional failures. Indeed, detecting functional failures requires the visual inspection of App outputs, which is infeasible for thousands of inputs. Existing App testing tools ignore that most Apps are frequently updated and that engineers are mainly interested in testing the updated functionalities; indeed, automated regression test cases can be used otherwise. We present ATUA, an open source tool targeting Android Apps. It achieves high coverage of the updated App code with a small number of test inputs, thus alleviating the test oracle problem (fewer outputs to inspect). It implements a model-based approach that synthesizes App models with static analysis, integrates a dynamically-refined state abstraction function and combines complementary testing strategies, including (1) coverage of the model structure, (2) coverage of the App code, (3) random exploration, and (4) coverage of dependencies identified through information retrieval. Our empirical evaluation, conducted with nine popular Android Apps (72 versions), has shown that ATUA, compared to state-of-the-art approaches, achieves higher code coverage while producing fewer outputs to be manually inspected. A demo video is available at https://youtu.be/RqQ1z_Nkaqo.

Full Text
Peer Reviewed
Estimating Probabilistic Safe WCET Ranges of Real-Time Systems at Design Stages
Lee, Jaekwon UL; Shin, Seung Yeob UL; Nejati, Shiva et al

in ACM Transactions on Software Engineering and Methodology (2022)

Estimating worst-case execution times (WCET) is an important activity at early design stages of real-time systems. Based on WCET estimates, engineers make design and implementation decisions to ensure that task executions always complete before their specified deadlines. However, in practice, engineers often cannot provide precise point WCET estimates and prefer to provide plausible WCET ranges. Given a set of real-time tasks with such ranges, we provide an automated technique to determine for what WCET values the system is likely to meet its deadlines, and hence operate safely with a probabilistic guarantee. Our approach combines a search algorithm for generating worst-case scheduling scenarios with polynomial logistic regression for inferring probabilistic safe WCET ranges. We evaluated our approach by applying it to three industrial systems from different domains and several synthetic systems. Our approach efficiently and accurately estimates probabilistic safe WCET ranges within which deadlines are likely to be satisfied with a high degree of confidence.
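
As a toy illustration of the regression step only (not the authors' tool), the sketch below fits a polynomial logistic regression that maps a single task's WCET value to the probability that all deadlines are met, using synthetic labelled scheduling outcomes with a notional threshold around 8 ms; in the paper, such labels come from worst-case scheduling scenarios found by a search algorithm.

```python
# Toy sketch of the inference step only: fit a polynomial logistic regression
# mapping a task's WCET value to the probability that all deadlines are met.
# The labelled outcomes are synthetic (notional threshold around 8 ms); in the
# paper they come from search-generated worst-case scheduling scenarios.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
wcet = rng.uniform(1.0, 15.0, size=(200, 1))                      # candidate WCETs (ms)
deadlines_met = (wcet[:, 0] + rng.normal(0, 0.5, 200) < 8.0).astype(int)

model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
model.fit(wcet, deadlines_met)

for value in (5.0, 7.5, 9.0):
    p = model.predict_proba([[value]])[0, 1]
    print(f"WCET = {value:4.1f} ms -> P(all deadlines met) = {p:.2f}")
```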

Full Text
Peer Reviewed
HUDD: A tool to debug DNNs for safety analysis
Fahmy, Hazem UL; Pastore, Fabrizio UL; Briand, Lionel UL

in 2022 IEEE/ACM 44th International Conference on Software Engineering (2022, May)

We present HUDD, a tool that supports safety analysis practices for systems enabled by Deep Neural Networks (DNNs) by automatically identifying the root causes for DNN errors and retraining the DNN. HUDD stands for Heatmap-based Unsupervised Debugging of DNNs; it automatically clusters error-inducing images whose results are due to common subsets of DNN neurons. The intent is for the generated clusters to group error-inducing images having common characteristics, that is, having a common root cause. HUDD identifies root causes by applying a clustering algorithm to matrices (i.e., heatmaps) capturing the relevance of every DNN neuron on the DNN outcome. Also, HUDD retrains DNNs with images that are automatically selected based on their relatedness to the identified image clusters. Our empirical evaluation with DNNs from the automotive domain has shown that HUDD automatically identifies all the distinct root causes of DNN errors, thus supporting safety analysis. Also, our retraining approach has been shown to be more effective at improving DNN accuracy than existing approaches. A demo video of HUDD is available at https://youtu.be/drjVakP7jdU.
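
As a schematic illustration of the clustering idea only (not HUDD itself), the sketch below clusters synthetic relevance heatmaps with agglomerative clustering so that images sharing a root cause end up in the same cluster; in HUDD, the heatmaps are computed from the DNN's internal layers and the clustering setup differs.

```python
# Schematic sketch of the idea only (not HUDD): cluster error-inducing images
# by the similarity of their relevance heatmaps, so that each cluster ideally
# corresponds to one root cause. The 8x8 heatmaps are synthetic; HUDD computes
# them from the DNN's internal layers and uses a different clustering setup.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(2)
cause_a = rng.random((8, 8))   # notional relevance pattern for root cause A
cause_b = rng.random((8, 8))   # notional relevance pattern for root cause B
heatmaps = np.array(
    [cause_a + rng.normal(0, 0.05, (8, 8)) for _ in range(10)] +
    [cause_b + rng.normal(0, 0.05, (8, 8)) for _ in range(10)]
)

flat = heatmaps.reshape(len(heatmaps), -1)            # one vector per image
labels = AgglomerativeClustering(n_clusters=2).fit_predict(flat)
print(labels)  # images sharing a root cause should share a label
```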

Full Text
Peer Reviewed
MASS: A tool for Mutation Analysis of Space CPS
Cornejo Olivares, Oscar Eduardo UL; Pastore, Fabrizio UL; Briand, Lionel UL

in 2022 IEEE/ACM 44th International Conference on Software Engineering (2022, May)

We present MASS, a mutation analysis tool for embedded software in cyber-physical systems (CPS). We target space CPS (e.g., satellites) and other CPS with similar characteristics (e.g., UAV). Mutation analysis measures the quality of test suites in terms of the percentage of detected artificial faults. There are many mutation analysis tools available, but they are inapplicable to CPS because of scalability and accuracy challenges. To overcome such limitations, MASS implements a set of optimization techniques that enable the applicability of mutation analysis and address scalability and accuracy in the CPS context. MASS has been successfully evaluated on a large study involving embedded software systems provided by industry partners; the study includes an on-board software system managing a microsatellite currently on-orbit, a set of libraries used in deployed cubesats, and a mathematical library provided by the European Space Agency. A demo video of MASS is available at https://www.youtube.com/watch?v=gC1x9cU0-tU.

Full Text
Peer Reviewed
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and Many-Objective Optimization
Ul Haq, Fitash UL; Shin, Donghwan UL; Briand, Lionel UL

in Proceedings of the 44th International Conference on Software Engineering (ICSE ’22) (2022, May)

With the recent advances of Deep Neural Networks (DNNs) in real-world applications, such as Automated Driving Systems (ADS) for self-driving cars, ensuring the reliability and safety of such DNN-enabled systems emerges as a fundamental topic in software testing. One of the essential testing phases of such DNN-enabled systems is online testing, where the system under test is embedded into a specific and often simulated application environment (e.g., a driving environment) and tested in a closed-loop mode in interaction with the environment. However, despite the importance of online testing for detecting safety violations, automatically generating new and diverse test data that lead to safety violations presents the following challenges: (1) there can be many safety requirements to be considered at the same time, (2) running a high-fidelity simulator is often very computationally-intensive, and (3) the space of all possible test data that may trigger safety violations is too large to be exhaustively explored. In this paper, we address the challenges by proposing a novel approach, called SAMOTA (Surrogate-Assisted Many-Objective Testing Approach), extending existing many-objective search algorithms for test suite generation to efficiently utilize surrogate models that mimic the simulator, but are much less expensive to run. Empirical evaluation results on Pylot, an advanced ADS composed of multiple DNNs, using CARLA, a high-fidelity driving simulator, show that SAMOTA is significantly more effective and efficient at detecting unknown safety requirement violations than state-of-the-art many-objective test suite generation algorithms and random search. In other words, SAMOTA appears to be a key enabler technology for online testing in practice.
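
The sketch below is a much-simplified, single-objective illustration of the surrogate-assisted idea (SAMOTA itself is many-objective and built on search algorithms): a cheap surrogate is trained on scenarios already run on the expensive simulator, and the remaining simulation budget is spent only on the candidates the surrogate ranks as riskiest. The simulator stand-in, scenario encoding, and budget are all made up.

```python
# Much-simplified, single-objective illustration of surrogate-assisted testing
# (not SAMOTA, which is many-objective): train a cheap surrogate on scenarios
# already run on the expensive simulator, then spend the remaining simulation
# budget only on the candidates it ranks as riskiest. Everything here is made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def run_simulator(scenario):
    """Stand-in for a costly high-fidelity simulation; returns a safety
    margin (lower means closer to a violation)."""
    fog, speed = scenario
    return 1.0 - 0.6 * fog - 0.3 * speed + rng.normal(0, 0.02)

# Phase 1: a small set of scenarios evaluated on the (expensive) simulator.
evaluated = rng.random((30, 2))         # (fog density, speed), both scaled to 0..1
margins = np.array([run_simulator(s) for s in evaluated])

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(evaluated, margins)

# Phase 2: generate many candidates; the surrogate picks the riskiest few.
candidates = rng.random((1000, 2))
predicted_margins = surrogate.predict(candidates)
budget = 5
riskiest = candidates[np.argsort(predicted_margins)[:budget]]

for scenario in riskiest:               # only these consume real simulation time
    print(scenario.round(2), "->", round(run_simulator(scenario), 3))
```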

Full Text
Peer Reviewed
PRINS: Scalable Model Inference for Component-based System Logs
Shin, Donghwan UL; Bianculli, Domenico UL; Briand, Lionel UL

in Empirical Software Engineering (2022)

Behavioral software models play a key role in many software engineering tasks; unfortunately, these models either are not available during software development or, if available, quickly become outdated as implementations evolve. Model inference techniques have been proposed as a viable solution to extract finite state models from execution logs. However, existing techniques do not scale well when processing very large logs that can be commonly found in practice. In this paper, we address the scalability problem of inferring the model of a component-based system from large system logs, without requiring any extra information. Our model inference technique, called PRINS, follows a divide-and-conquer approach. The idea is to first infer a model of each system component from the corresponding logs; then, the individual component models are merged together taking into account the flow of events across components, as reflected in the logs. We evaluated PRINS in terms of scalability and accuracy, using nine datasets composed of logs extracted from publicly available benchmarks and a personal computer running desktop business applications. The results show that PRINS can process large logs much faster than a publicly available and well-known state-of-the-art tool, without significantly compromising the accuracy of inferred models.
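
As a toy illustration of the divide-and-conquer idea only (not PRINS, which infers proper finite state models per component and then stitches them), the sketch below derives a simple per-component transition relation from each component's log entries and records the cross-component hand-offs observed in the interleaved system log; the components and events are made up.

```python
# Toy illustration of the divide-and-conquer idea (not PRINS, which infers full
# finite state models and merges them): derive a per-component transition
# relation from each component's log entries, and record the cross-component
# hand-offs seen in the interleaved system log. Components and events are made up.
from collections import defaultdict

system_log = [                      # (component, event) in order of occurrence
    ("net", "recv"), ("net", "parse"),
    ("db", "query"), ("db", "reply"),
    ("net", "send"),
]

component_models = defaultdict(set) # per-component event transitions
handoffs = set()                    # observed flow of control across components

previous = None
last_event = {}                     # last event seen per component
for component, event in system_log:
    if component in last_event:
        component_models[component].add((last_event[component], event))
    last_event[component] = event
    if previous is not None and previous[0] != component:
        handoffs.add((previous, (component, event)))
    previous = (component, event)

print(dict(component_models))       # {'net': {...}, 'db': {...}}
print(handoffs)                     # e.g., (('net', 'parse'), ('db', 'query'))
```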
