Results 1-20 of 260.
Titcheu Chekam, Thierry
in Empirical Software Engineering (in press)

Kubler, Sylvain
in Expert Systems with Applications (2023), 211
Blockchain technologies, also known as Distributed Ledger Technologies (DLT), are increasingly being explored in many applications, especially in the presence of (potential) dis-/mis-/un-trust among organizations and individuals. Today, there exists a plethora of DLT platforms on the market, which makes it challenging for system designers to decide which platform they should adopt and implement. Although a few DLT comparison frameworks have been proposed in the literature, they often fail to cover all performance and functional aspects, and they too rarely build upon standardized criteria and recommendations. Given this state of affairs, the present paper considers a recent and exhaustive set of assessment criteria recommended by the ITU (International Telecommunication Union). Those criteria (about fifty) are nonetheless mostly defined in textual form, which may pose interpretation problems during the implementation process. To avoid this, a systematic literature review regarding each ITU criterion is conducted with a twofold objective: (i) to understand to what extent a given criterion is considered/evaluated by the literature; (ii) to come up with a ‘formal’ metric definition (i.e., on a mathematical or experimental ground) based, whenever possible, on the current literature. Following this formalization stage, a decision support tool called CREDO-DLT, which stands for “multiCRiteria-basEd ranking Of Distributed Ledger Technology platforms”, is developed using AHP and TOPSIS and made publicly available to help decision-makers select the most suitable DLT platform alternative (i.e., the one that best suits their needs and requirements). A use case scenario in the context of energy communities is proposed to show the practicality of CREDO-DLT.
• Blockchain (DLT) standardization initiatives are reviewed.
• The extent to which ITU’s DLT assessment criteria are covered in the literature is studied.
• Mathematical formalizations of the ITU recommendations are proposed.
• A decision support tool (CREDO-DLT) is designed for DLT platform selection.
• An energy community use case is developed to show the practicality of CREDO-DLT.
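As a rough illustration of the ranking step behind a tool like CREDO-DLT, the sketch below implements plain TOPSIS over an AHP-style weight vector. The platforms, criteria, scores and weights are invented placeholders, not values from the paper.

```python
import numpy as np

def topsis(scores, weights, benefit):
    """Rank alternatives with TOPSIS.

    scores:  (n_alternatives, n_criteria) raw decision matrix
    weights: (n_criteria,) criteria weights, e.g. derived with AHP
    benefit: (n_criteria,) True if higher-is-better, False if it is a cost
    """
    # Vector-normalize each criterion column, then apply the weights.
    v = scores / np.linalg.norm(scores, axis=0) * weights
    # Ideal (best) and anti-ideal (worst) virtual alternatives.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    # Closeness coefficient: 1 = ideal, 0 = anti-ideal.
    d_plus = np.linalg.norm(v - ideal, axis=1)
    d_minus = np.linalg.norm(v - anti, axis=1)
    return d_minus / (d_plus + d_minus)

# Hypothetical example: 3 DLT platforms scored on throughput (benefit),
# latency (cost) and energy use (cost); weights could come from AHP.
scores = np.array([[2000.0, 0.5, 30.0],
                   [ 150.0, 2.0,  5.0],
                   [ 900.0, 1.0, 12.0]])
weights = np.array([0.5, 0.3, 0.2])
closeness = topsis(scores, weights, benefit=np.array([True, False, False]))
print(closeness.argsort()[::-1])  # platform indices, best first
```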
Khanfir, Ahmed
in 22nd IEEE International Conference on Software Quality, Reliability and Security (QRS'22) (2022, December 05)
Much recent software-engineering research has investigated the naturalness of code: the fact that code, in small snippets, is repetitive and can be predicted using statistical language models like n-gram. Although powerful, training such models on a large code corpus can be tedious, time-consuming and sensitive to the code patterns (and practices) encountered during training. Consequently, these models are often trained on a small corpus and thus only estimate language naturalness relative to a specific style of programming or type of project. To overcome these issues, we investigate the use of pre-trained generative language models to infer code naturalness. Pre-trained models are often built on big data, are easy to use out of the box and include powerful learning-association mechanisms. Our key idea is to quantify code naturalness through its predictability, using state-of-the-art generative pre-trained language models. We thus infer naturalness by masking (omitting) code tokens, one at a time, in code sequences, and checking the models’ ability to predict them. We explore three different predictability metrics: (a) measuring the number of exact matches of the predictions, (b) computing the embedding similarity between the original and predicted code, i.e., similarity in the vector space, and (c) computing the confidence of the model when performing the token-completion task, regardless of the outcome. We implement this workflow, named CODEBERT-NT, and evaluate its capability to prioritize buggy lines over non-buggy ones when ranking code based on its naturalness. Our results, on 2,510 buggy versions of 40 projects from the SmartShark dataset, show that CODEBERT-NT outperforms both random-uniform and complexity-based ranking techniques, and yields results comparable to the n-gram models.
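A minimal sketch of the masking workflow with the exact-match metric (metric (a) above), assuming the microsoft/codebert-base-mlm checkpoint and naive whitespace tokenization; the paper's exact setup may differ.

```python
from transformers import pipeline

# Assumed checkpoint: any masked language model trained on code would do.
fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

def naturalness_exact_match(tokens):
    """Mask each token in turn and count exact top-1 predictions."""
    hits = 0
    for i in range(len(tokens)):
        masked = tokens.copy()
        masked[i] = fill.tokenizer.mask_token
        prediction = fill(" ".join(masked), top_k=1)[0]["token_str"].strip()
        hits += prediction == tokens[i]
    return hits / len(tokens)  # higher = more predictable, i.e. more "natural"

line = "for ( int i = 0 ; i < n ; i ++ )".split()
print(naturalness_exact_match(line))
```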
Bernier, Fabien
in Has it Trained Yet? Workshop at the Conference on Neural Information Processing Systems (2022, December 02)
Energy demand forecasting is one of the most challenging tasks for grid operators. Many approaches have been suggested over the years to tackle it. Yet, those still remain too expensive to train in terms of both time and computational resources, hindering their adoption as customers' behaviors continuously evolve. We introduce Transplit, a new lightweight transformer-based model, which significantly decreases this cost by exploiting the seasonality property and learning typical days of power demand. We show that Transplit can be run efficiently on CPU and is several hundred times faster than state-of-the-art predictive models, while performing as well.

Lorentz, Joe
in Software and Systems Modeling (2022)
Models based on differential programming, like deep neural networks, are well established in research and able to outperform manually coded counterparts in many applications. Today, there is a rising interest in introducing this flexible modeling to solve real-world problems. A major challenge when moving from research to application is the strict constraints on computational resources (memory and time). It is difficult to determine and contain the resource requirements of differential models, especially during the early training and hyperparameter-exploration stages. In this article, we address this challenge by introducing CalcGraph, a model abstraction of differentiable programming layers. CalcGraph allows modeling the computational resources that should be used, and CalcGraph's model interpreter can then automatically schedule the execution respecting the specifications made. We propose a novel way to efficiently switch models from storage to preallocated memory zones and vice versa to maximize the number of model executions given the available resources. We demonstrate the efficiency of our approach by showing that it consumes fewer resources than state-of-the-art frameworks like TensorFlow and PyTorch for single-model and multi-model execution.
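The swapping idea can be pictured as a toy greedy allocator: keep the models with the best runs-per-byte ratio resident in the preallocated zone and swap the rest to storage. This is an invented illustration of the general principle, not CalcGraph's actual algorithm or API.

```python
def schedule(models, budget_bytes):
    """Greedily pick which models stay resident in a preallocated
    memory zone so that as many executions as possible avoid a swap.

    models: list of (name, size_bytes, expected_runs) tuples
    """
    # Prefer models that yield many runs per byte of memory.
    ranked = sorted(models, key=lambda m: m[2] / m[1], reverse=True)
    resident, used = [], 0
    for name, size, runs in ranked:
        if used + size <= budget_bytes:
            resident.append(name)
            used += size
    swapped = [m[0] for m in models if m[0] not in resident]
    return resident, swapped

resident, swapped = schedule(
    [("cnn", 400e6, 120), ("lstm", 150e6, 300), ("mlp", 50e6, 40)],
    budget_bytes=500e6)
print(resident, swapped)  # resident models run from preallocated memory
```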
Souani, Badr
in Jianying, Zhou (Ed.) Applied Cryptography and Network Security Workshops (2022, September 24)
In this paper, we propose two empirical studies to (1) detect Android malware and (2) classify Android malware into families. We first (1) reproduce the results of MalBERT using BERT models learning from Android applications' manifests obtained from 265k applications (vs. 22k for MalBERT) from the AndroZoo dataset in order to detect malware. The results reported in the MalBERT paper are excellent and hard to believe, as a manifest only roughly represents an application; we therefore try to answer the following questions in this paper. Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to keep or improve the results while reducing the size of the manifests? We then (2) investigate whether BERT can be used to classify Android malware into families. The results show that BERT can successfully differentiate malware from goodware with 97% accuracy. Furthermore, BERT can classify malware families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify successfully, and that it does not actually need them.

Garg, Aayush
in Empirical Software Engineering (2022)

Bissyande, Tegawendé François D Assise
in Empirical Software Engineering (2022), 27

Khanfir, Ahmed
in ACM Transactions on Software Engineering and Methodology (2022)

Ojdanic, Milos
in ACM Transactions on Software Engineering and Methodology (2022)
Context: When software evolves, opportunities for introducing faults appear. Therefore, it is important to test the evolved program behaviors during each evolution cycle. However, while software evolves, its complexity also evolves, introducing challenges to the testing process. To deal with this issue, testing techniques should be adapted to target the effect of the program changes instead of the entire program functionality. To this end, commit-aware mutation testing, a powerful testing technique, has been proposed. Unfortunately, commit-aware mutation testing is challenging due to the complex program semantics involved. Hence, it is pertinent to understand the characteristics, predictability, and potential of the technique. Objective: We conduct an exploratory study to investigate the properties of commit-relevant mutants, i.e., the test elements of commit-aware mutation testing, by proposing a general definition and an experimental approach to identify them. We thus aim at investigating the prevalence, location, and comparative advantages of commit-aware mutation testing over time (i.e., across program evolution). We also investigate the predictive power of several commit-related features in identifying and selecting commit-relevant mutants, to understand the essential properties for its best-effort application. Method: Our commit-relevance definition relies on the notion of observational slicing, approximated by higher-order mutation. Specifically, our approach utilizes the impact of mutants, i.e., the effect of one mutant on another, to capture and analyze the implicit interactions between the changed and unchanged code parts. The study analyzes millions of mutants (over 10 million), 288 commits and five different open-source software projects, involving over 68,213 CPU days of computation, and sets a ground truth on which we perform our analysis. Results: Our analysis shows that commit-relevant mutants are located mainly outside of the program commit change (81%), suggesting a limitation in previous work. We also note that effective selection of commit-relevant mutants has the potential to reduce the number of mutants by up to 93%. In addition, we demonstrate that commit-relevant mutation testing is significantly more effective and efficient than state-of-the-art baselines, i.e., random mutant selection and analysis of only those mutants within the program change. In our analysis of the predictive power of mutants and commit-related features (e.g., number of mutants within a change, mutant type, and commit size) in predicting commit-relevant mutants, we found that most proxy features do not reliably predict commit-relevant mutants. Conclusion: This empirical study highlights the properties of commit-relevant mutants and demonstrates the importance of identifying and selecting commit-relevant mutants when testing evolving software systems.
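The observational-slicing approximation via higher-order mutants can be read as a simple predicate: a mutant is commit-relevant if pairing it with a mutant inside the change observably alters test outcomes. A schematic sketch only; run_tests is a hypothetical stand-in for a real mutation-testing harness.

```python
def is_commit_relevant(mutant, change_mutants, run_tests):
    """Approximate commit-relevance through mutant interactions.

    A mutant (possibly located outside the changed lines) is considered
    commit-relevant if combining it with a mutant inside the change
    alters test outcomes compared to the change mutant alone, i.e. the
    two code parts observably interact.

    run_tests(mutants) -> tuple of per-test pass/fail outcomes
    (hypothetical helper standing in for a real test harness).
    """
    for change_mutant in change_mutants:
        solo = run_tests([change_mutant])
        paired = run_tests([change_mutant, mutant])  # higher-order mutant
        if solo != paired:
            return True  # observable interaction with the change
    return False
```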
Ghamizi, Salah
E-print/Working paper (2022)
Clinicians use chest radiography (CXR) to diagnose common pathologies. Automated classification of these diseases can expedite the analysis workflow, scale to growing numbers of patients and reduce healthcare costs. While research has produced classification models that perform well on a given dataset, the same models lack generalization to different datasets. This reduces confidence that these models can be reliably deployed across various clinical settings. We propose an approach based on multitask learning to improve model generalization. We demonstrate that learning a (main) pathology together with an auxiliary pathology can significantly impact generalization performance (between -10% and +15% AUC-ROC). A careful choice of auxiliary pathology even yields performance competitive with state-of-the-art models that rely on fine-tuning or ensemble learning, while using between 6% and 34% of the training data that these models required. We further provide a method to determine the best auxiliary task to choose without access to the target dataset. Ultimately, our work takes a significant step towards the creation of CXR diagnosis models applicable in the real world, through the evidence that multitask learning can drastically improve generalization.
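The main-plus-auxiliary setup amounts to a shared encoder with one head per pathology trained under a joint loss. Below is a minimal PyTorch sketch under assumed choices (DenseNet-121 backbone, placeholder pathology pair, auxiliary weight 0.5); the paper's architecture and task pairs may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

# Shared encoder with one sigmoid head per pathology: the main task plus
# an auxiliary one (both label names below are placeholders).
backbone = densenet121(weights=None)
backbone.classifier = nn.Identity()  # keep the 1024-d pooled features
head_main = nn.Linear(1024, 1)       # e.g. "effusion" (hypothetical)
head_aux = nn.Linear(1024, 1)        # e.g. "cardiomegaly" (hypothetical)
bce = nn.BCEWithLogitsLoss()

def multitask_loss(x, y_main, y_aux, aux_weight=0.5):
    feats = backbone(x)
    # Joint objective: the auxiliary head regularizes the shared features.
    return bce(head_main(feats), y_main) + aux_weight * bce(head_aux(feats), y_aux)

x = torch.randn(4, 3, 224, 224)       # toy batch of CXR-sized images
loss = multitask_loss(x, torch.rand(4, 1), torch.rand(4, 1))
loss.backward()
```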
Garg, Aayush
in IEEE Transactions on Software Engineering (2022)

Ghamizi, Salah
in The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23) - SafeAI Workshop, Washington, D.C., Feb 13-14, 2023 (2022)

Ghamizi, Salah
in Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22) (2022)
Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on single-task neural networks with computer vision datasets, very little research has considered complex multi-task models that are common in real applications. In this paper, we evaluate the design choices that impact the robustness of multi-task deep learning networks. We provide evidence that blindly adding auxiliary tasks, or weighing the tasks, provides a false sense of robustness. Thereby, we tone down the claim made by previous research and study the different factors which may affect robustness. In particular, we show that the choice of the tasks to incorporate in the loss function is an important factor that can be leveraged to yield more robust models.

Simonetto, Thibault Jean Angel
in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 (2022)
The generation of feasible adversarial examples is necessary for properly assessing models that work in a constrained feature space. However, it remains a challenging task to enforce constraints in attacks that were designed for computer vision. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces constraints in the loss function to maximize, and a multi-objective search algorithm that aims for misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective in four different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial retraining, we propose to introduce engineered non-convex constraints to improve model adversarial robustness. We demonstrate that this new defense is as effective as adversarial retraining. Our framework forms the starting point for research on constrained adversarial attacks and provides relevant baselines and datasets that future research can exploit.
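The gradient-based variant can be sketched by folding constraint violations into the attack loss as a penalty term. A toy example only: the model, data and the single linear constraint (feature 0 must not exceed feature 1, e.g. "amount <= balance") are made up for illustration.

```python
import torch

def constrained_attack(model, x, y, steps=100, lr=0.01, penalty=10.0):
    """Gradient attack that folds domain constraints into the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = x + delta
        # Maximize misclassification: minimize negative cross-entropy.
        attack_loss = -torch.nn.functional.cross_entropy(model(adv), y)
        # Penalize constraint violations so examples stay feasible
        # (illustrative linear constraint: feature 0 <= feature 1).
        violation = torch.relu(adv[:, 0] - adv[:, 1]).mean()
        opt.zero_grad()
        (attack_loss + penalty * violation).backward()
        opt.step()
    return (x + delta).detach()

model = torch.nn.Linear(4, 2)          # toy tabular classifier
x = torch.rand(8, 4)
y = torch.randint(0, 2, (8,))
adv = constrained_attack(model, x, y)
```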
Gubri, Martin
in The 38th Conference on Uncertainty in Artificial Intelligence (2022)
An established way to improve the transferability of black-box evasion attacks is to craft the adversarial examples on an ensemble-based surrogate to increase diversity. We argue that transferability is fundamentally related to uncertainty. Based on a state-of-the-art Bayesian deep learning technique, we propose a new method to efficiently build a surrogate by sampling approximately from the posterior distribution of neural network weights, which represents the belief about the value of each parameter. Our extensive experiments on ImageNet, CIFAR-10 and MNIST show that our approach improves the success rates of four state-of-the-art attacks significantly (up to 83.2 percentage points), in both intra-architecture and inter-architecture transferability. On ImageNet, our approach can reach a 94% success rate while reducing training computations from 11.6 to 2.4 exaflops, compared to an ensemble of independently trained DNNs. Our vanilla surrogate achieves higher transferability than three test-time techniques designed for this purpose 87.5% of the time. Our work demonstrates that the way to train a surrogate has been overlooked, although it is an important element of transfer-based attacks. We are, therefore, the first to review the effectiveness of several training methods in increasing transferability. We provide new directions to better understand the transferability phenomenon and offer a simple but strong baseline for future work.

Ruiz Rodriguez, Marcelo Luis
in Robotics and Computer-Integrated Manufacturing (2022)
In the context of Industry 4.0, companies understand the advantages of performing Predictive Maintenance (PdM). However, when moving towards PdM, several considerations must be carefully examined. First, they need to have a sufficient number of production machines and relative fault data to generate maintenance predictions. Second, they need to adopt the right maintenance approach, which, ideally, should self-adapt to the machinery, the priorities of the organization and the technicians' skills, and also be able to deal with uncertainty. Reinforcement learning (RL) is envisioned as a key technique in this regard due to its inherent ability to learn by interacting through trials and errors, yet very few RL-based maintenance frameworks have been proposed so far in the literature, and those that exist are limited in several respects. This paper proposes a new multi-agent approach that learns a maintenance policy performed by technicians, under the uncertainty of multiple machine failures. This approach comprises RL agents that partially observe the state of each machine to coordinate the decision-making in maintenance scheduling, resulting in the dynamic assignment of maintenance tasks to technicians (with different skills) over a set of machines. Experimental evaluation shows that our RL-based maintenance policy outperforms traditional maintenance policies (incl. corrective and preventive ones) in terms of failure prevention and downtime, improving overall performance by ≈75%.

Hu, Qiang
in ACM Transactions on Software Engineering and Methodology (2022)
Similar to traditional software that is constantly under evolution, deep neural networks (DNNs) need to evolve upon the rapid growth of test data for continuous enhancement, e.g., adapting to distribution shift in a new environment for deployment. However, it is labor-intensive to manually label all the collected test data. Test selection solves this problem by strategically choosing a small set to label. Via retraining with the selected set, DNNs will achieve competitive accuracy. Unfortunately, existing selection metrics involve three main limitations: (1) using different retraining processes; (2) ignoring data distribution shifts; (3) being insufficiently evaluated. To fill this gap, we first conduct a systematic empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then, based on our findings, we propose a novel distribution-aware test (DAT) selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. None of the selection metrics perform best under all data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to 5 times, with up to a 30.09% accuracy improvement for model enhancement, on simulated and in-the-wild distribution-shift scenarios, respectively.

Ghamizi, Salah
in Proceedings of International Conference on Computer Vision 2021 (2021)
Evasion attacks have been commonly seen as a weakness of deep neural networks. In this paper, we flip the paradigm and envision this vulnerability as a useful application. We propose EAST, a new steganography and watermarking technique based on multi-label targeted evasion attacks. Our results confirm that our embedding is elusive: it passes unnoticed by humans, steganalysis methods, and machine-learning detectors. In addition, our embedding is resilient to soft and aggressive image tampering (87% recovery rate under JPEG compression). EAST outperforms existing deep-learning-based steganography approaches with images that are 70% denser and 73% more robust, and supports multiple datasets and architectures.
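The embedding principle behind a technique like EAST can be sketched as a targeted multi-label attack in which each sigmoid output encodes one message bit, and the receiver recovers the message by thresholding the same model's outputs. A schematic reading of the idea with a toy model; the optimizer, step counts and thresholding are assumptions, not the paper's algorithm.

```python
import torch

def embed_bits(model, image, bits, steps=200, lr=0.005):
    """Hide a bit-string in an image via a targeted multi-label attack."""
    target = torch.tensor(bits, dtype=torch.float32).unsqueeze(0)
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        logits = model(image + delta)
        loss = bce(logits, target)  # drive each label toward its bit
        opt.zero_grad()
        loss.backward()
        opt.step()
    stego = (image + delta).clamp(0, 1).detach()
    # Receiver side: threshold the same model's outputs to read the bits.
    recovered = (torch.sigmoid(model(stego)) > 0.5).int().squeeze(0)
    return stego, recovered

# Toy 8-label classifier standing in for a real multi-label model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 8))
image = torch.rand(1, 3, 32, 32)
stego, recovered = embed_bits(model, image, bits=[1, 0, 1, 1, 0, 0, 1, 0])
print(recovered.tolist())
```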
Ghamizi, Salah
E-print/Working paper (2021)
Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real-world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, the robustness of chest x-ray classification is much harder to evaluate and leads to very different assessments depending on the dataset, the architecture and the robustness metric. We argue that previous studies did not take into account the peculiarities of medical diagnosis, like the co-occurrence of diseases, the disagreement of labellers (domain experts), the threat model of the attacks and the risk implications of each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest x-ray classification models. Our evaluation on 3 datasets, 7 models, and 18 diseases is the largest evaluation of the robustness of chest x-ray classification models. We believe our findings will provide reliable guidelines for realistic evaluation and improvement of the robustness of machine learning models for medical diagnosis.
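For reference, the sketch below computes plain PGD robust accuracy, one common robustness metric that, per the paper's argument, is insufficient on its own. It assumes a single-label toy classifier and in-range inputs; a multi-label CXR setting would need an adapted loss and per-disease reporting.

```python
import torch

def robust_accuracy(model, loader, eps=0.01, steps=10):
    """Accuracy under a plain L-inf PGD attack (no random start)."""
    correct = total = 0
    for x, y in loader:
        adv = x.clone().detach()
        for _ in range(steps):
            adv.requires_grad_(True)
            loss = torch.nn.functional.cross_entropy(model(adv), y)
            grad, = torch.autograd.grad(loss, adv)
            with torch.no_grad():
                adv = adv + (eps / steps) * grad.sign()  # ascent step
                adv = x + (adv - x).clamp(-eps, eps)     # project to eps-ball
        with torch.no_grad():
            correct += (model(adv).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

# Toy stand-in for an 18-disease classifier and its test loader.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 18))
loader = [(torch.rand(4, 3, 32, 32), torch.randint(0, 18, (4,)))]
print(robust_accuracy(model, loader))
```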