Titcheu Chekam, Thierry, in Empirical Software Engineering (in press)

Samhi, Jordan, in 19th International Conference on Mining Software Repositories, Data Showcase (MSR 2022) (2022, May 23)
Many Android app analyzers rely, among other techniques, on dynamic analysis to monitor app runtime behavior and detect potential security threats. However, malicious developers use subtle, though efficient, techniques to bypass dynamic analyzers. Logic bombs are examples of popular techniques where the malicious code is triggered only under specific circumstances, challenging comprehensive dynamic analyses. The research community has proposed various approaches and tools to detect logic bombs. Unfortunately, rigorous assessment and fair comparison of state-of-the-art techniques are impossible due to the lack of ground truth. In this paper, we present TriggerZoo, a new dataset of 406 Android apps containing logic bombs and benign trigger-based behavior that we release to the research community only through an authenticated API. These apps are real-world apps from Google Play that have been automatically infected by our tool AndroBomb. The injected pieces of code implementing the logic bombs cover a large palette of realistic logic bomb types that we have manually characterized from a set of real logic bombs. Researchers can exploit this dataset as ground truth to assess their approaches and provide comparisons against other tools.

Samhi, Jordan, in 44th International Conference on Software Engineering (ICSE 2022) (2022, May 21)
One prominent tactic used to keep malicious behavior from being detected during dynamic test campaigns is logic bombs, where malicious operations are triggered only when specific conditions are satisfied. Defusing logic bombs remains an unsolved problem in the literature. In this work, we propose to investigate Suspicious Hidden Sensitive Operations (SHSOs) as a step towards triaging logic bombs. To that end, we develop a novel hybrid approach that combines static analysis and anomaly detection techniques to uncover SHSOs, which we predict to be likely implementations of logic bombs. Concretely, Difuzer identifies SHSO entry-points using an instrumentation engine and an inter-procedural data-flow analysis. Then, it extracts trigger-specific features to characterize SHSOs and leverages a One-Class SVM to implement an unsupervised learning model for detecting abnormal triggers. We evaluate our prototype and show that it yields a precision of 99.02% in detecting SHSOs, among which 29.7% are logic bombs. Difuzer outperforms the state of the art, revealing more logic bombs while yielding fewer false positives in about one order of magnitude less time. All our artifacts are released to the community.
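The Difuzer abstract above relies on a One-Class SVM trained over trigger-specific features to flag abnormal triggers. A minimal sketch of that kind of anomaly detection follows; the feature vectors and the nu/gamma settings are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: one-class anomaly detection over trigger feature vectors,
# in the spirit of Difuzer's SHSO triage. All feature values are made up.
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical trigger-specific features, one row per SHSO entry-point
# (e.g., counts of sensitive APIs guarded by the trigger, condition depth, ...).
benign_triggers = np.random.RandomState(0).normal(loc=0.0, scale=1.0, size=(500, 8))
unknown_triggers = np.vstack([
    np.random.RandomState(1).normal(loc=0.0, scale=1.0, size=(20, 8)),  # normal-looking
    np.random.RandomState(2).normal(loc=4.0, scale=1.0, size=(5, 8)),   # abnormal
])

# Train only on (presumably) benign triggers; nu bounds the expected outlier fraction.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(benign_triggers)

# +1 = considered normal, -1 = abnormal trigger worth manual inspection.
predictions = detector.predict(unknown_triggers)
print("flagged as suspicious:", int((predictions == -1).sum()), "of", len(unknown_triggers))
```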
Samhi, Jordan, in 44th International Conference on Software Engineering (ICSE 2022) (2022, May 21)
Native code is now commonplace within Android app packages, where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation of the state of the art is a severe threat to validity for a large range of static analyses that do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model: we extract and merge call graphs of native code and bytecode to make the final model readily usable by a common Android analysis framework; in our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant number of Java methods called from native code are "unreachable" in apps' call graphs, both in goodware and malware. Using JuCify, we were able to enable static analyzers to reveal cases where malware relied on native code to hide invocations of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code. Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code.
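JuCify, as described above, merges the native-code call graph with the bytecode call graph into one unified model. The sketch below illustrates that merging idea with networkx; the node names and the JNI bridging edge are hypothetical, not JuCify's actual representation or algorithm.

```python
# Minimal sketch: unify a bytecode call graph and a native call graph,
# then check reachability across the Java Native Interface boundary.
import networkx as nx

bytecode_cg = nx.DiGraph()
bytecode_cg.add_edge("MainActivity.onCreate()", "Utils.nativeEntry()")  # method declared `native`

native_cg = nx.DiGraph()
native_cg.add_edge("libfoo.so::Java_Utils_nativeEntry", "libfoo.so::do_payment")

# Merge the two graphs and add a bridging edge for the JNI binding
# (mapping the Java `native` method to its registered C/C++ symbol).
unified = nx.compose(bytecode_cg, native_cg)
unified.add_edge("Utils.nativeEntry()", "libfoo.so::Java_Utils_nativeEntry")

# The payment routine is reachable only in the unified graph; in the
# bytecode-only call graph it is not even present as a node.
print(nx.has_path(unified, "MainActivity.onCreate()", "libfoo.so::do_payment"))  # True
```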
Daoudi, Nadia, in ACM Transactions on Privacy and Security (2022), 25(2)
Machine learning (ML) advances have been extensively explored for implementing large-scale malware detection. When reported in the literature, performance evaluation of ML-based detectors generally focuses on highlighting the ratio of samples that are correctly or incorrectly classified, overlooking essential questions on why/how the learned models can be demonstrated as reliable. In the Android ecosystem, several recent studies have highlighted how evaluation setups can carry biases related to datasets or evaluation methodologies. Nevertheless, there is little work attempting to dissect the produced model to provide some understanding of its intrinsic characteristics. In this work, we fill this gap by performing a comprehensive analysis of a state-of-the-art Android malware detector, namely DREBIN, which constitutes today a key reference in the literature. Our study mainly targets an in-depth understanding of the classifier's characteristics in terms of (1) which features actually matter among the hundreds of thousands that DREBIN extracts, (2) whether the high scores of the classifier depend on the dataset age, (3) whether DREBIN's explanations are consistent within malware families, etc. Overall, our tentative analysis provides insights into the discriminatory power of the feature set used by DREBIN to detect malware. We expect our findings to bring about a systematisation of knowledge for the community.

Arslan, Yusuf, in Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022) (2022, February)
In industrial contexts, when an ML model classifies a sample as positive, it raises an alarm, which is subsequently sent to human analysts for verification. Reducing the number of false alarms upstream in an ML pipeline is paramount to reducing the workload of experts while increasing customers' trust. Increasingly, SHAP Explanations are leveraged to facilitate manual analysis. Because they have been shown to be useful to human analysts in the detection of false positives, we postulate that SHAP Explanations may provide a means to automate false-positive reduction. To confirm our intuition, we evaluate clustering and rule detection metrics against ground-truth labels to understand the utility of SHAP Explanations in discriminating false positives from true positives. We show that SHAP Explanations are indeed relevant for discriminating samples and are a relevant candidate to automate ML tasks and help detect and reduce false-positive results.
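The ICAART paper above evaluates how well clustering over SHAP Explanations separates false positives from true positives. A minimal sketch of that evaluation step is given below, assuming the per-sample SHAP values have already been computed (e.g., with the shap package); the arrays, the choice of k-means, and the cluster-agreement metrics are illustrative assumptions rather than the paper's exact protocol.

```python
# Minimal sketch: cluster per-sample SHAP value vectors and measure how well
# the clusters line up with ground-truth true-positive / false-positive labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, homogeneity_score

# Hypothetical inputs: one row of SHAP values per alarm raised by the model,
# and a ground-truth label per alarm (1 = true positive, 0 = false positive).
rng = np.random.RandomState(0)
shap_values = np.vstack([rng.normal(0.5, 0.2, (40, 10)),    # alarms that are TPs
                         rng.normal(-0.3, 0.2, (40, 10))])  # alarms that are FPs
ground_truth = np.array([1] * 40 + [0] * 40)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shap_values)

# High agreement suggests the explanations carry a signal that could be used to
# filter out false alarms automatically before they reach human analysts.
print("ARI:", adjusted_rand_score(ground_truth, clusters))
print("homogeneity:", homogeneity_score(ground_truth, clusters))
```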
; ; Kong, Pingfan, in A First Look at Security Risks of Android TV Apps (2021, November 15)
In this paper, we present to the community the first preliminary study on the security risks of Android TV apps. To the best of our knowledge, despite the fact that various efforts have been put into analyzing Android apps, our community has not explored TV versions of Android apps. There is hence no publicly available dataset containing Android TV apps. To this end, we start by collecting a large set of Android TV apps from the official Google Play store. We then experimentally look at those apps from four security aspects: VirusTotal scans, requested permissions, security flaws, and privacy leaks. Our experimental results reveal that, similar to Android smartphone apps, Android TV apps can also come with various security issues. We hence argue that our community should pay more attention to analyzing Android TV apps.

Sun, Tiezhu, in 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW) (2021, November 15)
Machine learning has been widely employed in the literature on malware detection because it is adapted to the need for scalability in vetting large-scale Android samples. Feature engineering has therefore been the key focus for research advances. Recently, a new research direction that builds on the momentum of Deep Learning for computer vision has produced promising results with image representations of Android bytecode. In this work, we postulate that other artifacts, such as the binary (native) code and metadata/configuration files, could be looked at to build more exhaustive representations of Android apps. We show that binary code and metadata files can also provide relevant information for Android malware detection, i.e., they can allow the detection of malware that is missed by models built only on bytecode. Furthermore, we investigate the potential benefits of combining all these artifacts into a unique representation with a strong signal for reasoning about maliciousness.
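The ASEW paper above builds on image representations of Android bytecode. Below is a minimal sketch of the common "bytes to grayscale image" encoding that this line of work typically starts from; the file path and the fixed 256-pixel width are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: turn the raw bytes of a classes.dex file into a grayscale
# image, the usual starting point for vision-based malware detection.
import numpy as np
from PIL import Image

def dex_to_grayscale(dex_path: str, width: int = 256) -> Image.Image:
    data = np.fromfile(dex_path, dtype=np.uint8)
    height = int(np.ceil(len(data) / width))
    # Pad the byte stream so it fills a complete width x height rectangle.
    padded = np.zeros(width * height, dtype=np.uint8)
    padded[: len(data)] = data
    return Image.fromarray(padded.reshape(height, width), mode="L")

if __name__ == "__main__":
    img = dex_to_grayscale("classes.dex")  # hypothetical path to an app's bytecode
    img.resize((224, 224)).save("classes_dex.png")  # CNN-friendly fixed size
```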
Kong, Pingfan, in Automated Software Engineering (2021)
Android framework-specific app crashes are hard to debug. Indeed, the callback-based, event-driven mechanism of Android challenges crash localization techniques that were developed for traditional Java programs. The key challenge stems from the fact that the buggy code location may not even be listed within the stack trace. For example, our empirical study on 500 framework-specific crashes from an open benchmark has revealed that 37 percent of the crash types are related to bugs that are outside the stack traces. Moreover, Android programs are a mixture of code and extra-code artifacts such as the Manifest file. The fact that any artifact can lead to failures in the app execution creates the need to position the localization target beyond the code realm. In this paper, we propose Anchor, a two-phase suspicious bug location suggestion tool. Anchor specializes in finding crash-inducing bugs outside the stack trace. Anchor is lightweight and source-code independent since it only requires the crash message and the apk file to locate the fault. Experimental results, collected via cross-validation and in-the-wild dataset evaluation, show that Anchor is effective in locating Android framework-specific crashing faults.

; ; et al, in IEEE International Conference on Software Maintenance and Evolution (ICSME) (2021, September)

; ; et al, in IEEE International Conference on Software Maintenance and Evolution (ICSME) (2021, September)

; ; et al, in ACM Transactions on Software Engineering and Methodology (2021), 30(3), 1-38

; ; Bissyande, Tegawendé François D Assise, in ACM Transactions on Software Engineering and Methodology (2021), 30(3), 1-36
Android developers heavily use reflection in their apps for legitimate reasons. However, reflection is also significantly used for hiding malicious actions. Unfortunately, current state-of-the-art static analysis tools for Android are challenged by the presence of reflective calls, which they usually ignore. Thus, the results of their security analysis, e.g., for private data leaks, are incomplete, given the measures taken by malware writers to elude static detection. We propose a new instrumentation-based approach to address this issue in a non-invasive way. Specifically, we introduce to the community a prototype tool called DroidRA, which reduces the resolution of reflective calls to a composite constant propagation problem and then leverages the COAL solver to infer the values of reflection targets. After that, it automatically instruments the app to replace reflective calls with their corresponding Java calls in a traditional paradigm. Our approach augments an app so that it can be more effectively analyzed statically, including by static analyzers that are not reflection-aware. We evaluate DroidRA on benchmark apps as well as on real-world apps, and demonstrate that it can indeed infer the target values of reflective calls and subsequently allow state-of-the-art tools to provide more sound and complete analysis results.

Lothritz, Cedric, in 26th International Conference on Applications of Natural Language to Information Systems (2021, June 25)
With the momentum of conversational AI for enhancing client-to-business interactions, chatbots are sought in various domains, including FinTech, where they can automatically handle requests for opening/closing bank accounts or issuing/terminating credit cards. Since they are expected to replace emails and phone calls, chatbots must be capable of dealing with diverse client populations. In this work, we focus on the variety of languages, in particular in multilingual countries. Specifically, we investigate strategies for training deep learning models for chatbots with multilingual data. We perform experiments on the specific tasks of Intent Classification and Slot Filling in financial-domain chatbots and assess the performance of the multilingual mBERT model against multiple monolingual models.
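The chatbot study above compares mBERT against monolingual models for Intent Classification. A minimal sketch of loading mBERT as an intent classifier with Hugging Face Transformers follows; the intent labels and the example utterance are hypothetical, and in practice the classification head would first be fine-tuned on labelled chatbot data.

```python
# Minimal sketch: multilingual BERT as an intent classifier.
# The classification head below is randomly initialised until fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

intents = ["open_account", "close_account", "issue_card", "terminate_card"]  # hypothetical label set
model_name = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=len(intents))

# One utterance, here in French, to illustrate the multilingual setting.
inputs = tokenizer("Je voudrais ouvrir un compte bancaire", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print("predicted intent:", intents[int(logits.argmax(dim=-1))])
```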
Samhi, Jordan, in 43rd International Conference on Software Engineering (ICSE) (2021, May)
Inter-Component Communication (ICC) is a key mechanism in Android. It enables developers to compose rich functionalities and explore reuse within and across apps. Unfortunately, as reported by a large body of literature, ICC is rather "complex and largely unconstrained", leaving room for a lack of precision in app modeling. To address the challenge of tracking ICCs within apps, state-of-the-art static approaches such as Epicc, IccTA and Amandroid have focused on the documented framework ICC methods (e.g., startActivity) to build their analyses. In this work, we show that the ICC models inferred by these state-of-the-art tools may actually be incomplete: the framework provides other, atypical ways of performing ICCs. To address this limitation, we propose RAICC, a static approach for modeling new ICC links and thus boosting previous analysis tasks such as ICC vulnerability detection, privacy leak detection, and malware detection. We have evaluated RAICC on 20 benchmark apps, demonstrating that it improves the precision and recall of uncovered leaks in state-of-the-art tools. We have also performed a large empirical investigation showing that atypical ICC methods are largely used in Android apps, although not necessarily for data transfer. We also show that RAICC increases the number of ICC links found by 61.6% on a dataset of real-world malicious apps, and that RAICC enables the detection of new ICC vulnerabilities.

Samhi, Jordan, Article for general public (2021)

Arslan, Yusuf, in Companion Proceedings of the Web Conference 2021 (WWW '21 Companion), April 19-23, 2021, Ljubljana, Slovenia (2021, April 19)

Riom, Timothée, in Empirical Software Engineering (2021), 26
Detecting vulnerabilities in software is a constant race between development teams and potential attackers. While many static and dynamic approaches have focused on regularly analyzing the software in its entirety, a recent research direction has focused on the analysis of changes that are applied to the code. VCCFinder is a seminal approach in the literature that builds on machine learning to automatically detect whether an incoming commit will introduce some vulnerabilities. Given the influence of VCCFinder in the literature, we undertake an investigation into its performance as a state-of-the-art system. To that end, we attempt a replication study of the VCCFinder supervised learning approach. The insights from our failure to replicate the results reported in the original publication informed the design of a new approach to identify vulnerability-contributing commits based on a semi-supervised learning technique with an alternate feature set. We provide all artefacts and a clear description of this approach as a new reproducible baseline for advancing research on machine learning-based identification of vulnerability-introducing commits.
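The VCCFinder replication above replaces the supervised setup with a semi-supervised learning technique over commit features. As a minimal sketch of one such technique, label spreading from scikit-learn, under the assumption that most commits are unlabelled (encoded as -1); this illustrates the general idea only, not the paper's actual feature set or algorithm.

```python
# Minimal sketch: semi-supervised classification of commits, where only a few
# commits have a known label (1 = vulnerability-contributing, 0 = clean) and
# the rest are unlabelled (-1).
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
commit_features = rng.normal(size=(200, 12))  # hypothetical per-commit feature vectors
labels = np.full(200, -1)                     # -1 marks unlabelled commits
labels[:10] = 1                               # a few known vulnerability-contributing commits
labels[10:25] = 0                             # a few known clean commits

model = LabelSpreading(kernel="rbf", gamma=0.5, alpha=0.2).fit(commit_features, labels)

# Propagated labels for the previously unlabelled commits.
predicted = model.transduction_[25:]
print("commits predicted as vulnerability-contributing:", int((predicted == 1).sum()))
```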
; ; Liu, Kui, in 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Hawaii, 9-12 March 2021 (2021, March 10)
The literature on Automated Program Repair is largely dominated by approaches that leverage test suites not only to expose bugs but also to validate the generated patches. Unfortunately, beyond the widely discussed concern that test suites are an imperfect oracle because they can be incomplete, they can also include tests that are flaky. A flaky test is one that can be passed or failed by a program in a non-deterministic way. Such tests are generally carefully removed from repair benchmarks. In practice, however, flaky tests are present in the test suites of software repositories. To the best of our knowledge, no study has discussed this threat to the validity of program repair evaluations. In this work, we highlight this threat and further investigate the impact of flaky tests by reverting their removal from the Defects4J benchmark. Our study aims to characterize the impact of flaky tests on localizing bugs and their eventual influence on repair performance. Among other insights, we find that (1) although flaky tests are few (≈0.3% of total tests), they affect experiments related to a large proportion (98.9%) of Defects4J real-world faults; (2) most flaky tests (98%) actually provide deterministic results under specific environment configurations (with the JDK version influencing the results); (3) flaky tests drastically hinder the effectiveness of spectrum-based fault localization (e.g., the rankings of 90 bugs drop while none of the bugs obtains better localization results compared with results achieved without flaky tests); and (4) the repairability of APR tools is greatly affected by the presence of flaky tests (e.g., 10 state-of-the-art APR tools can now fix significantly fewer bugs than when the benchmark is manually curated to remove flaky tests). Given that the detection of flaky tests is still nascent, we call on the program repair community to relax the artificial assumption that the test suite is free from flaky tests. One direction that we propose is to develop strategies where patches that partially fix bugs are considered worthwhile: a patch may make the program pass some test cases but fail others (which may actually be the flaky ones).
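Finding (3) above concerns spectrum-based fault localization, where each program element is ranked by a suspiciousness score computed from the passing/failing tests that cover it. A minimal sketch of the widely used Ochiai metric follows, illustrating why a flaky test that flips between pass and fail perturbs the ranking; the coverage counts are made up.

```python
# Minimal sketch: Ochiai suspiciousness for spectrum-based fault localization.
# ef/ep = failing/passing tests that execute the element,
# nf    = failing tests that do not execute it.
from math import sqrt

def ochiai(ef: int, ep: int, nf: int) -> float:
    denominator = sqrt((ef + nf) * (ef + ep))
    return ef / denominator if denominator else 0.0

# A statement covered by 3 failing and 2 passing tests, with 1 failing test not covering it.
print(ochiai(ef=3, ep=2, nf=1))
# If one of those failing tests is flaky and passes on re-run, the counts shift
# (ef decreases, ep increases) and the statement's rank can drop.
print(ochiai(ef=2, ep=3, nf=1))
```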
; ; Koyuncu, Anil, in Journal of Systems and Software (2021)