Daoudi, Nadia
Doctoral thesis (2023)
Android offers plenty of services to mobile users and has gained significant popularity worldwide. The success of Android has attracted more mobile users but also malware authors. Indeed, attackers target Android markets to spread their malicious apps and infect users’ devices. The consequences range from displaying annoying ads to gaining financial benefits at users’ expense. To overcome the threat posed by Android malware, Machine Learning has been leveraged as a promising technique to automatically detect malware. The literature on Android malware detection abounds with ML-based approaches that are designed to discriminate malware from legitimate samples. These techniques generally rely on manually engineered features that are extracted from the apps’ artefacts. Reported to be highly effective, Android malware detection approaches seem to be the magical solution to stop the proliferation of malware. Unfortunately, the gap between the promised and the actual detection performance is far from negligible. Despite the rosy detection performance painted in the literature, detection reports show that Android malware is still spreading and infecting mobile users. In this thesis, we investigate the reasons that prevent state-of-the-art Android malware detection approaches from containing the spread of Android malware, and we propose solutions and directions to boost their detection performance. In the first part of this thesis, we focus on revisiting the state of the art in Android malware detection. Specifically, we conduct a comprehensive study to assess the reproducibility of state-of-the-art Android malware detectors.
We consider research papers published at 16 major venues over a period of ten years and report our reproduction outcome. We also discuss the different obstacles to reproducibility and how they can be overcome. Then, we perform an exploratory analysis on a state-of-the-art malware detector, DREBIN, to gain an in-depth understanding of its inner workings. Our study provides insights into the quality of DREBIN’s features and their effectiveness in discriminating Android malware. In the second part of this thesis, we investigate novel features for Android malware detection that do not involve manual engineering. Specifically, we propose an Android malware detection approach, DexRay, that relies on features extracted automatically from the apps. We convert the raw bytecode of the app DEX files into an image and train a 1-dimensional convolutional neural network to automatically learn the relevant features. Our approach stands out for the simplicity of its design choices and its high detection performance, which make it a foundational framework for further developing this domain. In the third part, we attempt to push the frontier of Android malware detection by enhancing the detection performance of the state of the art. We show through a large-scale evaluation of four state-of-the-art malware detectors that their detection performance is highly dependent on the experimental dataset. To solve this issue, we investigate the added value of combining their features and predictions using 22 combination methods. While it does not improve the detection performance reported by individual approaches, the combination of features and predictions maintains the highest detection performance independently of the dataset. We further propose a novel technique, Guided Retraining, that boosts the detection performance of state-of-the-art Android malware detectors.
Guided Retraining uses contrastive learning to learn a better representation of the difficult samples to improve their prediction.

Daoudi, Nadia
in Empirical Software Engineering (2022), 28
Research on Android malware detection based on machine learning has been prolific in recent years. In this paper, we show, through a large-scale evaluation of four state-of-the-art approaches, that their achieved performance fluctuates when applied to different datasets. Combining existing approaches appears to be an appealing method to stabilise performance. We therefore proceed to empirically investigate the effect of such combinations on the overall detection performance. In our study, we evaluated 22 methods to combine feature sets or predictions from the state-of-the-art approaches. Our results showed that no method significantly enhanced the detection performance reported by the state-of-the-art malware detectors. Nevertheless, the performance achieved is on par with the best individual classifiers for all settings. Overall, we conduct extensive experiments on the opportunity to combine state-of-the-art detectors. Our main conclusion is that combining state-of-the-art malware detectors leads to a stabilisation of the detection performance, and that a research agenda on how they should be combined effectively is required to boost malware detection. All artefacts of our large-scale study (i.e., the dataset of ∼0.5 million APKs and all extracted features) are made available for replicability.
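
To make the idea of combining detector predictions concrete, here is a minimal sketch of two generic combination rules, majority voting over hard labels and averaging of malware scores. It is only an illustration: the abstract does not say which 22 methods were evaluated, and the function names, the tie-breaking rule, the 0.5 threshold and the sample values are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def majority_vote(predictions):
    """Combine hard labels (0 = benign, 1 = malware) from several detectors.

    Ties are resolved towards malware, an arbitrary choice for this sketch.
    """
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0


def average_score(scores, threshold=0.5):
    """Combine per-detector malware scores in [0, 1] by averaging them."""
    return 1 if mean(scores) >= threshold else 0


# Hypothetical outputs of four detectors for three apps (not real results).
hard_labels = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
soft_scores = [[0.9, 0.4, 0.7, 0.8], [0.2, 0.1, 0.6, 0.3], [0.55, 0.6, 0.45, 0.3]]

voted = [majority_vote(p) for p in hard_labels]
averaged = [average_score(s) for s in soft_scores]
print(voted, averaged)
```

The third app shows how the two rules can disagree: the vote is tied and resolved towards malware, while the averaged score falls below the threshold.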

Samhi, Jordan
in 44th International Conference on Software Engineering (ICSE 2022) (2022, May 21)
Native code is now commonplace within Android app packages, where it co-exists and interacts with Dex bytecode through the Java Native Interface to deliver rich app functionalities. Yet, state-of-the-art static analysis approaches have mostly overlooked the presence of such native code, which, however, may implement some key sensitive, or even malicious, parts of the app behavior. This limitation of the state of the art is a severe threat to validity in a large range of static analyses that do not have a complete view of the executable code in apps. To address this issue, we propose a new advance in the ambitious research direction of building a unified model of all code in Android apps. The JuCify approach presented in this paper is a significant step towards such a model, where we extract and merge call graphs of native code and bytecode to make the final model readily usable by a common Android analysis framework: in our implementation, JuCify builds on the Soot internal intermediate representation. We performed empirical investigations to highlight how, without the unified model, a significant number of Java methods called from the native code are "unreachable" in apps' call graphs, both in goodware and malware. Using JuCify, we were able to enable static analyzers to reveal cases where malware relied on native code to hide invocation of payment library code or of other sensitive code in the Android framework. Additionally, JuCify's model enables state-of-the-art tools to achieve better precision and recall in detecting data leaks through native code.
Finally, we show that by using JuCify we can find sensitive data leaks that pass through native code.

Daoudi, Nadia
in ACM Transactions on Privacy and Security (2022), 25(2)
Machine learning (ML) advances have been extensively explored for implementing large-scale malware detection. When reported in the literature, performance evaluation of ML-based detectors generally focuses on highlighting the ratio of samples that are correctly or incorrectly classified, overlooking essential questions on why and how the learned models can be demonstrated to be reliable. In the Android ecosystem, several recent studies have highlighted how evaluation setups can carry biases related to datasets or evaluation methodologies. Nevertheless, there is little work attempting to dissect the produced model to provide some understanding of its intrinsic characteristics. In this work, we fill this gap by performing a comprehensive analysis of a state-of-the-art Android malware detector, namely DREBIN, which today constitutes a key reference in the literature. Our study mainly targets an in-depth understanding of the classifier characteristics in terms of (1) which features actually matter among the hundreds of thousands that DREBIN extracts, (2) whether the high scores of the classifier are dependent on the dataset age, (3) whether DREBIN's explanations are consistent within malware families, etc. Overall, our tentative analysis provides insights into the discriminatory power of the feature set used by DREBIN to detect malware. We expect our findings to bring about a systematisation of knowledge for the community.
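
As a rough illustration of question (1), the sketch below ranks features of a linear classifier by weight and lists the features present in one app that push its score towards the malware class. It assumes a linear model over binary features, which matches DREBIN's published design but is not spelled out in the abstract above; the feature names, weights and helper functions are hypothetical and do not reproduce the study's actual analysis.

```python
def score(app_features, weights, bias=0.0):
    """Linear decision score: sum of the weights of the features present in the app."""
    return sum(weights.get(f, 0.0) for f in app_features) + bias


def top_features(weights, k=3):
    """Rank all features of the model by absolute weight."""
    return sorted(weights, key=lambda f: abs(weights[f]), reverse=True)[:k]


def explain(app_features, weights, k=3):
    """Return the k present features with the largest weights towards malware."""
    present = [(f, weights[f]) for f in app_features if f in weights]
    present.sort(key=lambda fw: fw[1], reverse=True)
    return present[:k]


# Hypothetical weights, invented for this example (not taken from a trained model).
weights = {
    "permission::SEND_SMS": 1.2,
    "api_call::getDeviceId": 0.8,
    "intent::BOOT_COMPLETED": 0.5,
    "permission::INTERNET": 0.05,
    "activity::MainActivity": -0.3,
}
app = {"permission::SEND_SMS", "permission::INTERNET", "activity::MainActivity"}

app_score = score(app, weights)
ranking = top_features(weights, k=2)
explanation = explain(app, weights, k=2)
print(app_score, ranking, explanation)
```

With a model of this shape, only a small number of large-weight features dominate the score, which is the kind of property the first question probes.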

Sun, Tiezhu
in 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW) (2021, November 15)
Machine learning has been widely employed in the malware detection literature because it suits the need for scalability in vetting large-scale samples of Android apps. Feature engineering has therefore been the key focus for research advances. Recently, a new research direction that builds on the momentum of Deep Learning for computer vision has produced promising results with image representations of Android bytecode. In this work, we postulate that other artifacts such as the binary (native) code and metadata/configuration files could be looked at to build more exhaustive representations of Android apps. We show that binary code and metadata files can also provide relevant information for Android malware detection, i.e., that they make it possible to detect malware that is not detected by models built only on bytecode. Furthermore, we investigate the potential benefits of combining all these artifacts into a single representation with a strong signal for reasoning about maliciousness.

Daoudi, Nadia
in Communications in Computer and Information Science (2021)
Computer vision has witnessed several advances in recent years, with unprecedented performance provided by deep representation learning research.
Image formats thus appear attractive to other fields such as malware detection, where deep learning on images alleviates the need for comprehensively hand-crafted features that generalise to different malware variants. We postulate that this research direction could become the next frontier in Android malware detection, and therefore requires a clear roadmap to ensure that new approaches indeed bring novel contributions. We contribute a first building block by developing and assessing a baseline pipeline for image-based malware detection with straightforward steps. We propose DexRay, which converts the bytecode of the app DEX files into grey-scale “vector” images and feeds them to a 1-dimensional Convolutional Neural Network model. We view DexRay as foundational due to the exceedingly basic nature of the design choices, allowing us to infer what could be a minimal performance that can be obtained with image-based learning in malware detection. The performance of DexRay evaluated on over 158k apps demonstrates that, while simple, our approach is effective, with a high detection rate (F1-score = 0.96). Finally, we investigate the impact of time decay and image-resizing on the performance of DexRay and assess its resilience to obfuscation. This work-in-progress paper contributes to the domain of Deep Learning based malware detection by providing a sound, simple, yet effective approach (with available artefacts) that can be the basis to scope the many profound questions that will need to be investigated to fully develop this domain.

Daoudi, Nadia
in Empirical Software Engineering (2021), 26
A well-known curse of computer security research is that it often produces systems that, while technically sound, fail operationally. To overcome this curse, the community generally seeks to assess proposed systems under a variety of settings in order to make explicit every potential bias. In this respect, research achievements on machine learning based malware detection have recently come under thorough evaluation by the community. Such an effort of comprehensive evaluation presupposes, first and foremost, the possibility of performing an independent reproduction study in order to sharpen the evaluations presented by the approaches’ authors. The question “Can published approaches actually be reproduced?” thus becomes paramount, despite the little interest such mundane and practical aspects seem to attract in the malware detection field. In this paper, we attempt a complete reproduction of five Android malware detectors from the literature and discuss to what extent they are “reproducible”. Notably, we provide insights on the implications of the guesswork that may be required to finalise a working implementation. Finally, we discuss how barriers to reproduction could be lifted, and how the malware detection field would benefit from stronger reproducibility standards, as many other fields already have.