REVISITING AND BOOSTING STATE-OF-THE-ART ML-BASED ANDROID MALWARE DETECTORS

DAOUDI, Nadia

No full text

Doctoral thesis (Dissertations and theses)

2023

Permalink
https://hdl.handle.net/10993/54218

Details

Keywords :

Android Security; Android; Malware; Deep Learning; Machine Learning; DREBIN; Replicability; Reproducibility; Ensemble Learning; Difficult Samples; Retraining

Abstract :

[en] Android offers plenty of services to mobile users and has gained significant popularity worldwide. This success has attracted not only more mobile users but also malware authors. Indeed, attackers target Android markets to spread their malicious apps and infect users’ devices. The consequences range from displaying annoying ads to financial gain at users’ expense. To counter the threat posed by Android malware, Machine Learning has been leveraged as a promising technique to automatically detect malware. The literature on Android malware detection abounds with ML-based approaches designed to discriminate malware from legitimate samples. These techniques generally rely on manually engineered features that are extracted from the apps’ artefacts. Reported to be highly effective, Android malware detection approaches seem to be a magic solution to stop the proliferation of malware. Unfortunately, the gap between the promised and the actual detection performance is far from negligible. Despite the rosy picture of detection performance painted in the literature, detection reports show that Android malware is still spreading and infecting mobile users. In this thesis, we investigate why state-of-the-art Android malware detection approaches fail to contain the spread of Android malware, and we propose solutions and directions to boost their detection performance.
In the first part of this thesis, we focus on revisiting the state of the art in Android malware detection. Specifically, we conduct a comprehensive study to assess the reproducibility of state-of-the-art Android malware detectors. We consider research papers published at 16 major venues over a period of ten years and report our reproduction outcome. We also discuss the different obstacles to reproducibility and how they can be overcome. Then, we perform an exploratory analysis of a state-of-the-art malware detector, DREBIN, to gain an in-depth understanding of its inner workings. Our study provides insights into the quality of DREBIN’s features and their effectiveness in discriminating Android malware.
In the second part of this thesis, we investigate novel features for Android malware detection that do not involve manual engineering. Specifically, we propose an Android malware detection approach, DexRay, that relies on features extracted automatically from the apps. We convert the raw bytecode of the app DEX files into an image and train a 1-dimensional convolutional neural network to automatically learn the relevant features. Our approach stands out for the simplicity of its design choices and its high detection performance, which make it a foundational framework for further developing this domain.
In the third part, we attempt to push the frontier of Android malware detection by enhancing the detection performance of the state of the art. We show through a large-scale evaluation of four state-of-the-art malware detectors that their detection performance is highly dependent on the experimental dataset. To address this issue, we investigate the added value of combining their features and predictions using 22 combination methods. Although combining features and predictions does not improve on the detection performance reported by the individual approaches, it maintains the highest detection performance regardless of the dataset. We further propose a novel technique, Guided Retraining, that boosts the detection performance of state-of-the-art Android malware detectors. Guided Retraining uses contrastive learning to learn a better representation of the difficult samples and thereby improve their prediction.
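As an illustration of the DexRay idea mentioned in the abstract (raw DEX bytecode turned into an image that a 1-dimensional convolutional neural network can consume), the following is a minimal sketch and not the thesis implementation. The function name `bytes_to_vector`, the target length, the chunk-averaging resize, the zero padding, and the scaling to [0, 1] are all assumptions made for this example; the abstract does not specify any of them.

```python
from typing import List


def bytes_to_vector(data: bytes, size: int = 16) -> List[float]:
    """Map raw bytes to a fixed-length vector of grayscale values in [0, 1].

    Each byte (0-255) is treated as one grayscale pixel. Longer inputs are
    resampled to `size` values by averaging equal-width chunks; shorter
    inputs are zero-padded. Both choices are assumptions of this sketch.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    pixels = list(data)  # iterating over bytes yields ints in 0..255
    if len(pixels) <= size:
        padded = pixels + [0] * (size - len(pixels))
        return [p / 255.0 for p in padded]
    vector = []
    for i in range(size):
        # Integer chunk boundaries; every chunk holds at least one pixel
        # because len(pixels) > size in this branch.
        start = i * len(pixels) // size
        end = (i + 1) * len(pixels) // size
        chunk = pixels[start:end]
        vector.append(sum(chunk) / (len(chunk) * 255.0))
    return vector


if __name__ == "__main__":
    # Hypothetical stand-in for the concatenated bytes of an app's DEX files.
    fake_dex = bytes(range(256)) * 4
    print(len(bytes_to_vector(fake_dex, size=16)))
```

A fixed-length vector of this kind could then be fed to a 1-dimensional convolutional network as a single-channel sequence; the network itself is omitted here because the abstract gives no architectural details.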

Research center :

- Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX - Trustworthy Software Engineering

Disciplines :

Computer science

Author, co-author :

DAOUDI, Nadia ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > TruX

Language :

English

Title :

REVISITING AND BOOSTING STATE-OF-THE-ART ML-BASED ANDROID MALWARE DETECTORS

Defense date :

24 January 2023

Institution :

Unilu - University of Luxembourg, Luxembourg

Degree :

Docteur en Informatique

Promotor :

KLEIN, Jacques

President :

BISSYANDE, Tegawendé François D Assise

Jury member :

Allix, Kévin

Cavallaro, Lorenzo

Bugiel, Sven

Focus Area :

Security, Reliability and Trust

FNR Project :

FNR11693861 - Characterization Of Malicious Code In Mobile Apps: Towards Accurate And Explainable Malware Detection, 2017 (01/06/2018-31/12/2021) - Jacques Klein

Name of the research project :

HitDroid

Available on ORBilu :

since 27 January 2023

Statistics

Number of views

343 (33 by Unilu)

Number of downloads

0 (0 by Unilu)
