Machine learning-based malware detection is a promising, scalable method for identifying suspicious applications. This is especially true in today's mobile computing realm, where thousands of applications are poured into markets every day: there, such a technique could be valuable for guaranteeing strong filtering of malicious apps. The success of machine learning approaches, however, is highly dependent on (1) the quality of the datasets used for training and (2) the appropriateness of the test datasets with regard to the built classifiers. Unfortunately, these aspects are scarcely mentioned in the evaluation of existing state-of-the-art approaches in the literature.
In this paper, we consider the relevance of history in the construction of datasets, to highlight its impact on the performance of the malware detection scheme. Notably, we show that simply picking a random set of known malware to train a malware detector, as is done in most assessment scenarios in the literature, yields significantly biased results. While assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could affect the performance of malware detection in the wild.
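The contrast between a history-agnostic and a history-aware dataset construction can be illustrated with a minimal Python sketch. It assumes that each app carries a `first_seen` date and a ground-truth label; `App`, `random_split`, `historical_split`, and `leaks_future` are hypothetical names chosen for illustration, not the authors' tooling or their exact experimental protocol. The random split mirrors the common practice criticized above, while the historical split trains only on apps observed before a cutoff date and tests only on later ones.

```python
from dataclasses import dataclass
from datetime import date
import random


@dataclass(frozen=True)
class App:
    # Hypothetical record: an app identifier, the date it was first observed
    # (e.g. in a market), and its ground-truth label.
    app_id: str
    first_seen: date
    is_malware: bool


def random_split(apps, test_fraction=0.3, seed=0):
    """History-agnostic split: apps are shuffled regardless of date, so the
    training set may contain apps that appeared after some test apps."""
    shuffled = list(apps)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def historical_split(apps, cutoff):
    """History-aware split: train only on apps first seen before `cutoff`,
    test only on apps first seen on or after it, mimicking a detector that
    must classify apps released after it was built."""
    train = [a for a in apps if a.first_seen < cutoff]
    test = [a for a in apps if a.first_seen >= cutoff]
    return train, test


def leaks_future(train, test):
    """Return True if any training app is newer than the oldest test app."""
    if not train or not test:
        return False
    return max(a.first_seen for a in train) > min(a.first_seen for a in test)


if __name__ == "__main__":
    # Toy data (not real measurements): 24 apps spread over 2012.
    apps = [
        App(f"app{i}", date(2012, 1 + i % 12, 1), i % 3 == 0)
        for i in range(24)
    ]
    train, test = historical_split(apps, cutoff=date(2012, 7, 1))
    print(len(train), len(test), leaks_future(train, test))
```

Running the script prints `12 12 False`: with a July 2012 cutoff, the twelve earlier apps form the training set and no training app postdates a test app. A random split gives no such guarantee, which is the kind of temporal leakage a history-aware evaluation is meant to rule out.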
Research center: ULHPC - University of Luxembourg: High Performance Computing
Disciplines: Computer science
Author, co-author:
Allix, Kevin; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT); University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Klein, Jacques; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Le Traon, Yves; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Language: English
Title: Machine Learning-Based Malware Detection for Android Applications: History Matters!
Publication date: 26 May 2014
Publisher: University of Luxembourg, SnT, Luxembourg, Luxembourg