Abstract:
To address the challenge of detecting malware across large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for building effective approaches. So far, several promising results have been recorded in the literature, with many approaches being assessed under what we call "in the lab" validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in-the-lab validation scenarios provide reliable indications of the performance of malware detectors in real-world settings, i.e., "in the wild".
To this end, we have devised several machine-learning classifiers that rely on a set of features built from applications' control-flow graphs (CFGs). We use a sizeable dataset of over 50,000 Android applications collected from the same sources from which state-of-the-art approaches have drawn their data. We show that, in the lab, our approach outperforms existing machine-learning-based approaches. However, this high in-the-lab performance does not translate into high performance in the wild. The performance gap we observed, with F-measures dropping from over 0.9 in the lab to below 0.1 in the wild, raises one important question: how do state-of-the-art approaches perform in the wild?
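The contrast between the two evaluation scenarios can be made concrete with a small sketch. The snippet below is purely illustrative and does not reproduce the paper's method: it uses synthetic feature vectors rather than CFG-based features, and a hypothetical distribution shift to stand in for the difference between the training population and the wild. It only shows how an F-measure obtained by cross-validation on a single dataset ("in the lab") can differ sharply from the F-measure obtained when the same classifier is applied to a population it was not drawn from ("in the wild").

```python
# Illustrative sketch only: synthetic data, not the paper's CFG features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Hypothetical "lab" dataset: features and labels from one distribution.
X_lab = rng.normal(size=(2000, 50))
y_lab = (X_lab[:, 0] + 0.5 * X_lab[:, 1] > 0).astype(int)

# Hypothetical "wild" dataset: same feature space, but a shifted
# distribution (e.g., newer applications whose malware differs from
# anything seen during training).
X_wild = rng.normal(loc=0.8, size=(2000, 50))
y_wild = (X_wild[:, 2] - X_wild[:, 3] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# "In the lab": 10-fold cross-validation on the lab dataset alone.
lab_f1 = cross_val_score(clf, X_lab, y_lab, cv=10, scoring="f1").mean()

# "In the wild": train on the lab dataset, test on the shifted population.
clf.fit(X_lab, y_lab)
wild_f1 = f1_score(y_wild, clf.predict(X_wild))

print(f"F-measure in the lab:  {lab_f1:.2f}")
print(f"F-measure in the wild: {wild_f1:.2f}")
```

Run as a plain script, this prints a high in-the-lab F-measure and a much lower in-the-wild one, mirroring (in a toy setting) the kind of gap reported in the abstract.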