Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross Validation" Scheme with Reality

ALLIX, Kevin; BISSYANDE, Tegawendé François D Assise; JEROME, Quentin; KLEIN, Jacques; STATE, Radu; LE TRAON, Yves

doi:10.1145/2557547.2557587

Paper published in a book (Scientific congresses, symposiums and conference proceedings)

2014 • In Proceedings of the 4th ACM Conference on Data and Application Security and Privacy

Peer reviewed

Permalink
https://hdl.handle.net/10993/18024

Files

Full Text

p163-allix.pdf

Publisher postprint (578.88 kB)

Details

Keywords :

android; machine learning; malware; ten-fold

Abstract :

[en] To address the issue of malware detection, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. Several promising results were recorded in the literature, many approaches being assessed with the common “10-Fold cross validation” scheme. This paper revisits the purpose of malware detection to discuss the adequacy of the “10-Fold” scheme for validating techniques that may not perform well in reality. To this end, we have devised several Machine Learning classifiers that rely on a novel set of features built from applications’ CFGs. We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that our approach outperforms existing machine learning-based approaches. However, this high performance on usual-size datasets does not translate in high performance in the wild.

Disciplines :

Computer science

Author, co-author :

ALLIX, Kevin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

BISSYANDE, Tegawendé François D Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

JEROME, Quentin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

KLEIN, Jacques ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

STATE, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

Language :

English

Title :

Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross Validation" Scheme with Reality

Publication date :

March 2014

Event name :

4th ACM Conference on Data and Application Security and Privacy

Event place :

San Antonio, Texas, United States

Event date :

from 3 March 2014 to 5 March 2014

Main work title :

Proceedings of the 4th ACM Conference on Data and Application Security and Privacy

Publisher :

ACM, New York, NY, USA

ISBN/EAN :

978-1-4503-2278-2

Collection name :

CODASPY '14

Pages :

163–166

Peer reviewed :

Peer reviewed

Additional URL :

http://doi.acm.org/10.1145/2557547.2557587

Available on ORBilu :

since 21 September 2014

Statistics

Number of views

480 (34 by Unilu)

Number of downloads

2239 (23 by Unilu)
