Creating better ground truth to further understand Android malware: A large scale mining approach based on antivirus labels and malicious artifacts

HURIER, Médéric

Doctoral thesis (Dissertations and theses)

2019

Permalink
https://hdl.handle.net/10993/39903

Files

Full Text

thesis.pdf

Author postprint (2.81 MB)

Details

Keywords :

android; malware; ground-truth

Abstract :

Mobile applications are essential for interacting with technology and other people. With more than 2 billion devices deployed all over the world, Android offers a thriving ecosystem by making the work of thousands of developers accessible on digital marketplaces such as Google Play. Nevertheless, the success of Android also exposes millions of users to malware authors who seek to siphon private information and hijack mobile devices for their own benefit. To fight the proliferation of Android malware, the security community embraced machine learning, a branch of artificial intelligence that powers a new generation of detection systems. Machine learning algorithms, however, require a substantial number of qualified samples to learn the classification rules enforced by security experts. Unfortunately, malware ground truths are notoriously hard to construct due to the inherent complexity of Android applications and the global lack of public information about malware. In a context where both information and human resources are limited, the security community needs new approaches that help practitioners accurately define Android malware, automate classification decisions, and improve the comprehension of Android malware. This dissertation proposes three solutions to assist with the creation of malware ground truths.

The first contribution is STASE, an analytical framework that qualifies the composition of malware ground truths. STASE reviews the information shared by antivirus products with nine metrics in order to support the reproducibility of research experiments and detect potential biases. This dissertation reports the results of STASE against three typical settings and suggests additional recommendations for designing experiments based on Android malware.

The second contribution is EUPHONY, a heuristic system built to unify family clusters belonging to malware ground truths. EUPHONY exploits the co-occurrence of malware labels obtained from antivirus reports to study the relationship between Android applications and proposes a single family name per sample to facilitate malware experiments. This dissertation evaluates EUPHONY on well-known malware ground truths to assess the precision of the approach and produce a large dataset of malware tags for the research community.

The third contribution is AP-GRAPH, a knowledge database for dissecting the characteristics of malware ground truths. AP-GRAPH leverages the results of EUPHONY and static analysis to index artifacts that are highly correlated with malware activities and recommend the inspection of the most suspicious components. This dissertation explores the set of artifacts retrieved by AP-GRAPH from popular malware families to track down their correlation and their evolution compared to other malware populations.
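To make the label-unification idea behind EUPHONY more concrete, the following is a minimal sketch of how co-occurring antivirus labels could be merged into one family name per sample. It is an illustration under stated assumptions, not the algorithm implemented in the thesis: the `GENERIC` stop-list, the Jaccard threshold `min_jaccard`, the majority vote, and the function names `tokens`, `build_aliases`, and `unify` are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-list of generic tokens (an assumption, not from the thesis).
GENERIC = {"android", "androidos", "andr", "trojan", "generic",
           "malware", "riskware", "adware", "variant"}


def tokens(label):
    """Split one antivirus label into lowercase candidate family tokens."""
    parts = re.split(r"[^a-z]+", label.lower())
    return {p for p in parts if len(p) >= 4 and p not in GENERIC}


def build_aliases(reports, min_jaccard=0.5):
    """Map each token to a representative of its co-occurrence group."""
    seen, pairs = Counter(), Counter()
    for labels in reports.values():
        toks = set().union(*(tokens(label) for label in labels))
        seen.update(toks)                             # samples per token
        pairs.update(combinations(sorted(toks), 2))   # samples per token pair

    parent = {t: t for t in seen}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for (a, b), both in pairs.items():
        # Jaccard index over the samples on which each token appears.
        if both / (seen[a] + seen[b] - both) >= min_jaccard:
            parent[find(a)] = find(b)

    groups = {}
    for t in seen:
        groups.setdefault(find(t), []).append(t)
    # Representative: most frequent token, ties broken alphabetically.
    return {t: min(g, key=lambda x: (-seen[x], x))
            for g in groups.values() for t in g}


def unify(reports):
    """Propose one family name per sample by majority vote over its labels."""
    alias = build_aliases(reports)
    result = {}
    for sample, labels in reports.items():
        votes = Counter(alias[t] for label in labels for t in tokens(label))
        result[sample] = votes.most_common(1)[0][0] if votes else None
    return result
```

Here `reports` is assumed to be a dictionary mapping a sample identifier to the list of labels that different antivirus engines assigned to it. Tokens that appear together on most of the same samples are treated as aliases of one family, and each sample receives the representative name that most of its labels vote for.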

Disciplines :

Computer science

Author, co-author :

HURIER, Médéric ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

Language :

English

Title :

Creating better ground truth to further understand Android malware: A large scale mining approach based on antivirus labels and malicious artifacts

Defense date :

01 July 2019

Number of pages :

160

Institution :

Unilu - University of Luxembourg, Luxembourg

Degree :

Docteur en Informatique

Promotor :

LE TRAON, Yves

President :

KLEIN, Jacques

Jury member :

BISSYANDE, Tegawendé François D Assise

Lalande, Jean-François

Octeau, Damien

Focus Area :

Security, Reliability and Trust

Available on ORBilu :

since 15 July 2019

Statistics

Number of views

570 (38 by Unilu)

Number of downloads

1583 (40 by Unilu)