References of "Allix, Kevin 50000167"
     in
Bookmark and Share    
Full Text
Peer Reviewed
AndroZoo: Collecting Millions of Android Apps for the Research Community
Allix, Kevin; Bissyandé, Tegawendé François D'Assise; Klein, Jacques; et al.

in Proceedings of the 13th International Workshop on Mining Software Repositories (2016, May)

We present a growing collection of Android applications collected from several sources, including the official Google Play app market. Our dataset, AndroZoo, currently contains more than three million apps, each of which has been analysed by tens of different antivirus products to determine which applications are detected as malware. We provide this dataset to support ongoing research efforts, as well as to enable new research topics on Android apps. By releasing our dataset to the research community, we also aim to encourage our fellow researchers to engage in reproducible experiments.
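
For readers who want to use the dataset, the sketch below shows how a single APK could be fetched by its SHA-256 hash. The endpoint URL, the API key, and the hash are assumptions drawn from AndroZoo's public access documentation, not from this reference.

```python
# Minimal sketch: downloading a single APK from AndroZoo by SHA-256.
# ANDROZOO_API_KEY and APP_SHA256 are placeholders; the endpoint URL is an
# assumption based on AndroZoo's public documentation, not on this abstract.
import requests

ANDROZOO_API_KEY = "YOUR_API_KEY"   # obtained by requesting access from AndroZoo
APP_SHA256 = "0000...abcd"          # hypothetical SHA-256 of the target APK

resp = requests.get(
    "https://androzoo.uni.lu/api/download",
    params={"apikey": ANDROZOO_API_KEY, "sha256": APP_SHA256},
    timeout=600,
)
resp.raise_for_status()

with open(f"{APP_SHA256}.apk", "wb") as f:
    f.write(resp.content)
```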

Peer Reviewed
On the Lack of Consensus in Anti-Virus Decisions: Metrics and Insights on Building Ground Truths of Android Malware
Hurier, Médéric; Allix, Kevin; Bissyandé, Tegawendé François D'Assise; et al.

in Detection of Intrusions and Malware, and Vulnerability Assessment - 13th International Conference (2016)

There is generally a lack of consensus in antivirus (AV) engines' decisions on a given sample. This challenges the building of authoritative ground-truth datasets. Instead, researchers and practitioners may rely on unvalidated approaches to build their ground truth, e.g., by considering decisions from a selected set of AV vendors or by setting a threshold number of positive detections before classifying a sample. Both approaches are biased, as they implicitly either rank AV products or assume that all AV decisions carry equal weight. In this paper, we extensively investigate the lack of agreement among AV engines. To that end, we propose a set of metrics that quantitatively describe the different dimensions of this lack of consensus. We show how our metrics can bring important insights by using the detection results of 66 AV products on 2 million Android apps as a case study. Our analysis focuses not only on AVs' binary decisions but also on the notoriously hard problem of the labels that AVs associate with suspicious files, and allows us to highlight biases hidden in the collection of a malware ground truth, a foundation stone of any machine learning-based malware detection approach.
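
To make the notion of (dis)agreement concrete, here is a minimal sketch of one possible consensus measure: the average pairwise agreement rate between engines over a detection matrix. This is an illustrative stand-in, not one of the metrics proposed in the paper, and the detection matrix is hypothetical.

```python
# Sketch of one simple consensus measure: the average pairwise agreement
# rate between AV engines over a set of samples. Illustrative only; the
# matrix is hypothetical (rows = samples, columns = AV engines, 1 = flagged).
from itertools import combinations

import numpy as np

detections = np.array([
    [1, 1, 0, 1],   # sample 1: three of four engines flag it
    [0, 0, 0, 1],   # sample 2: a single engine disagrees with the rest
    [1, 1, 1, 1],   # sample 3: unanimous detection
])

pairs = list(combinations(range(detections.shape[1]), 2))
agreement = np.mean([
    np.mean(detections[:, i] == detections[:, j]) for i, j in pairs
])
print(f"average pairwise agreement: {agreement:.2f}")
```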

Challenges and Outlook in Machine Learning-based Malware Detection for Android
Allix, Kevin

Doctoral thesis (2015)

Just like in traditional desktop computing, one of the major security issues in mobile computing lies in malicious software. Several recent studies have shown that Android, as today's most widespread operating system, is the target of most new families of malware. Manually analysing an Android application to determine whether it is malicious is a time-consuming process. Furthermore, because of the complexity of analysing an application, this task can only be conducted by highly skilled, and hence hard to come by, professionals. Researchers naturally sought to transfer this process from humans to computers to lower the cost of detecting malware. Machine-learning techniques, which look at patterns amongst known malware and infer models of what discriminates malware from goodware, have long been summoned to build malware detectors. The vast quantity of data involved in malware detection, added to the fact that we do not know a priori how to express in technical terms the difference between malware and goodware, indeed makes the malware detection question a seemingly textbook example of a possible machine-learning application. Despite the vast amount of literature published on the topic of detecting malware with machine learning, malware detection is not a solved problem. In this thesis, we investigate issues that affect performance evaluation and that thus may render current machine learning-based malware detectors for Android hardly usable in practical settings, and we propose an approach to overcome those issues. While the experiments presented in this thesis all rely on feature sets obtained through lightweight static analysis, several of our findings could apply equally to all machine learning-based malware detection approaches. In the first part of this thesis, background information on machine learning and on malware detection is provided, and the related work is described. A snapshot of the malware landscape in Android application markets is then presented. The second part discusses three pitfalls hindering the evaluation of malware detectors. We show with extensive experiments how validation methodology, history-unaware dataset construction and the choice of a ground truth can heavily interfere with the performance results of malware detectors. In a third part, we present a practical approach to detect Android malware in real-world settings. We then propose several research paths to get closer to our long-term goal of building practical, dependable and predictable Android malware detectors.
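
One of the pitfalls named above, the choice of a ground truth, can be made concrete with a small sketch: labelling a sample as malware whenever at least a threshold number of AV engines flag it. The counts below are hypothetical; the point is only that different thresholds yield different ground truths from the same scan reports.

```python
# Sketch of the ground-truth pitfall: labelling a sample as malware when at
# least `threshold` AV engines flag it. Different thresholds yield different
# "ground truths" from the very same scan reports. Counts are hypothetical.
av_positive_counts = {"app_a.apk": 0, "app_b.apk": 1, "app_c.apk": 5, "app_d.apk": 23}

def build_ground_truth(counts: dict[str, int], threshold: int) -> dict[str, bool]:
    """Label each app as malware iff its AV-positive count meets the threshold."""
    return {app: n >= threshold for app, n in counts.items()}

for threshold in (1, 5, 10):
    labels = build_ground_truth(av_positive_counts, threshold)
    n_malware = sum(labels.values())
    print(f"threshold={threshold}: {n_malware} of {len(labels)} apps labelled malware")
```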

Peer Reviewed
Potential Component Leaks in Android Apps: An Investigation into a New Feature Set for Malware Detection
Li, Li; Allix, Kevin; Li, Daoyuan; et al.

in The 2015 IEEE International Conference on Software Quality, Reliability and Security (QRS 2015) (2015, August)

A Study of Potential Component Leaks in Android Apps
Li, Li; Allix, Kevin; Li, Daoyuan; et al.

Report (2015)

We discuss the capability of a new feature set for malware detection based on potential component leaks (PCLs). PCLs are defined as sensitive data flows that involve Android inter-component communications. We show that PCLs are common in Android apps and that malicious applications indeed manipulate significantly more PCLs than benign apps. Then, we evaluate a machine learning-based approach relying on PCLs. Experimental validation shows high performance, with 95% precision for identifying malware, demonstrating that PCLs can be used to discriminate malicious apps from benign apps. By further investigating the generalisation ability of this feature set, we highlight an issue often overlooked in the Android malware detection community: the qualitative aspects of training datasets have a strong impact on a malware detector's performance, and this impact cannot be overcome by simply increasing the quantity of training material.
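
A minimal sketch of the kind of pipeline described, assuming apps are encoded as binary vectors of PCL-derived features and fed to an off-the-shelf classifier. The data is randomly generated for illustration and does not reproduce the report's 95% precision figure.

```python
# Sketch: apps as binary vectors over PCL-derived features, a standard
# classifier, and precision as the evaluation metric. Data is randomly
# generated; it does not reproduce the report's results.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 50))   # 1000 apps, 50 hypothetical PCL features
y = rng.integers(0, 2, size=1000)         # 1 = malware, 0 = goodware

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(f"precision: {precision_score(y_test, clf.predict(X_test)):.2f}")
```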

Peer Reviewed
Are Your Training Datasets Yet Relevant? - An Investigation into the Importance of Timeline in Machine Learning-Based Malware Detection
Allix, Kevin; Bissyandé, Tegawendé François D'Assise; Klein, Jacques; et al.

in Engineering Secure Software and Systems - 7th International Symposium ESSoS 2015, Milan, Italy, March 4-6, 2015. Proceedings (2015)

In this paper, we consider the relevance of timeline in the construction of datasets, to highlight its impact on the performance of a machine learning-based malware detection scheme. Typically, we show that simply picking a random set of known malware to train a malware detector, as is done in many assessment scenarios from the literature, yields significantly biased results. In the process of assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could impact the performance of malware detection in the wild.
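
The contrast at the heart of the paper can be sketched as two ways of splitting the same dataset: a random (history-unaware) split versus a history-aware split that trains only on apps older than every test app. Timestamps below are synthetic.

```python
# Sketch contrasting the two dataset-construction strategies the paper
# discusses: a random split (history-unaware) versus a history-aware split
# where the detector is trained only on apps older than every test app.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_apps = 1000
timestamps = np.sort(rng.integers(2010, 2016, size=n_apps))   # synthetic discovery years

# History-unaware: train/test drawn at random, mixing past and future apps.
rand_train, rand_test = train_test_split(np.arange(n_apps), test_size=0.3, random_state=0)

# History-aware: train strictly on the past, test strictly on the future.
cutoff = 2014
hist_train = np.where(timestamps < cutoff)[0]
hist_test = np.where(timestamps >= cutoff)[0]

print(f"random split: {rand_train.size} train / {rand_test.size} test (timeline ignored)")
print(f"history-aware: {hist_train.size} train / {hist_test.size} test (cutoff {cutoff})")
```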

Peer Reviewed
Empirical assessment of machine learning-based malware detectors for Android: Measuring the Gap between In-the-Lab and In-the-Wild Validation Scenarios
Allix, Kevin; Bissyandé, Tegawendé François D'Assise; Jerome, Quentin; et al.

in Empirical Software Engineering (2014)

To address the issue of malware detection through large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. So far, several promising results were recorded in the literature, with many approaches being assessed in what we call in-the-lab validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in-the-lab validation scenarios provide reliable indications of the performance of malware detectors in real-world settings, a.k.a. in the wild. To this end, we have devised several machine-learning classifiers that rely on a set of features built from applications' CFGs. We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that, in the lab, our approach outperforms existing machine learning-based approaches. However, this high performance does not translate into high performance in the wild. The performance gap we observed, with F-measures dropping from over 0.9 in the lab to below 0.1 in the wild, raises one important question: how do state-of-the-art approaches perform in the wild?
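
Part of such a gap can be understood as a base-rate effect: with a fixed true-positive and false-positive rate, the F-measure collapses as malware becomes rare. The rates and prevalences in the sketch below are assumptions chosen for illustration, not figures taken from the paper.

```python
# Illustrative arithmetic behind the lab/wild gap: for a classifier with a
# fixed true-positive rate and false-positive rate, the F-measure drops as
# malware prevalence falls. Rates and prevalences are assumed for illustration.
def f_measure(tpr: float, fpr: float, prevalence: float) -> float:
    tp = tpr * prevalence
    fp = fpr * (1.0 - prevalence)
    fn = (1.0 - tpr) * prevalence
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

for prevalence in (0.5, 0.05, 0.005):   # balanced lab set vs wild-like rarity
    print(f"malware prevalence {prevalence:>5}: F-measure = "
          f"{f_measure(tpr=0.90, fpr=0.10, prevalence=prevalence):.2f}")
```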

Peer Reviewed
A Forensic Analysis of Android Malware -- How is Malware Written and How It Could Be Detected?
Allix, Kevin; Jerome, Quentin; Bissyandé, Tegawendé François D'Assise; et al.

in Proceedings of the 2014 IEEE 38th Annual Computer Software and Applications Conference (2014, July)

We consider in this paper the analysis of a large set of malware and benign applications from the Android ecosystem. Although a large body of research work has dealt with Android malware over the last years, none has addressed it from a forensic point of view. After collecting over 500,000 applications from user markets and research repositories, we perform an analysis that yields valuable insights into the writing process of Android malware. This study also explores some unexpected artifacts in the datasets, and the divergent capabilities of state-of-the-art antivirus products to recognise and define malware. We further highlight some major weaknesses and misunderstandings in the criminal community's use of Android security, and show some patterns in their operational flow. Finally, using insights from this analysis, we build a naive malware detection scheme that could complement existing antivirus software.

Peer Reviewed
Using opcode-sequences to detect malicious Android applications
Jerome, Quentin; Allix, Kevin; State, Radu; et al.

in IEEE International Conference on Communications, ICC 2014, Sydney, Australia, June 10-14, 2014 (2014, June)

Recently, the Android platform has seen its number of malicious applications increase sharply. Motivated by the easy application submission process and the number of alternative market places for distributing Android applications, rogue authors are constantly developing new malicious programs. While current antivirus software mainly relies on signature detection, the issue of alternative malware detection has to be addressed. In this paper, we present a feature-based detection mechanism relying on opcode sequences combined with machine-learning techniques. We assess our tool on both a reference dataset known as the Genome Project and a wider sample of 40,000 applications retrieved from the Google Play Store.
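
As a rough sketch of what opcode-sequence features look like, the snippet below slides a window over an opcode stream and counts each n-gram. The opcode fragment is a hypothetical Dalvik-like sequence; a real pipeline would first disassemble the APK, e.g. with a tool such as Androguard.

```python
# Sketch of opcode-sequence features: sliding a window of length n over an
# app's opcode stream and counting each n-gram. The sequence below is a
# hypothetical Dalvik-like fragment, not output from a real disassembler.
from collections import Counter

def opcode_ngrams(opcodes: list[str], n: int = 3) -> Counter:
    """Count every contiguous n-gram in the opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

sequence = ["const/4", "invoke-virtual", "move-result", "if-eqz",
            "const/4", "invoke-virtual", "move-result", "return"]
for ngram, count in opcode_ngrams(sequence).most_common(3):
    print(count, " -> ".join(ngram))
```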

Machine Learning-Based Malware Detection for Android Applications: History Matters!
Allix, Kevin; Bissyandé, Tegawendé François D'Assise; Klein, Jacques; et al.

Report (2014)

Machine learning-based malware detection is a promising scalable method for identifying suspicious applications. In particular, in today's mobile computing realm where thousands of applications are daily poured into markets, such a technique could be valuable to guarantee a strong filtering of malicious apps. The success of machine-learning approaches, however, is highly dependent on (1) the quality of the datasets that are used for training and (2) the appropriateness of the tested datasets with regard to the built classifiers. Unfortunately, there is scarce mention of these aspects in the evaluation of existing state-of-the-art approaches in the literature. In this paper, we consider the relevance of history in the construction of datasets, to highlight its impact on the performance of the malware detection scheme. Typically, we show that simply picking a random set of known malware to train a malware detector, as is done in most assessment scenarios from the literature, yields significantly biased results. In the process of assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could impact the performance of malware detection in the wild.

Peer Reviewed
Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross Validation" Scheme with Reality
Allix, Kevin; Bissyandé, Tegawendé François D'Assise; Jerome, Quentin; et al.

in Proceedings of the 4th ACM Conference on Data and Application Security and Privacy (2014, March)

To address the issue of malware detection, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. Several promising results were recorded in the literature, with many approaches being assessed with the common "10-fold cross validation" scheme. This paper revisits the purpose of malware detection to discuss the adequacy of the "10-fold" scheme for validating techniques that may not perform well in reality. To this end, we have devised several machine-learning classifiers that rely on a novel set of features built from applications' CFGs. We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that our approach outperforms existing machine learning-based approaches. However, this high performance on usual-size datasets does not translate into high performance in the wild.
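
For reference, this is what the questioned scheme looks like in practice: a minimal 10-fold cross-validation sketch over synthetic placeholder features, where each fold mixes apps from all time periods because the split ignores the timeline.

```python
# Sketch of the "10-fold cross validation" scheme the paper confronts with
# reality: the dataset is split into ten folds with no regard for time, so
# each test fold mixes past and future apps. Features/labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))     # placeholder app features (e.g. CFG-based)
y = rng.integers(0, 2, size=500)   # placeholder malware/goodware labels

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=10, scoring="f1")
print(f"10-fold F-measure: mean={scores.mean():.2f}, std={scores.std():.2f}")
```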

Improving Privacy on Android Smartphones Through In-Vivo Bytecode Instrumentation
Bartel, Alexandre; Klein, Jacques; Monperrus, Martin; et al.

Report (2012)

In this paper we claim that a widely applicable and efficient means to fight against malicious mobile Android applications is: 1) to perform runtime monitoring, 2) by instrumenting the application bytecode, and 3) in vivo, i.e. directly on the smartphone. We present a tool chain to do this and present experimental results showing that this tool chain can run on smartphones in a reasonable amount of time and with a realistic effort. Our findings also identify challenges to be addressed before running powerful runtime monitoring and instrumentations directly on smartphones. We implemented two use cases leveraging the tool chain: FineGPolicy, a fine-grained user-centric permission policy system, and AdRemover, an advertisement remover. Both prototypes improve the privacy of Android systems thanks to in-vivo bytecode instrumentation.
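
The tool chain itself rewrites Dalvik bytecode on the device, which cannot be shown directly here; the sketch below only mirrors the underlying idea in plain Python, interposing a monitor between the application and a hypothetical sensitive API so that calls can be logged or vetoed according to a user policy.

```python
# Language-agnostic illustration of the monitoring idea behind bytecode
# instrumentation: every call to a sensitive API is redirected through a
# monitor that logs it and enforces a user policy. The actual tool chain
# rewrites Dalvik bytecode on-device; this decorator only mirrors the
# concept. The policy and API below are hypothetical.
import functools

USER_POLICY = {"read_contacts": False, "get_location": True}   # hypothetical policy

def monitored(permission: str):
    """Interpose on a sensitive call, enforcing the user's policy."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print(f"[monitor] {func.__name__} requests '{permission}'")
            if not USER_POLICY.get(permission, False):
                raise PermissionError(f"'{permission}' denied by user policy")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@monitored("read_contacts")
def read_contacts():   # stands in for a sensitive platform API
    return ["alice", "bob"]

try:
    read_contacts()
except PermissionError as e:
    print(f"[monitor] blocked: {e}")
```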

In-Vivo Bytecode Instrumentation for Improving Privacy on Android Smartphones in Uncertain Environments
Bartel, Alexandre; Klein, Jacques; Monperrus, Martin; et al.

E-print/Working paper (2012)

In this paper we claim that an efficient and readily applicable means to improve the privacy of Android applications is: 1) to perform runtime monitoring by instrumenting the application bytecode and 2) in vivo, i.e. directly on the smartphone. We present a tool chain to do this and present experimental results showing that this tool chain can run on smartphones in a reasonable amount of time and with a realistic effort. Our findings also identify challenges to be addressed before running powerful runtime monitoring and instrumentations directly on smartphones. We implemented two use cases leveraging the tool chain: BetterPermissions, a fine-grained user-centric permission policy system, and AdRemover, an advertisement remover. Both prototypes improve the privacy of Android systems thanks to in-vivo bytecode instrumentation.
