[en] Research on Android malware detection based on Machine learning has been prolific in recent years. In this paper, we show, through a large-scale evaluation of four state-of-the-art approaches that their achieved performance fluctuates when applied to different datasets. Combining existing approaches appears as an appealing method to stabilise performance. We therefore proceed to empirically investigate the effect of such combinations on the overall detection performance. In our study, we evaluated 22 methods to combine feature sets or predictions from the state-of-the-art approaches. Our results showed that no method has significantly enhanced the detection performance reported by the state-of-the-art malware detectors. Nevertheless, the performance achieved is on par with the best individual classifiers for all settings. Overall, we conduct extensive experiments on the opportunity to combine state-of-the-art detectors. Our main conclusion is that combining state-of-theart malware detectors leads to a stabilisation of the detection performance, and a research agenda on how they should be combined effectively is required to boost malware detection. All artefacts of our large-scale study (i.e., the dataset of ∼0.5 million apks and all extracted features) are made available for replicability.
Disciplines :
Computer science
Author, co-author :
Daoudi, Nadia ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
FNR11693861 > Jacques Klein > CHARACTERIZE > Characterization Of Malicious Code In Mobile Apps: Towards Accurate And Explainable Malware Detection > 01/06/2018 > 31/12/2021 > 2017
Funders :
FNR - Fonds National de la Recherche University of Luxembourg - UL SPARTA Luxembourg Ministry of Foreign and European Affairs
Afonso VM, de Amorim MF, Grégio ARA, Junquera GB, de Geus PL (2015) Identifying android malware using dynamically obtained features. J Comput Virology Hacking Tech 11(1):9–17 DOI: 10.1007/s11416-014-0226-7
Alam MS, Vuong ST (2013) Random forest classification for detecting android malware. In: 2013 IEEE International conference on green computing and communications and IEEE internet of things and IEEE cyber, physical and social computing, pp 663–669. https://doi.org/10.1109/GreenCom-iThings-CPSCom.2013.122
Allix K, Bissyandé TF, Jérome Q, Klein J, State R, Le Traon Y (2016a) Empirical assessment of machine learning-based malware detectors for android. Empirical Softw Eng 21(1):183–211. 10.1007/s10664-014-9352-6 DOI: 10.1007/s10664-014-9352-6
Allix K, Bissyandé TF, Klein J, Le Traon Y (2016b) Androzoo: collecting millions of android apps for the research community. In: Proceedings of the 13th international conference on mining software repositories, ACM, New York, MSR ’16, pp 468–471. https://doi.org/10.1145/2901739.2903508
Allix K, Bissyandé TF, Klein J, LeTraon Y (2015) Are your training datasets yet relevant? In: Piessens F, Caballero J, Bielova N (eds) Engineering secure software and systems, springer international publishing, Cham, pp 51–67. https://doi.org/10.1007/978-3-319-15618-7_5
Appice A, Andresini G, Malerba D (2020) Clustering-aided multi-view classification: a case study on android malware detection. J Intell Inf Syst 55(1):1–26 DOI: 10.1007/s10844-020-00598-6
Arp D, Quiring E, Pendlebury F, Warnecke A, Pierazzi F, Wressnegger C, Cavallaro L, Rieck K (2020) Dos and don’ts of machine learning in computer security. arXiv: 201009470
Arp D, Spreitzenbarth M, Hübner M, Gascon H, Rieck K (2014) Drebin: efficient and explainable detection of android malware in your pocket. In: Proceedings of the ISOC network and distributed system security symposium (NDSS), San Diego, CA
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140 DOI: 10.1007/BF00058655
Breiman L (2001) Random forests. Mach Learn 45(1):5–32 DOI: 10.1023/A:1010933404324
Brown G (2010) Ensemble learning. Encyclopedia Mach Learn 312:15–19
Caruana R, Niculescu-Mizil A, Crew G, Ksikes A (2004) Ensemble selection from libraries of models. In: Proceedings of the twenty-first international conference on machine learning, association for computing machinery, New York, ICML ’04, p 18. https://doi.org/10.1145/1015330.1015432
Christianah A, Gyunka B, Oluwatobi A (2020) Optimizing android malware detection via ensemble learning. https://www.learntechlib.org/p/217826
DATA G (2020) G DATA mobile malware report. https://www.gdatasoftware.com/news/1970/01/-36401-g-data-mobile-malware-report-harmful-android-apps-every-eight-seconds. Accessed 10 June 2021
Daoudi N, Allix K, Bissyandé TF, Klein J (2021a) A deep dive inside drebin: an explorative analysis beyond android malware detection scores. ACM Trans Privacy Secur (TOPS) Appear
Daoudi N, Allix K, Bissyandé TF, Klein J (2021b) Lessons learnt on reproducibility in machine learning based android malware detection. Empirical Softw Eng 26(4):1–53. 10.1007/s10664-021-09955-7 DOI: 10.1007/s10664-021-09955-7
Daoudi N, Samhi J, Kabore AK, Allix K, Bissyandé TF, Klein J (2021c) Dexray: a simple, yet effective deep learning approach to android malware detection based on image representation of bytecode. In: Wang G, Ciptadi A, Ahmadzadeh A (eds) Deployable machine learning for security defense, springer international publishing, Cham, pp 81–106. https://doi.org/10.1007/978-3-030-87839-9_4
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dhalaria M, Gandotra E (2020) Android malware detection using chi-square feature selection and ensemble learning method. In: 2020 Sixth international conference on parallel, distributed and grid computing (PDGC), pp 36–41. 10.1109/PDGC50313.2020.9315818
Ding Y, Zhang X, Hu J, Xu W (2020) Android malware detection method based on bytecode image. J Ambient Intell Humanized Comput:1–10
Dong X, Yu Z, Cao W, Shi Y, Ma Q (2020) A survey on ensemble learning. Frontiers Comput Sci 14(2):241–258 DOI: 10.1007/s11704-019-8208-z
Fereidooni H, Conti M, Yao D, Sperduti A (2016) Anastasia: android malware detection using static analysis of applications. In: 2016 8th IFIP international conference on new technologies, mobility and security (NTMS), pp 1–5. https://doi.org/10.1109/NTMS.2016.7792435
Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139. 10.1006/jcss.1997.1504, https://www.sciencedirect.com/science/article/pii/S002200009791504X DOI: 10.1006/jcss.1997.1504
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Amer Stat Assoc 32(200):675–701. 10.1080/01621459.1937.10503522 DOI: 10.1080/01621459.1937.10503522
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals Stat:1189–1232
Garcia J, Hammad M, Malek S (2018) Lightweight, obfuscation-resilient detection and family identification of android malware. ACM Trans Softw Eng Methodol, vol 26(3). https://doi.org/10.1145/3162625
Huang TH, Kao H (2018) R2-d2: color-inspired convolutional neural network (cnn)-based android malware detections. In: 2018 IEEE international conference on big data (big data), pp 2633–2642. https://doi.org/10.1109/BigData.2018.8622324
Hurier M, Allix K, Bissyandé TF, Klein J, Le Traon Y (2016) On the lack of consensus in anti-virus decisions: metrics and insights on building ground truths of android malware. In: Proceedings of the 13th international conference on detection of intrusions and malware, and vulnerability assessment - vol 9721, Springer-Verlag, Berlin, Heidelberg, DIMVA 2016, pp 142–162. https://doi.org/10.1007/978-3-319-40667-1_8
Idrees F, Rajarajan M, Conti M, Chen TM, Rahulamathavan Y (2017) Pindroid: a novel android malware detection system using ensemble learning methods, vol 68, pp 36–46. https://doi.org/10.1016/j.cose.2017.03.011, https://www.sciencedirect.com/science/article/pii/S0167404817300640
Kaspersky (2021) Kaspersky security network. https://securelist.com/it-threat-evolution-q1-2021-mobile-statistics/102547/. Accessed 10 June 2021
Mariconti E, Onwuzurike L, Andriotis P, De Cristofaro E, Ross G, Stringhini G (2017) Mamadroid: detecting android malware by building markov chains of behavioral models. In: ISOC network and distributed systems security symposiym (NDSS), San Diego, CA
Miller B, Kantchelian A, Tschantz MC, Afroz S, Bachwani R, Faizullabhoy R, Huang L, Shankar V, Wu T, Yiu G, Joseph AD, Tygar JD (2016) Reviewer integration and performance measurement for malware detection. In: Caballero J, Zurutuza U, Rodríguez RJ (eds) Detection of intrusions and malware, and vulnerability assessment. Springer international publishing, Cham, pp 122–141
Milosevic N, Dehghantanha A, Choo KKR (2017) Machine learning aided android malware classification. Comput Electr Eng 61:266–274. 10.1016/j.compeleceng.2017.02.013, https://www.sciencedirect.com/science/article/pii/S0045790617303087 DOI: 10.1016/j.compeleceng.2017.02.013
Nemenyi PB (1963) Distribution-free multiple comparisons. Princeton University
Onwuzurike L, Mariconti E, Andriotis P, Cristofaro ED, Ross G, Stringhini G (2019) Mamadroid: detecting android malware by building markov chains of behavioral models (extended version). ACM Trans Priv Secur 22(2):14:1–14:34. 10.1145/3313391 DOI: 10.1145/3313391
Parab S, Bhalerao S (2010) Choosing statistical test. Int J Ayurveda Res 1(3):187 DOI: 10.4103/0974-7788.72494
Pendlebury F, Pierazzi F, Jordaney R, Kinder J, Cavallaro L (2019) TESSERACT: eliminating experimental bias in malware classification across space and time. In: 28th USENIX security symposium (USENIX security 19), USENIX association, Santa Clara, CA, pp 729–746. https://www.usenix.org/conference/usenixsecurity19/presentation/pendlebury
Perinetti G (2016) Statips part i: choosing statistical test when dealing with differences. South European J Orthodontics Dentofacial Res 3(1):3–4 DOI: 10.5937/sejodr3-1264
Rossow C, Dietrich CJ, Grier C, Kreibich C, Paxson V, Pohlmann N, Bos H, Steen VM (2012) Prudent practices for designing malware experiments: status quo and outlook. In: 2012 IEEE symposium on security and privacy, pp 65–79. https://doi.org/10.1109/SP.2012.14
Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdisciplinary Rev Data Mining Knowl Discover 8(4):e1249
Salem A, Banescu S, Pretschner A (2021) Maat: automatically analyzing virustotal for accurate labeling and effective malware detection. ACM Trans Priv Secur, vol 24(4). 10.1145/3465361
Sebastián M, Rivera R, Kotzias P, Caballero J (2016) Avclass: a tool for massive malware labeling. In: International symposium on research in attacks, intrusions, and defenses, Springer, pp 230-253
Sheldon MR, Fillyaw MJ, Thompson WD (1996) The use and interpretation of the friedman test in the analysis of ordinal-scale data in repeated measures designs. Physiother Res Int 1(4):221–228 DOI: 10.1002/pri.66
Sun T, Daoudi N, Allix K, Bissyandé TF (2021) Android malware detection: looking beyond dalvik bytecode. In: Proceedings of the 36th IEEE/ACM international conference on automated software engineering workshops, ASE ’21
Wang J, Jing Q, Gao J, Qiu X (2020) Sedroid: a robust android malware detector using selective ensemble learning. In: 2020 IEEE wireless communications and networking conference (WCNC), pp 1–5. https://doi.org/10.1109/WCNC45663.2020.9120537
Wang X, Zhang D, Su X, Li W (2017) Mlifdect: android malware detection based on parallel machine learning and information fusion. Secur Commun Netw, vol 2017
Wu Y, Li X, Zou D, Yang W, Zhang X, Jin H (2019) Malscan: fast market-wide mobile malware scanning by social-network centrality analysis. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE), pp 139–150
Wu D, Mao C, Wei T, Lee H, Wu K (2012) Droidmat: android malware detection through manifest and api calls tracing. In: 2012 Seventh asia joint conference on information security, pp 62–69. https://doi.org/10.1109/AsiaJCIS.2012.18
Xu J, Li Y, Deng RH (2021) Differential training: a generic framework to reduce label noises for android malware detection. In: Proceeding of network and distributed system security symposium (NDSS)
Yerima SY, Sezer S, Muttik I (2014) Android malware detection using parallel machine learning classifiers. In: 2014 Eighth international conference on next generation mobile apps, services and technologies, pp 37–42. https://doi.org/10.1109/NGMAST.2014.23
Yerima SY, Sezer S, Muttik I (2015) High accuracy android malware detection using ensemble learning. IET Inf Secur 9(6):313–320 DOI: 10.1049/iet-ifs.2014.0099
Zhang X, Jin Z (2016) A new semantics-based android malware detection. In: 2016 2nd IEEE international conference on computer and communications (ICCC), pp 1412–1416. https://doi.org/10.1109/CompComm.2016.7924936
Zhang W, Ren H, Jiang Q, Zhang K (2015) Exploring feature extraction and elm in malware detection for android devices. In: Hu X, Xia Y, Zhang Y, Zhao D (eds) Advances in neural networks – ISNN 2015, Springer international publishing, Cham, pp 489-498
Zhao Y, Li L, Wang H, Cai H, Bissyandé TF, Klein J, Grundy J (2021) On the impact of sample duplication in machine-learning-based android malware detection. ACM Trans Softw Eng Methodol, vol 30(3). 10.1145/3446905
Zhao C, Wang C, Zheng W (2019) Android malware detection based on sensitive permissions and apis. In: International conference on security and privacy in new computing environments, Springer, pp 105–113
Zhao C, Zheng W, Gong L, Zhang M, Wang C (2018) Quick and accurate android malware detection based on sensitive apis. In: 2018 IEEE international conference on smart internet of things (SmartIoT), pp 143–148. 10.1109/SmartIoT.2018.00034
Zhu H, Li Y, Li R, Li J, You Z, Song H (2020) Sedmdroid: an enhanced stacking ensemble of deep learning framework for android malware detection. IEEE Trans Netw Sci Eng:1–1. https://doi.org/10.1109/TNSE.2020.2996379