Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day, exhibiting new attack models. The growth of these threats has come hand in hand with the proliferation of collective repositories that share the latest specimens. Access to a large number of samples opens new research directions aimed at efficiently vetting apps. However, automatically inferring a reference ground truth from those repositories is not straightforward and can inadvertently lead to misconceptions. On the one hand, samples are often mislabeled because different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently misclassified due to conceptual errors made during the labeling process. In this paper, we analyze the associations between all labels given by different vendors, and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no a priori knowledge of malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state of the art.
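To make the idea of "associations between labels" concrete, here is a minimal sketch and not the EUPHONY implementation. The abstract does not specify the association measure or the grouping procedure, so the use of the Dice coefficient, the threshold value, the `group_labels` function, and the vendor and family names below are all assumptions chosen for illustration. Labels from different vendors are linked when they tend to be attached to the same samples, and linked labels are merged into one group without any prior list of family names.

```python
from collections import defaultdict
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice coefficient between two sets of sample identifiers."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def group_labels(reports: dict, threshold: float = 0.5) -> list:
    """Group (vendor, label) pairs that co-occur on the same samples.

    `reports` maps a sample id to {vendor: label}. The threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    # Collect the set of samples on which each (vendor, label) pair appears.
    samples_by_label = defaultdict(set)
    for sample, labels in reports.items():
        for vendor, label in labels.items():
            samples_by_label[(vendor, label)].add(sample)

    # Union-find structure used to merge strongly associated labels.
    parent = {key: key for key in samples_by_label}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples_by_label, 2):
        if dice(samples_by_label[a], samples_by_label[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for key in samples_by_label:
        groups[find(key)].add(key)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical vendor reports: two vendors naming the same families differently.
reports = {
    "s1": {"VendorA": "fakeinst", "VendorB": "fakeinstaller"},
    "s2": {"VendorA": "fakeinst", "VendorB": "fakeinstaller"},
    "s3": {"VendorA": "plankton", "VendorB": "plangton"},
    "s4": {"VendorA": "plankton", "VendorB": "fakeinstaller"},
}
families = group_labels(reports)
```

In this toy input, sample `s4` carries conflicting labels, yet the weak overlap it creates (a Dice coefficient of 0.4) stays below the assumed threshold, so the two families remain separate.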
Research center:
University of Luxembourg: Interdisciplinary Centre for Security, Reliability and Trust - SNT
Disciplines:
Computer science
Author, co-author:
HURIER, Médéric ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Suarez-Tangil, Guillermo; University College London - UCL
Dash, Santanu Kumar; University College London - UCL
LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)
Cavallaro, Lorenzo; Royal Holloway, University of London
External co-authors:
Yes
Document language:
English
Title:
Euphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware
Publication date:
21 May 2017
Event name:
The 14th International Conference on Mining Software Repositories
Event location:
Buenos Aires, Argentina
Event date:
from 20-05-2017 to 21-05-2017
Event scope:
International
Main work title:
MSR 2017
Peer reviewed :
Peer reviewed
Focus Area :
Security, Reliability and Trust
FnR project:
FNR5921289 - Static Analysis For Android Security: Building The Map Of Android Inter-application Communication, 2013 (01/05/2014-30/04/2017) - Jacques Klein