Researchers generally look for specific files within Android application packages (APKs) during their analysis, focusing on common files such as Dalvik bytecode or the Android manifest. However, Android apps are complex archives containing many types of files. Failing to account for all files during analysis can compromise end-user security, and despite the wealth of existing techniques to analyze Android apps, only a few studies explore the diversity of files within apps. To bridge this gap, we present the first large-scale empirical study that dissects the content of Android apps from Google Play. In our study, we explore the different file types and their usage trends. We extend our analysis to compressed files and the files they contain. Finally, we investigate to what extent developers use disguised files, i.e., files whose extension is conventionally associated with a file type different from their own (e.g., a Dalvik dex file with the extension “.png”), and examine whether they are a hint of maliciousness. Our results show that: ❶ Android apps comprise diverse file types, with over 15 000 distinct file extensions and more than 1000 unique file types found in our dataset of over 400 000 APKs; and ❷ we found many cases where developers exploit a mismatch between a file's type and its extension to load malicious code at runtime.
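To make the notion of a disguised file concrete, the following is a minimal sketch of how such files could be flagged, and not the tooling used in the study. It treats the APK as a ZIP archive, reads the first bytes of each entry, and compares the detected type against the entry's extension. The signature table, the extension mapping, and the names `detect_type` and `find_disguised_files` are illustrative assumptions; a real analysis would need a far larger type database.

```python
import io
import os
import zipfile

# Illustrative magic-number table (an assumption, not the study's list).
MAGIC_SIGNATURES = {
    "dex": b"dex\n",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "elf": b"\x7fELF",
}

# Extensions conventionally expected for each detected type (also illustrative).
EXPECTED_EXTENSIONS = {
    "dex": {".dex"},
    "png": {".png"},
    "zip": {".zip", ".apk", ".jar"},
    "elf": {".so"},
}


def detect_type(header):
    """Return the name of the first signature matching the header, or None."""
    for name, magic in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def find_disguised_files(apk):
    """Return (entry name, detected type) pairs whose extension does not
    match the type suggested by the entry's leading bytes.

    `apk` may be a path or a binary file-like object.
    """
    disguised = []
    with zipfile.ZipFile(apk) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as entry:
                header = entry.read(8)
            detected = detect_type(header)
            if detected is None:
                continue  # unknown type: this sketch makes no claim about it
            extension = os.path.splitext(info.filename)[1].lower()
            if extension not in EXPECTED_EXTENSIONS[detected]:
                disguised.append((info.filename, detected))
    return disguised


if __name__ == "__main__":
    # Build a tiny in-memory archive standing in for an APK.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classes.dex", b"dex\n035\x00" + b"\x00" * 8)
        archive.writestr("assets/logo.png", b"dex\n035\x00" + b"\x00" * 8)
        archive.writestr("res/icon.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
    buffer.seek(0)
    print(find_disguised_files(buffer))
```

In this toy archive only `assets/logo.png` is reported, since its content starts with the Dalvik signature while its extension suggests an image. A mismatch of this kind is only a signal worth inspecting, not proof of malicious intent.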
Disciplines:
Computer science
Author, co-author:
RUIZ JIMÉNEZ, Pedro Jesús ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
SAMHI, Jordan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX