Revisiting Android App Categorization

Numerous tools rely on automatic categorization of Android apps as part of their methodology. However, incorrect categorization can lead to inaccurate outcomes, such as a malware detector wrongly flagging a benign app as malicious. One example is the SlideIT Free Keyboard app, which has over 500,000 downloads on Google Play. Although it is a "Keyboard" app, it is often wrongly grouped with "Language" apps because its description focuses heavily on language support. This leads to incorrect analysis outcomes, including flagging this benign app as potential malware. Hence, app categorization needs to improve for the benefit of all the tools that rely on it. In this paper, we present a comprehensive evaluation of existing Android app categorization approaches using our new ground-truth dataset. Our evaluation shows that approaches using app descriptions clearly outperform those relying solely on data extracted from the APK file, while still leaving room for improvement. We therefore propose two new approaches that outperform existing methods in both the description-based and the APK-based categories. Finally, by employing our description-based approach, we demonstrate that adopting a higher-performing categorization method can significantly improve the performance of tools that rely on app categorization. This highlights the importance of developing advanced and efficient app categorization methods for software engineering tasks.

Disciplines :

Computer science

Author, co-author :

ALECCI, Marco ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

SAMHI, Jordan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > TruX > Team Jacques KLEIN ; CISPA Helmholtz Center for Information Security

BISSYANDE, Tegawendé François d'Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

External co-authors :

yes

Language :

English

Title :

Revisiting Android App Categorization

Publication date :

April 2024

Event name :

International Conference on Software Engineering 2024

Event organizer :

ACM/IEEE

Event place :

Lisbon, Portugal

Event date :

14-20 April 2024

Audience :

International

Main work title :

ICSE '24: Proceedings of the 46th International Conference on Software Engineering

Publisher :

IEEE Press

Peer reviewed :

Peer reviewed

Commentary :

Accepted at ICSE 2024

Available on ORBilu :

since 25 November 2023

