Android Malware Detection Using BERT

[en] In this paper, we present two empirical studies to (1) detect Android malware and (2) classify Android malware into families. We first (1) reproduce the results of MalBERT using BERT models trained on Android application manifests obtained from 265k applications (vs. 22k for MalBERT) from the AndroZoo dataset in order to detect malware. The results of the MalBERT paper are excellent and hard to believe, as a manifest only roughly represents an application; we therefore try to answer the following questions in this paper. Are the experiments from MalBERT reproducible? How important are permissions for malware detection? Is it possible to keep or improve the results by reducing the size of the manifests? We then (2) investigate whether BERT can be used to classify Android malware into families. The results show that BERT can successfully differentiate malware from goodware with 97% accuracy. Furthermore, BERT can classify malware into families with 93% accuracy. We also demonstrate that Android permissions are not what allows BERT to classify successfully, and that BERT does not actually need them.
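
The abstract does not say how permissions were removed from the manifests. As a minimal sketch of what such an ablation step could look like, the Python snippet below strips permission declarations from a manifest before it would be passed to a classifier. The function name `strip_permissions`, the set `PERMISSION_TAGS`, and the sample manifest are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
# Keep the conventional "android:" prefix when serializing.
ET.register_namespace("android", ANDROID_NS)

# Assumption: these tags are treated as "permission" content for the ablation.
PERMISSION_TAGS = {"uses-permission", "uses-permission-sdk-23", "permission"}


def strip_permissions(manifest_xml: str) -> str:
    """Return the manifest text with permission declarations removed."""
    root = ET.fromstring(manifest_xml)
    # Materialize the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in PERMISSION_TAGS:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample manifest, for illustration only.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <uses-permission android:name="android.permission.SEND_SMS" />
  <uses-permission android:name="android.permission.INTERNET" />
  <application android:label="Example">
    <activity android:name=".MainActivity" />
  </application>
</manifest>"""

stripped = strip_permissions(SAMPLE)
print(stripped)
```

Comparing a model trained on the original manifests with one trained on the stripped text is one way to probe how much the permissions contribute; the study's actual preprocessing may differ.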

Research center :

Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Security Design and Validation Research Group (SerVal)

Disciplines :

Computer science

Author, co-author :

SOUANI, Badr ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

KHANFIR, Ahmed ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

BARTEL, Alexandre ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

ALLIX, Kevin ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

LE TRAON, Yves ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

External co-authors :

yes

Language :

English

Title :

Android Malware Detection Using BERT

Publication date :

24 September 2022

Event name :

ACNS 2022: Applied Cryptography and Network Security Workshops

Event organizer :

ACNS

Event place :

Rome, Italy

Event date :

June 20–23, 2022

By request :

Yes

Audience :

International

Main work title :

Applied Cryptography and Network Security Workshops

Author, co-author :

Jianying, Zhou

Publisher :

Springer, Berlin, Germany

ISBN/EAN :

978-3-031-16815-4

Collection name :

LNCS 13285

Pages :

575–591

Peer reviewed :

Peer reviewed

Focus Area :

Security, Reliability and Trust

Additional URL :

https://link.springer.com/chapter/10.1007/978-3-031-16815-4_31

Name of the research project :

Android malware detection using BERT

Funders :

University of Luxembourg - UL

Available on ORBilu :

since 04 November 2022


Bibliography

Wikipedia: The free encyclopedia. https://www.wikipedia.org/
Google play store. https://play.google.com/
Tensorflow. https://www.tensorflow.org/
Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., Awajan, A.: Intelligent mobile malware detection using permission requests and API calls. Future Gener. Comput. Syst. 107, 509–521 (2020). https://doi.org/10.1016/j.future.2020.02.002, https://www.sciencedirect.com/science/article/pii/S0167739X19321223
Allix, K., Bissyandé, T.F., Klein, J., Le Traon, Y.: AndroZoo: collecting millions of android apps for the research community. In: 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 468–471. IEEE (2016)
Alsoghyer, S., Almomani, I.: On the effectiveness of application permissions for android ransomware detection. In: 2020 6th Conference on Data Science and Machine Learning Applications (CDMA), pp. 94–99 (2020). https://doi.org/10.1109/CDMA47397.2020.00022
Alzaylaee, M.K., Yerima, S.Y., Sezer, S.: DL-droid: deep learning based android malware detection using real devices. Comput. Secur. 89, 101663 (2020). https://doi.org/10.1016/j.cose.2019.101663, https://www.sciencedirect.com/science/article/pii/S0167404819300161
Arora, A., Peddoju, S.K., Conti, M.: PermPair: android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 15, 1968–1982 (2020). https://doi.org/10.1109/TIFS.2019.2950134
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding, pp. 4171–4186 (2019). https://doi.org/10.18653/v1/N19-1423, https://aclanthology.org/N19-1423
Feng, Z., et al.: CodeBERT: a pre-trained model for programming and natural languages. arXiv e-prints arXiv:2002.08155 (2020)
He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: decoding-enhanced BERT with Disentangled Attention. arXiv e-prints arXiv:2006.03654 (2020)
Jeffrey, M., Nathan, H., William, G., Ryan, B.: Machine learning-based android malware detection using manifest permissions (2021). https://doi.org/10.24251/HICSS.2021.839
Jimenez, M., Rwemalika, R., Papadakis, M., Sarro, F., Le Traon, Y., Harman, M.: The importance of accounting for real-world labelling when predicting software vulnerabilities. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 695–705 (2019)
Jin, D., Jin, Z., Zhou, J.T., Szolovits, P.: Is BERT really robust? Natural language attack on text classification and entailment. arXiv preprint arXiv:1907.11932 (2019)
Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for android malware detection using deep learning. Digit. Invest. 24, S48–S59 (2018). https://doi.org/10.1016/j.diin.2018.01.007, https://www.sciencedirect.com/science/article/pii/S1742287618300392
Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2019). https://doi.org/10.1109/TIFS.2018.2866319
Lee, Y., Saxe, J., Harang, R.: CATBERT: context-aware tiny BERT for detecting social engineering emails. arXiv e-prints arXiv:2010.03484 (2020)
Li, J., Sun, L., Yan, Q., Li, Z., Srisa-an, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Industr. Inf. 14(7), 3216–3225 (2018). https://doi.org/10.1109/TII.2017.2789219
Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., Liu, H.: A review of android malware detection approaches based on machine learning. IEEE Access 8, 124579–124607 (2020). https://doi.org/10.1109/ACCESS.2020.3006143
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv e-prints arXiv:1907.11692 (2019)
Oak, R., Du, M., Yan, D., Takawale, H., Amit, I.: Malware detection on highly imbalanced data through sequence modeling. In: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, AISec 2019, pp. 37–48. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338501.3357374
Peiravian, N., Zhu, X.: Machine learning for android malware detection using permission and API calls, pp. 300–305 (2013). https://doi.org/10.1109/ICTAI.2013.53
Rahali, A., Akhloufi, M.A.: MalBERT: using transformers for cybersecurity and malicious software detection (2021)
Sebastián, M., Rivera, R., Kotzias, P., Caballero, J.: AVclass: a tool for massive malware labeling. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 230–253. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_11
Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification? arXiv e-prints arXiv:1905.05583 (2019)
Sun, T., Daoudi, N., Allix, K., Bissyandé, T.F.: Android malware detection: looking beyond Dalvik bytecode. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), pp. 34–39. IEEE (2021)
Vaswani, A., et al.: Attention is all you need. 30 (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wu, D.J., Mao, C.H., Wei, T.E., Lee, H.M., Wu, K.P.: DroidMat: android malware detection through manifest and API calls tracing, pp. 62–69 (2012). https://doi.org/10.1109/AsiaJCIS.2012.18
Xu, H., Liu, B., Shu, L., Yu, P.S.: BERT post-training for review reading comprehension and aspect-based sentiment analysis. arXiv e-prints arXiv:1904.02232 (2019)
Xu, Z., Fang, X., Yang, G.: Malbert: a novel pre-training method for malware detection. Comput. Secur. 111, 102458 (2021). https://doi.org/10.1016/j.cose.2021.102458, https://www.sciencedirect.com/science/article/pii/S0167404821002820
Yang, W., et al.: End-to-end open-domain question answering with BERTserini. arXiv e-prints arXiv:1902.01718 (2019)
Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. SIGCOMM Comput. Commun. Rev. 44(4), 371–372 (2014). https://doi.org/10.1145/2740070.2631434
Zhu, J., et al.: Incorporating BERT into neural machine translation. arXiv e-prints arXiv:2002.06823 (2020)