Abstract :
[en] In this paper, we present two empirical studies that (1) detect
Android malware and (2) classify Android malware into families. We
first (1) reproduce the results of MalBERT by training BERT models
on Android application manifests obtained from 265k applications
(vs. 22k for MalBERT) from the AndroZoo dataset in order to detect
malware. The results reported in the MalBERT paper are excellent and
hard to believe, since a manifest is only a rough representation of an
application. We therefore try to answer the following questions in this
paper. Are the experiments
from MalBERT reproducible? How important are permissions for malware
detection? Is it possible to maintain or improve the results while reducing
the size of the manifests? We then (2) investigate whether BERT can be used to
classify Android malware into families. The results show that BERT can
successfully differentiate malware from goodware with 97% accuracy.
Furthermore, BERT can classify malware into families with 93% accuracy.
We also demonstrate that Android permissions are not what allows BERT
to classify successfully, and that BERT does not actually need them.