[en] We discuss the capability of a new feature set for malware detection based on potential component leaks (PCLs). PCLs are defined as sensitive data-flows that involve Android inter-component communications. We show that PCLs are common in Android apps and that malicious applications indeed manipulate significantly more PCLs than benign apps. Then, we evaluate a machine learning-based approach relying on PCLs. Experimental validation show high performance with 95% precision for identifying malware, demonstrating that PCLs can be used for discriminating malicious apps from benign apps.
By further investigating the generalization ability of this feature set, we highlight an issue often overlooked in the Android malware detection community: Qualitative aspects of training datasets have a strong impact on a malware detector’s performance. Furthermore, this impact cannot be overcome by simply increasing the Quantity of training material.