Abstract :
Android malware detection and family classification have been extensively studied, yet localizing the exact malicious payloads within a detected sample remains a challenging and labor-intensive task. We propose RAML, a novel Retrieval-Augmented Malicious payload Localization pipeline inspired by retrieval-augmented generation (RAG), which leverages large language models (LLMs) to bridge high-level behavior descriptions and low-level Smali code. RAML generates class-level descriptions from Smali code, embeds them into a vector database, and performs semantic retrieval via similarity search. The matched candidate classes are re-ranked with LLM assistance, and a subsequent method-level LLM analysis pinpoints the malicious methods and explains the role each one plays. Preliminary results show that RAML effectively localizes the malicious payloads corresponding to given behavioral descriptions, narrows the analysis scope, and reduces manual effort, offering a promising direction for automated malware forensics.
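The retrieval stage described above (embed class-level descriptions, then rank them by similarity to a behavior description) can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector and cosine similarity in place of whatever embedding model and vector database RAML actually uses; `embed`, `cosine`, `VectorIndex`, and the example class names and descriptions are all hypothetical. The LLM re-ranking and method-level analysis steps are not shown.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """Minimal in-memory index of class-level descriptions (hypothetical)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Counter]] = []

    def add(self, class_name: str, description: str) -> None:
        # Store the class name, its generated description, and the description's vector.
        self.entries.append((class_name, description, embed(description)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Score every stored class against the query; highest similarity first,
        # ties broken by class name so the order is deterministic.
        q = embed(query)
        scored = [(cosine(q, vec), name) for name, _, vec in self.entries]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:k]


if __name__ == "__main__":
    index = VectorIndex()
    # Hypothetical Smali class descriptors with LLM-style descriptions.
    index.add("Lcom/example/SmsSender;", "Sends SMS messages to premium numbers in the background")
    index.add("Lcom/example/MainActivity;", "Displays the main user interface and settings menu")
    index.add("Lcom/example/Uploader;", "Uploads contacts and device identifiers to a remote server")
    for score, name in index.search("sends premium SMS without user consent", k=2):
        print(f"{score:.3f} {name}")
```

The top-k candidates returned here would correspond to the classes that RAML then passes to LLM-assisted re-ranking and method-level inspection; a real deployment would replace the toy vectors with dense embeddings and an approximate nearest-neighbor store.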
FnR Project :
FNR16344458 - REPROCESS - Pre And Post Processing For Comprehensive And Practical Android App Static Analysis, 2021 (01/07/2022-30/06/2025) - Jacques Klein
FNR18154263 - UNLOCK - Breaking The Barriers Of Android Dynamic Analysis With Static Analysis, 2023 (01/01/2024-31/12/2026) - Jacques Klein