RAML: Toward Retrieval-Augmented Localization of Malicious Payloads in Android Apps

SUN, Tiezhu; ALECCI, Marco; SONG, Yewei; TANG, Xunzhu; KIM, Kisub; SAMHI, Jordan; BISSYANDE, Tegawendé François d Assise; KLEIN, Jacques

doi:10.1109/ASE63991.2025.00351


Paper published in a book (Scientific congresses, symposiums and conference proceedings)


2025 • In The 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025

Peer reviewed

Permalink
https://hdl.handle.net/10993/65747

DOI
10.1109/ASE63991.2025.00351


Files

Full Text

_ASE_2025__RAML__Toward_Retrieval_Augmented_Localization_of_Malicious_Payloads_in_Android_Apps.pdf

Author postprint (1.15 MB)




Details

Keywords :

Android Malware Analysis; Malicious Payload Localization; Retrieval-Augmented Generation

Abstract :

[en] Android malware detection and family classification have been extensively studied, yet localizing the exact malicious payloads within a detected sample remains a challenging and labor-intensive task. We propose RAML, a novel Retrieval-Augmented Malicious payload Localization pipeline inspired by retrieval-augmented generation (RAG), which leverages large language models (LLMs) to bridge high-level behavior descriptions and low-level Smali code. RAML generates class-level descriptions from Smali code, embeds them into a vector database, and performs semantic retrieval via similarity search. Matched candidates are re-ranked with LLM assistance, followed by method-level LLM analysis to precisely identify malicious methods and provide insightful role explanations. Preliminary results show that RAML effectively localizes corresponding malicious payloads based on behavioral descriptions, narrows the analysis scope, and reduces manual effort—offering a promising direction for automated malware forensics.
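The retrieval step described in the abstract (class-level descriptions embedded and then matched against a behavior description by similarity search) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class names and descriptions are hypothetical, the bag-of-words `embed` function stands in for a learned embedding model, and an in-memory dictionary stands in for the vector database. LLM-based description generation, re-ranking, and method-level analysis are not covered.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical class-level descriptions; in the pipeline described above these
# would be generated from Smali code by an LLM.
CLASS_DESCRIPTIONS = {
    "Lcom/example/ui/MainActivity;": "Displays the main screen and handles button clicks",
    "Lcom/example/net/Uploader;": "Reads SMS messages and contacts and sends them to a remote server",
    "Lcom/example/util/Logger;": "Writes debug log lines to a local file",
}


def retrieve(query: str, descriptions: dict, top_k: int = 2) -> list:
    """Return the top_k (class name, similarity) pairs for a behavior description."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in descriptions.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


candidates = retrieve(
    "steals SMS messages and sends them to a remote server",
    CLASS_DESCRIPTIONS,
)
for name, score in candidates:
    print(f"{name}  similarity={score:.3f}")
```

In this toy setting the hypothetical `Uploader` class ranks first because its description shares the most terms with the query. A real system would likely rely on semantic embeddings rather than word overlap, so that paraphrased behavior descriptions still match.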

Disciplines :

Computer science

Author, co-author :

SUN, Tiezhu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

ALECCI, Marco ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

SONG, Yewei ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

TANG, Xunzhu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

KIM, Kisub ; DGIST

SAMHI, Jordan ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

BISSYANDE, Tegawendé François d Assise ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

KLEIN, Jacques ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX

External co-authors :

yes

Language :

English

Title :

RAML: Toward Retrieval-Augmented Localization of Malicious Payloads in Android Apps

Publication date :

16 November 2025

Event name :

The 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025

Event date :

16 - 20 November 2025

Audience :

International

Main work title :

The 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025

Publisher :

IEEE/ACM

Peer reviewed :

Peer reviewed

FnR Project :

FNR16344458 - REPROCESS - Pre And Post Processing For Comprehensive And Practical Android App Static Analysis, 2021 (01/07/2022-30/06/2025) - Jacques Klein
FNR18154263 - UNLOCK - Breaking The Barriers Of Android Dynamic Analysis With Static Analysis, 2023 (01/01/2024-31/12/2026) - Jacques Klein

Available on ORBilu :

since 09 September 2025

Statistics

Number of views

274 (10 by Unilu)

Number of downloads

294 (20 by Unilu)

