WaterLLMarks: In-data User Tracing for Distributed LLMaaS Environments

Due to their size and cost of operation, Large Language Models (LLMs) are often deployed in the cloud on Large Language Model as a Service (LLMaaS) platforms. Following the trends in cloud-based services and multi-tenancy, these platforms are becoming increasingly distributed. Monitoring such environments is challenging, especially without access to the underlying infrastructure. Moreover, applications may rely on multiple services from different providers, further complicating end-to-end observability. In this paper, we propose a novel type of telemetry, termed in-data telemetry, for settings where services are outsourced across multiple LLMaaS providers. We illustrate its application by tracing user interactions with LLMs through user-specific watermarks embedded in the text. These watermarks propagate transparently through multiple services with minimal impact on the semantics of the generated content. We evaluate our approach in a Retrieval-Augmented Generation (RAG) scenario and showcase its potential in a demonstrator.
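The abstract does not specify how the user-specific watermark is encoded. As a minimal sketch, assuming one well-known text-watermarking technique (Unicode homoglyph substitution, which leaves the rendered text and its meaning unchanged), the Python below derives a bit string per user with a keyed hash, swaps selected Latin letters for look-alike Cyrillic code points wherever a bit is 1, and later attributes a text fragment to the candidate user whose bit string matches best. The homoglyph table, the 32-bit signature length, HMAC-SHA256 as the keyed function, the key, and all function names (`user_signature`, `embed`, `normalize`, `extract_bits`, `match_score`, `trace`) are illustrative assumptions, not the authors' implementation.

```python
import hashlib
import hmac

# Hypothetical homoglyph table: Latin letters and visually identical
# Cyrillic code points. A real deployment would use a curated, larger table.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "x": "\u0445",  # CYRILLIC SMALL LETTER HA
}
REVERSE = {glyph: latin for latin, glyph in HOMOGLYPHS.items()}
SIG_BITS = 32  # assumed length of the per-user signature


def user_signature(user_id: str, key: bytes) -> list[int]:
    """Derive a per-user bit string from a keyed hash (HMAC-SHA256 here)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).digest()
    bits = [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]
    return bits[:SIG_BITS]


def embed(text: str, user_id: str, key: bytes) -> str:
    """Replace the k-th eligible letter by its homoglyph when signature bit k is 1."""
    sig = user_signature(user_id, key)
    out, k = [], 0
    for ch in text:
        if ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch] if sig[k % SIG_BITS] else ch)
            k += 1
        else:
            out.append(ch)
    return "".join(out)


def normalize(text: str) -> str:
    """Map homoglyphs back to Latin letters, recovering the unmarked text."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def extract_bits(text: str) -> list[int]:
    """One bit per eligible position: 1 for a homoglyph, 0 for a plain letter."""
    return [1 if ch in REVERSE else 0 for ch in text if ch in HOMOGLYPHS or ch in REVERSE]


def match_score(text: str, user_id: str, key: bytes) -> float:
    """Best fraction of agreeing bits over every cyclic offset of the signature,
    so a fragment cut from the middle of a marked text can still be matched."""
    bits = extract_bits(text)
    if not bits:
        return 0.0
    sig = user_signature(user_id, key)
    best = max(
        sum(b == sig[(i + off) % SIG_BITS] for i, b in enumerate(bits))
        for off in range(SIG_BITS)
    )
    return best / len(bits)


def trace(text: str, candidates: list[str], key: bytes) -> str:
    """Attribute a text to the candidate user whose signature matches best."""
    return max(candidates, key=lambda user: match_score(text, user, key))


if __name__ == "__main__":
    KEY = b"provider-secret"  # hypothetical key held by the tracing operator
    answer = ("The proposed approach embeds a per-user trace into each response, "
              "so that operators can correlate requests across cooperating services.")
    marked = embed(answer, "alice", KEY)
    print(normalize(marked) == answer)  # prints True: visible content is unchanged
    print(trace(marked[40:], ["alice", "bob", "carol"], KEY))  # fragment attribution
```

Scanning all cyclic offsets lets a fragment cut out of a longer watermarked passage (for example, a retrieved RAG chunk quoted in a later answer) still be attributed. A model that paraphrases rather than copies would discard the homoglyphs, so a scheme like this one only traces text that is passed through verbatim.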

Disciplines: Computer science

Authors:
LAVAUR, Léo; University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust (SNT), SEDAN
FRANÇOIS, Jérôme; University of Luxembourg, Interdisciplinary Centre for Security, Reliability and Trust (SNT), SEDAN

Language: English

Publication date: 2025

Event name: 45th IEEE International Conference on Distributed Computing Systems (ICDCS)

Event place: Glasgow, United Kingdom

Event date: Jul. 2025

Audience: International

Main work title: Proceedings of the 45th IEEE International Conference on Distributed Computing Systems Workshops (ICDCSW)

Publisher: IEEE

Peer reviewed: Yes

Available on ORBilu: since 29 September 2025
