Abstract:
[en] Due to their size and cost of operation, Large Language Models (LLMs) are often deployed in the cloud on Large Language Model as a Service (LLMaaS) platforms. Following the trends in cloud-based services and multi-tenancy, these platforms are becoming increasingly distributed. Monitoring such environments is challenging, especially without access to the underlying infrastructure. Moreover, applications may rely on multiple services from different providers, further complicating end-to-end observability. In this paper, we propose a novel type of telemetry, termed in-data telemetry, for settings where services are outsourced across multiple LLMaaS providers. We illustrate its application in tracing user interactions with LLMs by embedding user-specific watermarks in the text. These watermarks propagate transparently through multiple services with minimal impact on the semantics of the generated content. We evaluate our approach in a Retrieval-Augmented Generation (RAG) scenario and showcase its potential in a demonstrator.