Working paper (E-prints, Working papers and Research blog)
DCRuntime: Toward Efficiently Sharing CPU-GPU Architectures
MARCU, Ovidiu-Cristian; DANOY, Grégoire; BOUVRY, Pascal
2025
 

Files


Full Text
DCRuntime_orbilu.pdf
Author preprint (1.29 MB) Creative Commons License - Attribution, Non-Commercial, No Derivative
Details



Abstract :
Orchestrating distributed data movement and computation for large-scale, data-intensive applications (Big Data, ML/AI) on modern heterogeneous architectures (CPU, GPU) presents significant challenges. Current systems often rely on passive, application-driven (pull-based) data access, leading to inefficient resource utilization, particularly high GPU idle times (up to 70%), complex manual memory management, and limited opportunities for global optimization and fault tolerance. This paper introduces DCRuntime, a vision for a unified, runtime-orchestrated system designed to actively manage both data and compute streaming. DCRuntime employs a proactive, push-based "compute-follows-data" execution model. It exposes two core abstractions: 1) distributed data streams, representing potentially unbounded sequences of (im)mutable buffers spanning cluster resources, and 2) global compute streams, enabling asynchronous task execution across multiple devices, tightly coupled with data availability. By delegating data sourcing, sinking, shuffling, and compute scheduling to the runtime, DCRuntime gains a global perspective to optimize data placement, minimize I/O stalls, mitigate interference and power jitter, and enable faster, application-aware fault recovery. This approach aims to abstract away low-level complexities, significantly improve resource utilization (especially for GPUs), enhance scalability, and provide a resilient foundation for demanding workloads on shared, heterogeneous high-performance computing infrastructure.
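The push-based "compute-follows-data" idea in the abstract can be illustrated with a toy, single-process sketch. This is not DCRuntime's API: the names `Buffer`, `DataStream`, `ComputeStream`, and `run_demo` are hypothetical, devices are plain string labels, and execution is synchronous. The sketch only shows the control flow in which a data stream pushes buffers to a compute stream, and each task is queued on the device that already holds its buffer.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Buffer:
    """An immutable chunk of data tagged with the device that holds it."""
    device: str
    payload: Tuple[int, ...]


@dataclass
class DataStream:
    """Hypothetical stand-in for a data stream: it pushes each new buffer
    to its subscribers instead of waiting for them to pull it."""
    subscribers: List[Callable[[Buffer], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Buffer], None]) -> None:
        self.subscribers.append(callback)

    def push(self, buf: Buffer) -> None:
        for callback in self.subscribers:
            callback(buf)


@dataclass
class ComputeStream:
    """Hypothetical stand-in for a compute stream: work is queued per
    device, on the device where the buffer already resides."""
    task: Callable[[Tuple[int, ...]], int]
    queues: Dict[str, Deque[Buffer]] = field(default_factory=dict)

    def on_data(self, buf: Buffer) -> None:
        # Compute follows data: schedule on the buffer's own device.
        self.queues.setdefault(buf.device, deque()).append(buf)

    def drain(self) -> List[Tuple[str, int]]:
        # Run every queued task, device by device, in arrival order.
        results = []
        for device, queue in self.queues.items():
            while queue:
                results.append((device, self.task(queue.popleft().payload)))
        return results


def run_demo() -> List[Tuple[str, int]]:
    stream = DataStream()
    compute = ComputeStream(task=sum)
    stream.subscribe(compute.on_data)
    stream.push(Buffer("gpu0", (1, 2, 3)))
    stream.push(Buffer("gpu1", (4, 5)))
    stream.push(Buffer("gpu0", (6,)))
    return compute.drain()
```

In a pull-based design the task would request each buffer itself and stall while it is fetched. Here the producer side drives scheduling, which is the property the abstract attributes to runtime-orchestrated streaming.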
Disciplines :
Computer science
Author, co-author :
MARCU, Ovidiu-Cristian; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG
DANOY, Grégoire; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
BOUVRY, Pascal; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Language :
English
Title :
DCRuntime: Toward Efficiently Sharing CPU-GPU Architectures
Publication date :
03 April 2025
Available on ORBilu :
since 03 April 2025

