Article (Scientific journals)
Round-Based Mechanism and Job Packing with Model-Similarity-Based Policy for Scheduling DL Training in GPU Cluster
THANAPOL, Panissara; LAVANGNANANDA, Kittichai; LEPREVOST, Franck et al.
2024, in Applied Sciences, 14 (6), p. 2349
Peer reviewed
 

Details



Keywords :
deep learning; deep learning training; distributed training; GPU cluster; job packing; round-based mechanism; similarity analysis; Materials Science (all); Instrumentation; Engineering (all); Process Chemistry and Technology; Computer Science Applications; Fluid Flow and Transfer Processes
Abstract :
Graphics Processing Units (GPUs) are employed for their parallel processing capabilities, which are essential to train deep learning (DL) models with large datasets within a reasonable time. However, the diverse GPU architectures exhibit variability in training performance depending on DL models. Furthermore, factors such as the number of GPUs for distributed training and batch size significantly impact training efficiency. Addressing the variability in training performance and accounting for these influential factors are critical for optimising resource usage. This paper presents a scheduling policy for DL training tasks in a heterogeneous GPU cluster. It builds upon a model-similarity-based scheduling policy by implementing a round-based mechanism and job packing. The round-based mechanism allows the scheduler to adjust its scheduling decisions periodically, whereas job packing optimises GPU utilisation by fitting additional jobs into a GPU that trains a small model. Results show that implementing a round-based mechanism reduces the makespan by approximately 29%, compared to the scenario without it. Additionally, integrating job packing further decreases the makespan by 5%.
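Illustration (not taken from the paper): the two ideas named in the abstract, re-deciding placements every round and packing extra jobs onto a GPU that hosts a small model, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Job fields, the unit of work (one round), the memory fractions, the first-fit-decreasing placement rule, and the simplification that packed jobs do not slow each other down. The model-similarity component of the policy is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    work: int    # hypothetical: rounds of training still needed
    mem: float   # hypothetical: fraction of one GPU's memory required


def schedule_round(jobs, num_gpus, packing=True):
    """Pick the jobs that run in one round (first-fit decreasing on memory)."""
    free = [1.0] * num_gpus        # free memory fraction per GPU
    occupied = [False] * num_gpus  # whether a GPU already hosts a job
    running = []
    for job in sorted(jobs, key=lambda j: j.mem, reverse=True):
        for g in range(num_gpus):
            fits = job.mem <= free[g] + 1e-9
            # Without packing, a GPU accepts at most one job per round.
            if fits and (packing or not occupied[g]):
                free[g] -= job.mem
                occupied[g] = True
                running.append(job)
                break
    return running


def makespan(jobs, num_gpus, packing=True):
    """Number of rounds until every job finishes; placement is redone each round."""
    pending = [Job(j.name, j.work, j.mem) for j in jobs]  # work on copies
    rounds = 0
    while pending:
        for job in schedule_round(pending, num_gpus, packing):
            job.work -= 1  # simplification: packed jobs progress at full speed
        pending = [j for j in pending if j.work > 0]
        rounds += 1
    return rounds


jobs = [Job("large", 3, 1.0), Job("small-a", 2, 0.4), Job("small-b", 2, 0.4)]
print(makespan(jobs, num_gpus=2, packing=False))  # 4
print(makespan(jobs, num_gpus=2, packing=True))   # 3
```

In this toy workload the two small jobs share one GPU when packing is enabled, so the makespan drops from 4 rounds to 3. The numbers are specific to the example and say nothing about the 29% and 5% reductions reported in the abstract.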
Disciplines :
Computer science
Author, co-author :
THANAPOL, Panissara; University of Luxembourg > Faculty of Science, Technology and Medicine > Department of Computer Science > Team Pascal BOUVRY
LAVANGNANANDA, Kittichai; University of Luxembourg
LEPREVOST, Franck; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
GLAD, Arnaud; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > HPC Platform
SCHLEICH, Julien; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > HPC Platform
BOUVRY, Pascal; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors :
yes
Language :
English
Title :
Round-Based Mechanism and Job Packing with Model-Similarity-Based Policy for Scheduling DL Training in GPU Cluster
Publication date :
11 March 2024
Journal title :
Applied Sciences
ISSN :
2076-3417
eISSN :
2076-3417
Publisher :
Multidisciplinary Digital Publishing Institute (MDPI)
Volume :
14
Issue :
6
Pages :
2349
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 25 May 2025

Statistics


Number of views
69 (3 by Unilu)
Number of downloads
25 (0 by Unilu)

Scopus citations®
2
Scopus citations® without self-citations
2
OpenCitations
0
OpenAlex citations
2
WoS citations
1
