2015 • In Proc. of the 19th Intl. Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP'15), part of the 29th IEEE/ACM Intl. Parallel and Distributed Processing Symposium (IPDPS 2015)
[en] At the advent of a wished (or forced) convergence between High Performance Computing (HPC) platforms, stand-alone accelerators and virtualized resources from Cloud Computing (CC) systems, this ar- ticle unveils the job prediction component of the Evalix project. This framework aims at an improved efficiency of the underlying Resource and Job Management System (RJMS) within heterogeneous HPC facil- ities by the automatic evaluation and characterization of the submitted workload. The objective is not only to better adapt the scheduled jobs to the available resource capabilities, but also to reduce the energy costs. For that purpose, we collected the resource consumption of all the jobs executed on a production cluster for a period of three months. Based on the analysis then on the classification of the jobs, we computed a resource consumption model. The objective is to train a set of predictors based on the aforementioned model, that will give the estimated CPU, mem- ory and IO used by the jobs. The analysis of the resource consumption highlighted that different classes of jobs have different kinds of resource needs and the classification of the jobs enabled to characterize several application patterns of the users. We also discovered that several users whose resource usage on the cluster is considered as too low, are respon- sible for a loss of CPU time on the order of five years over the considered three month period. The predictors, trained from a supervised learning algorithm, were able to correctly classify a large set of data. We evalu- ated them with three performance indicators that gave an information retrieval rate of 71% to 89% and a probability of accurate prediction be- tween 0.7 and 0.8. The results of this work will be particularly helpful for designing an optimal partitioning of the considered heterogeneous plat- form, taking into consideration the real application needs and thus lead- ing to energy savings and performance improvements. Moreover, apart from the novelty of the contribution, the accurate classification scheme offers new insights of users behavior of interest for the design of future HPC platforms.
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
EMERAS, Joseph ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
VARRETTE, Sébastien ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
GUZEK, Mateusz ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
External co-authors :
no
Language :
English
Title :
Evalix: Classification and Prediction of Job Resource Consumption on HPC Platforms
Publication date :
May 2015
Event name :
19th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)
Event place :
Hyderabad, India
Event date :
29-05-2015
Audience :
International
Main work title :
Proc. of the 19th Intl. Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP'15), part of the 29th IEEE/ACM Intl. Parallel and Distributed Processing Symposium (IPDPS 2015)