Reference : Evalix: Classification and Prediction of Job Resource Consumption on HPC Platforms
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/21136
Evalix: Classification and Prediction of Job Resource Consumption on HPC Platforms
English
Emeras, Joseph mailto [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) >]
Varrette, Sébastien mailto [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)]
Guzek, Mateusz mailto [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) >]
Bouvry, Pascal mailto [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)]
May-2015
Proc. of the 19th Intl. Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP'15), part of the 29th IEEE/ACM Intl. Parallel and Distributed Processing Symposium (IPDPS 2015)
IEEE Computer Society
Yes
International
19th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)
29-05-2015
Hyderabad
India
[en] At the advent of a wished (or forced) convergence between High Performance Computing (HPC) platforms, stand-alone accelerators and virtualized resources from Cloud Computing (CC) systems, this ar- ticle unveils the job prediction component of the Evalix project. This framework aims at an improved efficiency of the underlying Resource and Job Management System (RJMS) within heterogeneous HPC facil- ities by the automatic evaluation and characterization of the submitted workload. The objective is not only to better adapt the scheduled jobs to the available resource capabilities, but also to reduce the energy costs. For that purpose, we collected the resource consumption of all the jobs executed on a production cluster for a period of three months. Based on the analysis then on the classification of the jobs, we computed a resource consumption model. The objective is to train a set of predictors based on the aforementioned model, that will give the estimated CPU, mem- ory and IO used by the jobs. The analysis of the resource consumption highlighted that different classes of jobs have different kinds of resource needs and the classification of the jobs enabled to characterize several application patterns of the users. We also discovered that several users whose resource usage on the cluster is considered as too low, are respon- sible for a loss of CPU time on the order of five years over the considered three month period. The predictors, trained from a supervised learning algorithm, were able to correctly classify a large set of data. We evalu- ated them with three performance indicators that gave an information retrieval rate of 71% to 89% and a probability of accurate prediction be- tween 0.7 and 0.8. The results of this work will be particularly helpful for designing an optimal partitioning of the considered heterogeneous plat- form, taking into consideration the real application needs and thus lead- ing to energy savings and performance improvements. Moreover, apart from the novelty of the contribution, the accurate classification scheme offers new insights of users behavior of interest for the design of future HPC platforms.
University of Luxembourg: High Performance Computing - ULHPC
Researchers ; Professionals ; Students
http://hdl.handle.net/10993/21136
http://www.cs.huji.ac.il/~feit/parsched/jsspp15/

File(s) associated to this reference

Fulltext file(s):

FileCommentaryVersionSizeAccess
Limited access
BlankPage.pdfAuthor preprint10.75 kBRequest a copy

Bookmark and Share SFX Query

All documents in ORBilu are protected by a user license.