Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
Management of an Academic HPC Research Computing Facility: The ULHPC Experience 2.0
VARRETTE, Sébastien; CARTIAUX, Hyacinthe; PETER, Sarah et al.
2022, in 6th High Performance Computing and Cluster Technologies Conference (HPCCT 2022)
Peer reviewed
 

Documents

Full text
acm-hpcct22-final.pdf
Author preprint (1.29 MB)
Annexes
slides_hpcct2022.pdf
(3.87 MB)
Conference slides




Details

Keywords:
High Performance Computing; Management
Abstract:
With the advent of the technological revolution and the digital transformation that have made all scientific disciplines computational, High Performance Computing (HPC) has become a strategic and critical asset to leverage new research and business in all domains requiring computing and storage performance. Since 2007, the University of Luxembourg has operated a large academic HPC facility which remains the reference implementation within the country. This paper provides a general description of the current platform implementation as well as its operational management choices, which have been adapted to the integration of a new liquid-cooled supercomputer, named Aion, released in 2021. The administration of an HPC facility to provide state-of-the-art computing systems, storage and software is indeed a complex and dynamic enterprise with the sole purpose of offering an enhanced user experience for intensive research computing and large-scale analytic workflows. Most design choices and feedback described in this work have been motivated by several years of experience in addressing, in a flexible and convenient way, the heterogeneous needs inherent to an academic environment towards research excellence. The different layers and stacks used within the operated facilities are reviewed, in particular with regard to user software management and the adaptation of the Slurm Resource and Job Management System (RJMS) configuration with novel incentive mechanisms. In practice, the described and implemented environment brought concrete and measurable improvements with regard to platform utilization (+12.64%), job efficiency (average Wall-time Request Accuracy improved by 110.81%), and management and funding (increased by 10%). A thorough performance evaluation of the facility is also presented in this paper through reference benchmarks such as HPL, HPCG, Graph500, IOR or IO500. It reveals sustainable and scalable performance comparable to the most powerful supercomputers in the world, including for energy-efficiency metrics (for instance, 5.19 GFlops/W (resp. 6.14 MTEPS/W) were demonstrated for full HPL (resp. Graph500) runs across all Aion nodes).
Research center:
ULHPC - University of Luxembourg: High Performance Computing
Disciplines:
Computer science
Author, co-author:
VARRETTE, Sébastien ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
CARTIAUX, Hyacinthe ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
PETER, Sarah ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
KIEFFER, Emmanuel ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
VALETTE, Teddy ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
OLLOH, Abatcha ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors:
no
Document language:
English
Title:
Management of an Academic HPC Research Computing Facility: The ULHPC Experience 2.0
Publication date:
July 2022
Event name:
6th ACM High Performance Computing and Cluster Technologies Conf. (HPCCT 2022)
Event place:
Fuzhou, China
Event date:
July 8-10, 2022
Audience:
International
Main work title:
6th High Performance Computing and Cluster Technologies Conference (HPCCT 2022)
Publisher:
Association for Computing Machinery (ACM), Fuzhou, China
ISBN/EAN :
978-1-4503-9664-6
Peer reviewed:
Peer reviewed
Available on ORBilu:
since 04 August 2022

Statistics

Number of views
569 (including 133 Unilu)
Number of downloads
449 (including 37 Unilu)

Scopus® citations: 39
Scopus® citations without self-citations: 36
OpenCitations: 3
OpenAlex citations: 26
