Paper published in a book (Scientific congresses, symposiums and conference proceedings)
RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility
VARRETTE, Sébastien; KIEFFER, Emmanuel; PINEL, Frederic et al.
2021, in ACM Practice and Experience in Advanced Research Computing (PEARC'21)
Peer reviewed
 

Files


Full Text
acm-pearc21-resif3.pdf
Author postprint (947.52 kB)
Annexes
slides_acm-pearc21-resif3.pdf
(2.51 MB)
Slides presented on July 22, 2021

All documents in ORBilu are protected by a user license.




Details



Keywords :
HPC; Software Management; Easybuild
Abstract :
[en] High Performance Computing (HPC) is increasingly identified as a strategic asset and enabler to accelerate the research and the business performed in all areas requiring intensive computing and large-scale Big Data analytic capabilities. The efficient exploitation of heterogeneous computing resources featuring different processor architectures and generations, coupled with the possible presence of GPU accelerators, remains a challenge. Since 2007, the University of Luxembourg has operated a large academic HPC facility, which remains one of the reference implementations in the country and offers a cutting-edge research infrastructure to Luxembourg public research. The HPC support team invests a significant amount of time (i.e., several months of effort per year) in providing a software environment optimised for hundreds of users, but the complexity of HPC software quickly outpaced the capabilities of classical software management tools. Since 2014, our scientific software stack has been generated and deployed in an automated and consistent way through the RESIF framework, a wrapper on top of Easybuild and Lmod [5] designed to efficiently handle user software generation. A large code refactoring was performed in 2017 to better handle different software sets and roles across multiple clusters, all piloted through a dedicated control repository. With the advent in 2020 of a new supercomputer featuring a different CPU architecture, and to mitigate the identified limitations of the existing framework, we present in this state-of-practice article RESIF 3.0, the latest iteration of our scientific software management suite, which now relies on a streamlined Easybuild. It reduced by around 90% the number of custom configurations previously enforced by specific Slurm and MPI settings, while sustaining optimised builds that coexist across different CPU and GPU architectures.
The workflow for contributing back to the Easybuild community was also automated, and ongoing work aims to drastically decrease the build time of a complete software set. Overall, most design choices for our wrapper have been motivated by several years of experience in addressing, in a flexible and convenient way, the heterogeneous needs inherent to an academic environment aiming for research excellence. As the code base is publicly available, and as we also wish to report transparently the pitfalls and difficulties we encountered, this tool may help other HPC centres consolidate their own software management stack.
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
VARRETTE, Sébastien ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
KIEFFER, Emmanuel ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
PINEL, Frederic ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
KRISHNASAMY, Ezhilmathi ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG
PETER, Sarah ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
CARTIAUX, Hyacinthe ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
BESSERON, Xavier ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
External co-authors :
no
Language :
English
Title :
RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility
Publication date :
July 2021
Event name :
ACM Practice and Experience in Advanced Research Computing (PEARC'21)
Event date :
July 19-22, 2021
Audience :
International
Main work title :
ACM Practice and Experience in Advanced Research Computing (PEARC'21)
Publisher :
Association for Computing Machinery (ACM), Virtual Event, Unknown/unspecified
Edition :
PEARC'21
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 14 May 2021

Statistics


Number of views
411 (74 by Unilu)
Number of downloads
94 (14 by Unilu)

Scopus citations®
2
Scopus citations® without self-citations
1
OpenCitations
1
