[en] This Best Practice Guide (BPG) extends the previously developed series of BPGs by providing an update on new technologies and systems for the further support of European High Performance Computing (HPC) user community in achieving a remarkable performance of their large-scale applications. It covers existing systems and aims to provide support for scientists to port, build and run their applications on these systems. While some benchmarking is part of this guide, the results provided are mainly an illustration of the different systems characteristics, and should not be used as guides for the comparison of systems presented nor should be used for system procurement considerations. Procurement and benchmarking are well covered by other PRACE work packages and are out of this BPG's discussion scope. This BPG document has grown to be a hybrid of field guide and a textbook approach. The system and processor coverage provide some relevant technical information for the users who need a deeper knowledge of the system in order to fully utilise the hardware. While the field guide approach provides hints and starting points for porting and building scientific software. For this, a range of compilers, libraries, debuggers, performance analysis tools, etc. are covered. While recommendation for compilers, libraries and flags are covered we acknowledge that there is no magic bullet as all codes are different. Unfortunately there is often no way around the trial and error approach.
Some in-depth documentation of the covered processors is provided. This includes some background on the inner workings of the processors considered; the number of threads each core can handle; how these threads are implemented and how these threads (instruction streams) are scheduled onto different execution units within the core. In addition, this guide describes how the vector units with different lengths (256, 512 or in the case of SVE - variable and generally unknown until execution time) are implemented. As most of HPC work up to now has been done in 64 bit floating point the emphasis is on this data type, specially for vectors. In addition to the processor executing units, memory in its many levels of hierarchy is important. The different implementations of Non-Uniform Memory Access (NUMA) are also covered in this BPG. The guide gives a description of the hardware for a selection of relevant processors currently deployed in some PRACE HPC systems. It includes ARM64(Huawei/HiSilicon and Marvell) and x86-64 (AMD and Intel). It provides information on the programming models and development environment as well as information about porting programs. Furthermore it provides sections about strategies on how to analyze and improve the performance of applications. While this guide does not provide an update on all recent processors, some of the previous BPG releases do cover other processor architectures not discussed in this guide (e.g. Power architecture) and should be considered as a staring point for work.
This guide aims also to increase the user awareness on energy and power consumption of individual applications by providing some analysis on usefulness of maximum CPU frequency scaling based on the type of application considered (e.g. CPU-bound, memory-bound, etc.).
Research center :
ULHPC - University of Luxembourg: High Performance Computing
Disciplines :
Computer science
Author, co-author :
Saastad, Ole Widar; University of Oslo, Norway
Kapanova, Kristina; NCSA, Bulgaria
Markov, Stoyan; NCSA, Bulgaria
Morales, Cristian; BSC, Spain
Shamakina, Anastasiia; HLRS, Germany
Johnson, Nick; EPCC, United Kingdom
KRISHNASAMY, Ezhilmathi ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG
VARRETTE, Sébastien ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)