References of "Morales, Cristian"
     in
Bookmark and Share    
Full Text
See detailPRACE Best Practice Guide 2021: Modern Accelerators
Bispo, João; Barbosa, Jorge G.; Filipe Silva, Pedro et al

Report (2021)

Hardware accelerators are special types of elements designed for boosting the performance of certain application regions requiring large amounts of numerical computations. Several factors contributed to ... [more ▼]

Hardware accelerators are special types of elements designed for boosting the performance of certain application regions requiring large amounts of numerical computations. Several factors contributed to broadening the use and furthering the adoption of these technologies in High-Performance Computing (HPC). One of such is the offered greater computational throughput as compared to stand-alone Central Processing Units (CPUs), which is driven by the highly parallel architectural design of accelerators. This is particularly important in the current era of ever-increasing computational demands featuring high reuse rates of compute-intensive operational patterns. Another contributing factor is that these specialized chips are also capable of delivering much higher compute performance as compared to CPUs under the same power budget, making these technologies even more appealing for system vendors and users. All these led HPC manufacturers and integrators to unleash further the potential of hardware accelerators for delivering the required compute performance more efficiently. In fact, this is one of the main reasons that the current Top500 list [1] continues to be enriched with various accelerated systems. The next generation of HPC systems will also see a considerable amount of accelerator technology used. As a matter of fact, two out of the three European High-Performance Computing Joint Undertaking (EuroHPC JU) [2] pre-exascale HPC sites have already announced that their supercomputers will be equipped with large amount of Graphics Processing Units (GPUs). Thus, in order to achieve a competitive application performance and to be able to use the underlying hardware infrastructure efficiently, HPC application developers should be familiar with various challenges associated with using and orchestrating vast amounts of accelerator devices while being acquainted with the available ecosystem of the supporting tools. This Best Practice Guide (BPG) extends the previously developed series of BPGs [3] by providing an update on new accelerator technologies to further support the European HPC user community in achieving outstanding performance records of their large-scale parallel applications. This guide follows the style of the previously published guide on "Modern Processors" [4], by providing a hybrid approach of a field guide and a textbook. The aim of this BPG is not to replace any of the available in depth textbooks and/or documentations of certain tools, but rather to provide a set of best practices that build upon the available literature and the expertise of authors involved to further ease the process of application porting and performance optimisation. This guide showcases the usability and possibilities of further application tuning given a specific accelerator technology, and does not provide any direct comparisons of different accelerator technologies involved. The guide provides a generic overview on various accelerators and their accompanying programming models/environments and thus should be viewed as complementary to the existing in-depth BPGs provided by hardware vendors that are typically specific to their own product. [less ▲]

Detailed reference viewed: 36 (3 UL)
Full Text
See detailPRACE Best Practice Guide 2020: Modern Processors
Saastad, Ole Widar; Kapanova, Kristina; Markov, Stoyan et al

Report (2020)

This Best Practice Guide (BPG) extends the previously developed series of BPGs by providing an update on new technologies and systems for the further support of European High Performance Computing (HPC ... [more ▼]

This Best Practice Guide (BPG) extends the previously developed series of BPGs by providing an update on new technologies and systems for the further support of European High Performance Computing (HPC) user community in achieving a remarkable performance of their large-scale applications. It covers existing systems and aims to provide support for scientists to port, build and run their applications on these systems. While some benchmarking is part of this guide, the results provided are mainly an illustration of the different systems characteristics, and should not be used as guides for the comparison of systems presented nor should be used for system procurement considerations. Procurement and benchmarking are well covered by other PRACE work packages and are out of this BPG's discussion scope. This BPG document has grown to be a hybrid of field guide and a textbook approach. The system and processor coverage provide some relevant technical information for the users who need a deeper knowledge of the system in order to fully utilise the hardware. While the field guide approach provides hints and starting points for porting and building scientific software. For this, a range of compilers, libraries, debuggers, performance analysis tools, etc. are covered. While recommendation for compilers, libraries and flags are covered we acknowledge that there is no magic bullet as all codes are different. Unfortunately there is often no way around the trial and error approach. Some in-depth documentation of the covered processors is provided. This includes some background on the inner workings of the processors considered; the number of threads each core can handle; how these threads are implemented and how these threads (instruction streams) are scheduled onto different execution units within the core. In addition, this guide describes how the vector units with different lengths (256, 512 or in the case of SVE - variable and generally unknown until execution time) are implemented. As most of HPC work up to now has been done in 64 bit floating point the emphasis is on this data type, specially for vectors. In addition to the processor executing units, memory in its many levels of hierarchy is important. The different implementations of Non-Uniform Memory Access (NUMA) are also covered in this BPG. The guide gives a description of the hardware for a selection of relevant processors currently deployed in some PRACE HPC systems. It includes ARM64(Huawei/HiSilicon and Marvell) and x86-64 (AMD and Intel). It provides information on the programming models and development environment as well as information about porting programs. Furthermore it provides sections about strategies on how to analyze and improve the performance of applications. While this guide does not provide an update on all recent processors, some of the previous BPG releases do cover other processor architectures not discussed in this guide (e.g. Power architecture) and should be considered as a staring point for work. This guide aims also to increase the user awareness on energy and power consumption of individual applications by providing some analysis on usefulness of maximum CPU frequency scaling based on the type of application considered (e.g. CPU-bound, memory-bound, etc.). [less ▲]

Detailed reference viewed: 158 (10 UL)