References of "Varrette, Sebastien 50003258"
Full Text
Peer Reviewed
Obfuscating LLVM Intermediate Representation Source Code with NSGA-II
de la Torre, Juan Carlos; Aragó-Jurado, José Miguel; Jareño, Javier et al

in 15th Intl. Conf. on Computational Intelligence in Security for Information Systems (CISIS'22) (2022, September)

With the generalisation of distributed computing paradigms to sustain the surging demands for massive processing and data-analytic capabilities, protecting the intellectual property tied to programs executed on these remote shared platforms becomes critical. An increasingly popular solution to this problem consists in applying obfuscation techniques, in particular at the source code level. Informally, the goal of obfuscation is to conceal the purpose of a program or its logic without altering its functionality, thus preventing reverse engineering of the program even with the help of computing resources. This protects software against plagiarism, tampering, and the discovery of vulnerabilities that could be used for different kinds of attacks. The many advantages of code obfuscation, together with its low cost, make it a popular technique. This paper proposes a novel methodology for source code obfuscation relying on the reference LLVM compiler infrastructure that can be used together with other traditional obfuscation techniques, making the code more robust against reverse-engineering attacks. The problem is defined as a Multi-Objective Combinatorial Optimization (MOCO) problem, where the goal is to find sequences of LLVM optimizations that lead to highly obfuscated versions of the original code. These transformations are applied to the back-end pseudo-assembly code (i.e., LLVM Intermediate Representation), thus avoiding any further optimizations by the compiler. Three different problem flavours are defined and solved with the popular NSGA-II genetic algorithm. The promising results show the potential of the proposed technique.
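The multi-objective formulation in this abstract rests on Pareto dominance, which NSGA-II uses to rank candidate solutions. The following is a minimal sketch of that comparison only, not the authors' implementation: the objective vectors are hypothetical placeholders (the abstract does not state the objectives), and all objectives are assumed to be minimised.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical objective vectors, one per candidate optimization sequence.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = first_front(candidates)
```

Here `(3, 3)` is dominated by `(2, 2)`, so only the other three candidates remain on the first front. A full NSGA-II additionally sorts the remaining fronts and applies crowding-distance selection.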

Full Text
Peer Reviewed
Aggregating and Consolidating two High Performant Network Topologies: The ULHPC Experience
Varrette, Sébastien UL; Cartiaux, Hyacinthe UL; Valette, Teddy UL et al

in ACM Practice and Experience in Advanced Research Computing (PEARC'22) (2022, July)

High Performance Computing (HPC) encompasses advanced computation over parallel processing. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores, their utilisation factor and, of course, the interconnect performance, efficiency, and scalability. In practice, this last component and the associated topology remain the most significant differentiators between HPC systems and less performant systems. The University of Luxembourg has operated since 2007 a large academic HPC facility which remains one of the reference implementations within the country and offers a cutting-edge research infrastructure to Luxembourg public research. The main high-bandwidth low-latency network of the operated facility relies on the dominant interconnect technology in the HPC market, i.e., Infiniband (IB) over a Fat-tree topology. It is complemented by an Ethernet-based network defined for management tasks, external access and interactions with user applications that do not support Infiniband natively. The recent acquisition of a new cutting-edge supercomputer, Aion, which was federated with the previous flagship cluster Iris, was the occasion to aggregate and consolidate the two types of networks. This article depicts the architecture and the solutions designed to expand and consolidate the existing networks beyond their original capacity limits while preserving their bisection bandwidth as far as possible. At the IB level, and despite moving from a non-blocking configuration, the proposed approach defines a blocking topology maintaining the previous Fat-tree height. The leaf connection capacity is more than tripled (moving from 216 to 672 end-points) while exhibiting very marginal penalties, i.e., less than 3% Read (resp. 0.3% Write) bandwidth degradation against reference parallel I/O benchmarks, and a stable and sustainable point-to-point bandwidth efficiency among all possible pairs of nodes (measured above 95.45% for bi-directional streams). With regard to the Ethernet network, a novel 2-layer topology aiming to improve the availability, maintainability and scalability of the interconnect is described. It was deployed together with consistent network VLANs and subnets enforcing strict security policies via ACLs defined at layer 3, offering isolated and secure network environments. The implemented approaches are applicable to a broad range of HPC infrastructures and thus may help other HPC centres to consolidate their own interconnect stacks when designing or expanding their network infrastructures.

Full Text
Peer Reviewed
Management of an Academic HPC Research Computing Facility: The ULHPC Experience 2.0
Varrette, Sébastien UL; Cartiaux, Hyacinthe UL; Peter, Sarah UL et al

in 6th High Performance Computing and Cluster Technologies Conference (HPCCT 2022) (2022, July)

With the advent of the technological revolution and the digital transformation that made all scientific disciplines computational, High Performance Computing (HPC) has become a strategic and critical asset to leverage new research and business in all domains requiring computing and storage performance. Since 2007, the University of Luxembourg has operated a large academic HPC facility which remains the reference implementation within the country. This paper provides a general description of the current platform implementation as well as its operational management choices, which have been adapted to the integration of a new liquid-cooled supercomputer, named Aion, released in 2021. The administration of an HPC facility to provide state-of-the-art computing systems, storage and software is indeed a complex and dynamic enterprise with the sole purpose of offering an enhanced user experience for intensive research computing and large-scale analytic workflows. Most design choices and feedback described in this work have been motivated by several years of experience in addressing, in a flexible and convenient way, the heterogeneous needs inherent to an academic environment striving for research excellence. The different layers and stacks used within the operated facilities are reviewed, in particular with regard to the user software management and the adaptation of the Slurm Resource and Job Management System (RJMS) configuration with novel incentive mechanisms. In practice, the described and implemented environment brought concrete and measurable improvements with regard to the platform utilization (+12.64%), the jobs efficiency (average Wall-time Request Accuracy improved by 110.81%), and the management and funding (increased by 10%). A thorough performance evaluation of the facility is also presented in this paper through reference benchmarks such as HPL, HPCG, Graph500, IOR or IO500. It reveals sustainable and scalable performance comparable to the most powerful supercomputers in the world, including for energy-efficiency metrics (for instance, 5.19 GFlops/W (resp. 6.14 MTEPS/W) were demonstrated for full HPL (resp. Graph500) runs across all Aion nodes).

Full Text
Peer Reviewed
Optimizing the Resource and Job Management System of an Academic HPC and Research Computing Facility
Varrette, Sébastien UL; Kieffer, Emmanuel UL; Pinel, Frederic

in 21st IEEE Intl. Symp. on Parallel and Distributed Computing (ISPDC'22) (2022, July)

High Performance Computing (HPC) is nowadays a strategic asset required to sustain the surging demands for massive processing and data-analytic capabilities. In practice, the effective management of such large-scale and distributed computing infrastructures is left to a Resource and Job Management System (RJMS). This essential middleware component is responsible for managing the computing resources and handling user requests to allocate resources, while providing an optimized framework for starting, executing and monitoring jobs on the allocated resources. The University of Luxembourg has been operating for 15 years a large academic HPC facility which has relied since 2017 on the Slurm RJMS, introduced on top of the flagship cluster Iris. The acquisition of a new liquid-cooled supercomputer named Aion, released in 2021, was the occasion to deeply review and optimize the original Slurm configuration, the resource limits defined and the underlying fair-sharing algorithm. This paper presents the outcomes of this study and details the implemented RJMS policy. The impact of the decisions made on the supercomputers' workloads is also described. In particular, the performance evaluation conducted highlights that, when compared to the original configuration, the described and implemented environment brought concrete and measurable improvements with regard to the platform utilization (+12.64%), the jobs efficiency (as measured by the average Wall-time Request Accuracy, improved by 110.81%) and the management and funding (increased by 10%). The systems demonstrated sustainable and scalable HPC performance, and this effort led to a negligible penalty on the average slowdown metric (response time normalized by runtime), which increased by 0.59% for job workloads covering a complete year of operation. Overall, this new setup has been in production for 18 months on both supercomputers and the updated model proves to bring a fairer and more satisfying experience to the end users. The proposed configurations and policies may help other HPC centres when designing or improving the RJMS sustaining the job scheduling strategy at the advent of computing capacity expansions.
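The slowdown metric mentioned above ("response time normalized by runtime") can be illustrated with a small sketch. It assumes the usual scheduling definition, in which response time is queue wait time plus runtime; the abstract does not spell this out, and the function names and job values below are hypothetical.

```python
from typing import Iterable, Tuple


def slowdown(wait_s: float, run_s: float) -> float:
    """Slowdown of one job: (wait + run) / run. Assumes run_s > 0."""
    if run_s <= 0:
        raise ValueError("runtime must be positive")
    return (wait_s + run_s) / run_s


def average_slowdown(jobs: Iterable[Tuple[float, float]]) -> float:
    """Mean slowdown over (wait_seconds, run_seconds) pairs."""
    values = [slowdown(wait, run) for wait, run in jobs]
    if not values:
        raise ValueError("no jobs given")
    return sum(values) / len(values)


# Hypothetical workload: three 100-second jobs with growing queue waits.
jobs = [(0.0, 100.0), (100.0, 100.0), (300.0, 100.0)]
avg = average_slowdown(jobs)  # individual slowdowns are 1, 2 and 4
```

A job that starts immediately has a slowdown of 1, so an average close to 1 indicates that jobs spend little time queued relative to their runtime.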

Full Text
Les NFT en 40 questions: Des réponses claires et détaillées pour comprendre les Non Fungible Tokens
Dumas, Jean-Guillaume; Lafourcade, Pascal; Roudeix, Etienne et al

Book published by Dunod - 1st (2022)

Having appeared in 2017, non-fungible tokens generated several million euros in 2021 and affected many fields, from fashion to horse-race betting by way of art and sport. The aim of this book is to explain how NFTs work and to present their specific characteristics in simple terms. This book is designed to answer not only the questions you have about the world of NFTs, but also those you had not yet thought to ask. What is a smart contract? What was the first NFT? What are fungible and non-fungible tokens? How are NFTs revolutionising property titles? What can be bought or sold with NFTs? How are NFTs used in art? What is the ERC-721 standard for NFTs on Ethereum? The answers to all these questions (and to 33 others) are in this book.

Full Text
Les blockchains en 50 questions
Dumas, Jean-Guillaume; Lafourcade, Pascal; Tichit, Ariane et al

Book published by Dunod - 2nd edition (2022)

The creation of bitcoin in 2009 is remarkable in more than one respect, first of all because it relies on the innovative mechanism of the blockchain, which makes it possible to record information in a distributed manner, irreversibly and verifiably by everyone. Today blockchains are no longer limited to cryptocurrencies and reach many other areas (smart contracts, NFTs…). This book is designed to answer not only the questions you have about the world of blockchains, but also those you had not yet thought to ask. What is a blockchain? What is the link between bitcoin and blockchains? Who are the miners and what do they do? What is a consensus? What is a smart contract? Can a blockchain be built without blocks? What share of the world economy do cryptocurrencies represent? How will blockchains revolutionise the world? The answers to all these questions (and to 42 others) are in this book.

Full Text
Peer Reviewed
A RNN-Based Hyper-Heuristic for Combinatorial Problems
Kieffer, Emmanuel UL; Duflo, Gabriel UL; Danoy, Grégoire UL et al

in A RNN-Based Hyper-Heuristic for Combinatorial Problems (2022)

Designing efficient heuristics is a laborious and tedious task that generally requires a full understanding and knowledge of a given optimization problem. Hyper-heuristics were mainly introduced to tackle this issue and mostly rely on Genetic Programming and its variants. Many attempts in the literature have shown that an automatic training mechanism for heuristic learning is possible and can challenge human-based heuristics in terms of gap to optimality. In this work, we introduce a novel approach based on recent work on Deep Symbolic Regression. We demonstrate that scoring functions can be trained using Recurrent Neural Networks to tackle a well-known combinatorial problem, i.e., the Multi-dimensional Knapsack. Experiments have been conducted on instances from the OR-Library, and results show that the proposed method is a promising alternative to human-based heuristics and classical heuristic generation approaches.
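To make the role of a scoring function concrete, the sketch below shows a greedy heuristic for the Multi-dimensional Knapsack in which items are ranked by a pluggable score. It is an illustration under stated assumptions, not the paper's method: the paper learns the scoring function with a Recurrent Neural Network, whereas `profit_density` here is a hand-written stand-in, and the instance data and function names are hypothetical. Strictly positive weights and capacities are assumed.

```python
from typing import Callable, List, Sequence

Score = Callable[[float, Sequence[float], Sequence[float]], float]


def profit_density(profit: float, item_weights: Sequence[float],
                   capacities: Sequence[float]) -> float:
    """Hand-written score: profit per unit of capacity-normalised weight."""
    return profit / sum(w / c for w, c in zip(item_weights, capacities))


def greedy_mkp(profits: Sequence[float], weights: Sequence[Sequence[float]],
               capacities: Sequence[float], score: Score) -> List[int]:
    """Pick items in decreasing score order while every constraint still holds.

    weights[i][j] is the consumption of resource i by item j.
    """
    m = len(capacities)
    order = sorted(
        range(len(profits)),
        key=lambda j: score(profits[j], [weights[i][j] for i in range(m)], capacities),
        reverse=True,
    )
    remaining = list(capacities)
    chosen: List[int] = []
    for j in order:
        if all(weights[i][j] <= remaining[i] for i in range(m)):
            chosen.append(j)
            for i in range(m):
                remaining[i] -= weights[i][j]
    return sorted(chosen)


# Hypothetical instance: 3 items, 2 resource constraints.
profits = [10, 6, 8]
weights = [[5, 4, 3],
           [4, 2, 6]]
capacities = [9, 8]
selection = greedy_mkp(profits, weights, capacities, profit_density)
```

On this instance the greedy pass selects items 0 and 1 and rejects item 2, which no longer fits the first constraint. Swapping in a different `score` changes only the ranking step, which is the part a learned heuristic would replace.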

Full Text
Overview and Challenges of the UL HPC Facility at the EuroHPC Horizon
Varrette, Sébastien UL

Presentation (2021, November)

Full Text
The new AION cluster: Overview, Technical specifications and Capabilities
Varrette, Sébastien UL

Presentation (2021, November)

Full Text
Security in an evolving European HPC Ecosystem
Pleiter, Dirk; Varrette, Sébastien UL; Krishnasamy, Ezhilmathi UL et al

Report (2021)

The goal of this technical report is to analyse challenges and requirements related to security in the context of an evolving European HPC ecosystem, to provide selected strategies on how to address them, and to come up with a set of forward-looking recommendations. A key assumption made in this technical report is that we are in a transition period from a setup, where HPC resources are operated in a rather independent manner, to centres providing a variety of e-infrastructure services, which are not exclusively based on HPC resources and are increasingly part of federated infrastructures.

Full Text
Peer Reviewed
RESIF 3.0: Toward a Flexible & Automated Management of User Software Environment on HPC facility
Varrette, Sébastien UL; Kieffer, Emmanuel UL; Pinel, Frederic UL et al

in ACM Practice and Experience in Advanced Research Computing (PEARC'21) (2021, July)

High Performance Computing (HPC) is increasingly identified as a strategic asset and enabler to accelerate the research and the business performed in all areas requiring intensive computing and large-scale Big Data analytic capabilities. The efficient exploitation of heterogeneous computing resources featuring different processor architectures and generations, coupled with the possible presence of GPU accelerators, remains a challenge. The University of Luxembourg has operated since 2007 a large academic HPC facility which remains one of the reference implementations within the country and offers a cutting-edge research infrastructure to Luxembourg public research. The HPC support team invests a significant amount of time (i.e., several months of effort per year) in providing a software environment optimised for hundreds of users, but the complexity of HPC software was quickly outpacing the capabilities of classical software management tools. Since 2014, our scientific software stack has been generated and deployed in an automated and consistent way through the RESIF framework, a wrapper on top of Easybuild and Lmod [5] meant to efficiently handle user software generation. A large code refactoring was performed in 2017 to better handle different software sets and roles across multiple clusters, all piloted through a dedicated control repository. With the advent in 2020 of a new supercomputer featuring a different CPU architecture, and to mitigate the identified limitations of the existing framework, we report in this state-of-practice article RESIF 3.0, the latest iteration of our scientific software management suite now relying on streamline Easybuild. It reduced by around 90% the number of custom configurations previously enforced by specific Slurm and MPI settings, while sustaining optimised builds coexisting for different dimensions of CPU and GPU architectures. The workflow for contributing back to the Easybuild community was also automated, and work in progress aims to drastically decrease the building time of a complete software set generation. Overall, most design choices for our wrapper have been motivated by several years of experience in addressing, in a flexible and convenient way, the heterogeneous needs inherent to an academic environment aiming for research excellence. As the code base is publicly available, and as we wish to also transparently report the pitfalls and difficulties met, this tool may help other HPC centres to consolidate their own software management stack.

Full Text
UL HPC Facility Workload Analysis
Varrette, Sébastien UL

Presentation (2021, June 24)

Full Text
Peer Reviewed
Protection of Personal Data in High Performance Computing Platform for Scientific Research Purposes
Paseri, Ludovica; Varrette, Sébastien UL; Bouvry, Pascal UL

in Proc. of the EU Annual Privacy Forum (APF) 2021 (2021, June)

The Open Science projects are also aimed at strongly encouraging the use of Cloud technologies and High Performance Computing (HPC), for the benefit of European researchers and universities. The emerging paradigm of Open Science enables easier access to expert knowledge and material; however, it also raises some challenges regarding the protection of personal data, considering that part of the research data are personal data and thus subject to the EU’s General Data Protection Regulation (GDPR). This paper investigates the concept of scientific research in the field of data protection, with regard both to the European (GDPR) and national (Luxembourg Data Protection Law) legal frameworks for the compliance of HPC technology. To that end, it focuses on a case study, the HPC platform of the University of Luxembourg (ULHPC), to pinpoint the major data protection issues arising from the processing activities through HPC from the perspective of the HPC platform operators. Our study illustrates where the most problematic aspects of compliance lie. In this regard, possible solutions are also suggested, which mainly revolve around (1) standardisation of procedures; (2) cooperation at institutional level; (3) identification of guidelines for common challenges. This research aims to support both legal researchers in the field of data protection, by deepening their understanding of the challenges of HPC technology, and universities and research centres that hold an HPC platform for research purposes and have to address the same issues.

Full Text
PRACE Best Practice Guide 2021: Modern Accelerators
Bispo, João; Barbosa, Jorge G.; Filipe Silva, Pedro et al

Report (2021)

Hardware accelerators are special types of elements designed for boosting the performance of certain application regions requiring large amounts of numerical computations. Several factors contributed to broadening the use and furthering the adoption of these technologies in High-Performance Computing (HPC). One such factor is the greater computational throughput they offer compared with stand-alone Central Processing Units (CPUs), which is driven by the highly parallel architectural design of accelerators. This is particularly important in the current era of ever-increasing computational demands featuring high reuse rates of compute-intensive operational patterns. Another contributing factor is that these specialized chips are also capable of delivering much higher compute performance than CPUs under the same power budget, making these technologies even more appealing for system vendors and users. All this led HPC manufacturers and integrators to further unleash the potential of hardware accelerators for delivering the required compute performance more efficiently. In fact, this is one of the main reasons that the current Top500 list [1] continues to be enriched with various accelerated systems. The next generation of HPC systems will also see a considerable amount of accelerator technology used. As a matter of fact, two out of the three European High-Performance Computing Joint Undertaking (EuroHPC JU) [2] pre-exascale HPC sites have already announced that their supercomputers will be equipped with large numbers of Graphics Processing Units (GPUs). Thus, in order to achieve competitive application performance and to use the underlying hardware infrastructure efficiently, HPC application developers should be familiar with the various challenges associated with using and orchestrating vast numbers of accelerator devices, while being acquainted with the available ecosystem of supporting tools. This Best Practice Guide (BPG) extends the previously developed series of BPGs [3] by providing an update on new accelerator technologies to further support the European HPC user community in achieving outstanding performance records for their large-scale parallel applications. This guide follows the style of the previously published guide on "Modern Processors" [4], by providing a hybrid approach of a field guide and a textbook. The aim of this BPG is not to replace any of the available in-depth textbooks or documentation of certain tools, but rather to provide a set of best practices that build upon the available literature and the expertise of the authors involved to further ease the process of application porting and performance optimisation. This guide showcases the usability and possibilities of further application tuning given a specific accelerator technology, and does not provide any direct comparisons of the different accelerator technologies involved. The guide provides a generic overview of various accelerators and their accompanying programming models and environments, and thus should be viewed as complementary to the existing in-depth BPGs provided by hardware vendors, which are typically specific to their own products.

Full Text
Uni.lu HPC Annual Report 2020
Varrette, Sébastien UL

Report (2021)

2020 was a challenging year for everyone that will stay in our memory. The pandemic disrupted our economies, societies, and all our best-laid plans. However, COVID-19 also taught us several lessons for the future, in particular the (real) necessity to adapt, to be nimble and to expect the unexpected while supporting cutting-edge excellence in science with the best performing and most flexible tools to unleash research potential. One thing is certain: the strategic developments for accelerated digitalisation and the role that HPC will play to ensure a smarter and more connected University will be in focus in 2021 and the years to come. 2020 was thus a very fruitful and productive year for the ULHPC team, which has seen unprecedented changes and challenges.

Full Text
Edge Computing: An Overview of Framework and Applications
Krishnasamy, Ezhilmathi UL; Varrette, Sébastien UL; Mucciardi, Michael

Report (2020)

This report gives an overview of the Edge Computing paradigm and its applications. Indeed, with the advent of the Internet of Things (IoT) era, many electronic devices and sensors produce a vast volume of data which should be processed in a timely manner, and this novel computing model is nowadays seen as a pertinent answer to this open challenge. This report thus explains why Edge Computing is needed and how the edge architecture is typically structured. It further presents the technologies that help this cutting-edge model to function properly. Since Edge Computing involves a heterogeneous architecture, it requires adapting to a few technological recommendations for optimal performance. In this context, this report reviews the latest hardware technology trends tied to Edge Computing developments and points out technical challenges in implementing this innovative computing model. In particular, we analyse how High-Performance Computing and Cloud Computing infrastructures can be efficiently organised to design an Edge Computing-based framework able to tackle cutting-edge issues solved by Artificial Intelligence techniques. Finally, this report presents selected real-world applications of the Edge Computing paradigm across multiple domains affecting our daily life, i.e., healthcare, smart city and grids, industry 4.0 and public safety.

Full Text
PRACE Best Practice Guide 2020: Modern Processors
Saastad, Ole Widar; Kapanova, Kristina; Markov, Stoyan et al

Report (2020)

This Best Practice Guide (BPG) extends the previously developed series of BPGs by providing an update on new technologies and systems to further support the European High Performance Computing (HPC) user community in achieving remarkable performance for their large-scale applications. It covers existing systems and aims to provide support for scientists to port, build and run their applications on these systems. While some benchmarking is part of this guide, the results provided mainly illustrate the characteristics of the different systems; they should not be used to compare the systems presented, nor for system procurement considerations. Procurement and benchmarking are well covered by other PRACE work packages and are out of this BPG's scope. This BPG document has grown into a hybrid of a field guide and a textbook. The system and processor coverage provides relevant technical information for users who need a deeper knowledge of the system in order to fully utilise the hardware, while the field guide approach provides hints and starting points for porting and building scientific software. For this, a range of compilers, libraries, debuggers, performance analysis tools, etc. are covered. While recommendations for compilers, libraries and flags are given, we acknowledge that there is no magic bullet, as all codes are different. Unfortunately, there is often no way around the trial-and-error approach. Some in-depth documentation of the covered processors is provided. This includes some background on the inner workings of the processors considered: the number of threads each core can handle, how these threads are implemented, and how these threads (instruction streams) are scheduled onto different execution units within the core. In addition, this guide describes how the vector units with different lengths (256, 512 or, in the case of SVE, variable and generally unknown until execution time) are implemented. As most HPC work up to now has been done in 64-bit floating point, the emphasis is on this data type, especially for vectors. In addition to the processor execution units, memory in its many levels of hierarchy is important. The different implementations of Non-Uniform Memory Access (NUMA) are also covered in this BPG. The guide gives a description of the hardware for a selection of relevant processors currently deployed in some PRACE HPC systems. It includes ARM64 (Huawei/HiSilicon and Marvell) and x86-64 (AMD and Intel). It provides information on the programming models and development environment as well as information about porting programs. Furthermore, it provides sections about strategies on how to analyze and improve the performance of applications. While this guide does not provide an update on all recent processors, some of the previous BPG releases do cover other processor architectures not discussed in this guide (e.g. Power architecture) and should be considered as a starting point for work. This guide also aims to increase user awareness of the energy and power consumption of individual applications by providing some analysis of the usefulness of maximum CPU frequency scaling based on the type of application considered (e.g. CPU-bound, memory-bound, etc.).

Detailed reference viewed: 239 (12 UL)
Full Text
Peer Reviewed
Performance Analysis of Distributed and Scalable Deep Learning
Mahon, S.; Varrette, Sébastien UL; Plugaru, Valentin UL et al

in 20th IEEE/ACM Intl. Symp. on Cluster, Cloud and Internet Computing (CCGrid'20) (2020, May)

With renewed global interest in Artificial Intelligence (AI) methods, the past decade has seen a myriad of new programming models and tools that enable better and faster Machine Learning (ML). More recently, a subset of ML known as Deep Learning (DL) has attracted increased interest due to its inherent ability to efficiently tackle novel cognitive computing applications. DL allows computational models that are composed of multiple processing layers to learn, in an automated way, representations of data with multiple levels of abstraction, and can deliver higher predictive accuracy when trained on larger data sets. Based on Artificial Neural Networks (ANN), DL is now at the core of state-of-the-art voice recognition systems (which enable easy control over e.g. Internet-of-Things (IoT) smart home appliances), self-driving car engines, and online recommendation systems. The ecosystem of DL frameworks is fast evolving, as are the DL architectures that are shown to perform well on specialized tasks and to exploit GPU accelerators. For this reason, frequent performance evaluation of the DL ecosystem is required, especially since the advent of novel distributed training frameworks such as Horovod, which allow for scalable training across multiple computing resources. In this paper, the scalability evaluation of the reference DL frameworks (TensorFlow, Keras, MXNet, and PyTorch) is performed over up-to-date High Performance Computing (HPC) resources to compare the efficiency of different implementations across several hardware architectures (CPU and GPU). Experimental results demonstrate that the DistributedDataParallel features in the PyTorch library seem to be the most efficient framework for distributing the training process across many devices, reaching a throughput speedup of 10.11 with 12 NVIDIA Tesla V100 GPUs when training ResNet44 on the CIFAR10 dataset.

Detailed reference viewed: 176 (14 UL)
Full Text
Peer Reviewed
Evolving a Deep Neural Network Training Time Estimator
Pinel, Frédéric UL; Yin, Jian-xiong; Hundt, Christian UL et al

in Communications in Computer and Information Science (2020, February)

We present a procedure for the design of a Deep Neural Network (DNN) that estimates the execution time for training a deep neural network per batch on GPU accelerators. The estimator is intended to be embedded in the scheduler of a shared GPU infrastructure, where it can provide estimated training times for a wide range of network architectures when the user submits a training job. To this end, a very short and simple representation for a given DNN is chosen. In order to compensate for the limited degree of description of the basic network representation, a novel co-evolutionary approach is taken to fit the estimator. The training set for the estimator, i.e. DNNs, is evolved by an evolutionary algorithm that optimizes the accuracy of the estimator. In the process, the genetic algorithm evolves DNNs, generates Python-Keras programs and projects them onto the simple representation. The genetic operators are dynamic: they change with the estimator’s accuracy in order to balance accuracy with generalization. Results show that despite the low degree of information in the representation and the simple initial design for the predictor, co-evolving the training set performs better than a nearly randomly generated population of DNNs.

Detailed reference viewed: 172 (15 UL)