Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
Aggregating and Consolidating two High Performant Network Topologies: The ULHPC Experience
VARRETTE, Sébastien; CARTIAUX, Hyacinthe; VALETTE, Teddy et al.
2022, in ACM Practice and Experience in Advanced Research Computing (PEARC'22)
Peer reviewed
 

Documents


Full text
final_pearc22-64.pdf
Author preprint (1.28 MB)
Annexes
slides_pearc22.pdf
(2.61 MB)
Conference slides

All documents in ORBilu are protected by a user license.

Details



Keywords:
HPC Management; Network; Performance Evaluation
Abstract:
[en] High Performance Computing (HPC) encompasses advanced computation over parallel processing. The execution time of a given simulation depends upon many factors, such as the number of CPU/GPU cores, their utilisation factor and, of course, the interconnect performance, efficiency, and scalability. In practice, this last component and the associated topology remain the most significant differentiators between HPC systems and less performant systems. The University of Luxembourg has operated since 2007 a large academic HPC facility which remains one of the reference implementations within the country and offers a cutting-edge research infrastructure to Luxembourg public research. The main high-bandwidth low-latency network of the operated facility relies on the dominant interconnect technology in the HPC market, i.e., Infiniband (IB) over a Fat-tree topology. It is complemented by an Ethernet-based network defined for management tasks, external access and interactions with user applications that do not support Infiniband natively. The recent acquisition of a new cutting-edge supercomputer, Aion, which was federated with the previous flagship cluster Iris, was the occasion to aggregate and consolidate the two types of networks. This article depicts the architecture and the solutions designed to expand and consolidate the existing networks beyond their seminal capacity limits while preserving their bisection bandwidth as far as possible. At the IB level, and despite moving from a non-blocking configuration, the proposed approach defines a blocking topology maintaining the previous Fat-tree height. The leaf connection capacity is more than tripled (moving from 216 to 672 end-points) while exhibiting very marginal penalties, i.e., less than 3% (resp. 0.3%) Read (resp. Write) bandwidth degradation against reference parallel I/O benchmarks, and a stable and sustainable point-to-point bandwidth efficiency among all possible pairs of nodes (measured above 95.45% for bi-directional streams). With regard to the Ethernet network, a novel 2-layer topology aiming to improve the availability, maintainability and scalability of the interconnect is described. It was deployed together with consistent network VLANs and subnets enforcing strict security policies via ACLs defined at layer 3, offering isolated and secure network environments. The implemented approaches are applicable to a broad range of HPC infrastructures and thus may help other HPC centres to consolidate their own interconnect stacks when designing or expanding their network infrastructures.
Research centre:
ULHPC - University of Luxembourg: High Performance Computing
Disciplines:
Computer science
Author, co-author:
VARRETTE, Sébastien ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
CARTIAUX, Hyacinthe ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
VALETTE, Teddy ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
OLLOH, Abatcha ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
External co-authors:
no
Document language:
English
Title:
Aggregating and Consolidating two High Performant Network Topologies: The ULHPC Experience
Publication date:
July 2022
Event name:
Practice and Experience in Advanced Research Computing (PEARC ’22)
Event place:
Boston, United States
Event date:
July 8-14, 2022
Event scope:
International
Main work title:
ACM Practice and Experience in Advanced Research Computing (PEARC'22)
Publisher:
Association for Computing Machinery (ACM), Boston, United States
Peer reviewed:
Peer reviewed
Focus area:
Computational Sciences
Additional URL:
Available on ORBilu:
since 01 August 2022

Statistics


Number of views
347 (including 51 from Unilu)
Number of downloads
146 (including 11 from Unilu)

Scopus® citations: 5
Scopus® citations without self-citations: 4
OpenCitations: 1
OpenAlex citations: 5
