[en] Tree search algorithms, such as the Branch-and-Bound (B&B) method, are essential tools in exact combinatorial optimization. Parallel B&B presents significant challenges in achieving scalability arising due to irregular and fine-grained workload associated and the hardware heterogeneity of modern supercomputers. This work focuses on leveraging GPU heterogeneity in the HPC environment within exact tree search optimization algorithms, through a comparison between a proposed CUDA-based low-level implementation and a high-level PGAS-based one. We revisit the design and implementation of a pool-based GPU-accelerated parallel B&B algorithm tailored for heterogeneous CPU / GPU systems originally developed in Chapel. A low-level counterpart implementation, suited for Nvidia and AMD GPU architectures, is proposed for performance gain, appeasing some of the portability issues usually related to low-level implementations. This comparison provides insights into the trade-offs between different programming paradigms in large-scale, heterogeneous computing environments.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > PCOG - Parallel Computing & Optimization Group
HELBECQUE, Guillaume ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > PCOG > Team Grégoire DANOY ; Université de Lille, CNRS/CRIStAL UMR 9189, Centre Inria de l'Université de Lille, Lille, France
KRISHNASAMY, Ezhilmathi ; University of Luxembourg > Faculty of Science, Technology and Medicine > HPC Platform > High Level Support Team
Melab, Nouredine; Université de Lille, CNRS/CRIStAL UMR 9189, Centre Inria de l'Université de Lille, Lille, France
Danoy, Grégoire; University of Luxembourg, DCS-FSTM/SnT, Luxembourg
External co-authors :
yes
Language :
English
Title :
Performance and Portability in Multi-GPU Branch-and-Bound: Chapel Versus CUDA and HIP for Tree-Based Optimization
Publication date :
13 August 2025
Event name :
2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Event place :
Milan, Italy
Event date :
03-06-2025 => 07-06-2025
Audience :
International
Main work title :
Performance and Portability in Multi-GPU Branch-and-Bound: Chapel versus CUDA and HIP for Tree-Based Optimization
Publisher :
Institute of Electrical and Electronics Engineers Inc.
ISBN/EAN :
9798331526436
Collection name :
2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
This work was partially supported by the French government through the program "France 2030" (SFRI project GRAEL/ ANR-21-SFRI-005, PI: Université de Lille) managed by the National Research Agency; by the Agence Nationale de la Recherche (ref. ANR-22-CE46-0011) and the Luxembourg National Research Fund (FNR) (ref. INTER/ANR/22/17133848), under the UltraBO project; and by the FNR POLLUX program under the SERENITY project (ref. C22/IS/17395419).
Top500 Project, "Top500 International Ranking," https://www.top500. org/, 2024, accessed: 2024-08-12.
B. Gendron and T. G. Crainic, "Parallel Branch-And-Bound Algorithms: Survey and Synthesis," Operations Research, vol. 42, no. 6, pp. 1042-1066, 1994.
I. Chakroun, N. Melab, M. Mezmaz, and D. Tuyttens, "Combining multi-core and GPU computing for solving combinatorial optimization problems," Journal of Parallel and Distributed Computing, vol. 73, no. 12, pp. 1563-1577, 2013. [Online]. Available: https://www. sciencedirect.com/science/article/pii/S0743731513001615
T.-T. Vu and B. Derbel, "Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments," Future Generation Computer Systems, vol. 56, pp. 95-109, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167739X15003222
J. Gmys, "Exactly Solving Hard Permutation Flowshop Scheduling Problems on Peta-Scale GPU-Accelerated Supercomputers," INFORMS Journal on Computing, vol. 34, pp. 2502-2522, 9 2022.
T. Carneiro, E. Kayraklioglu, G. Helbecque, and N. Melab, "Investigating Portability in Chapel for Tree-Based Optimization on GPU-Powered Clusters," in Euro-Par 2024: Parallel Processing: 30th European Conference on Parallel and Distributed Processing, Madrid, Spain, August 26-30, 2024, Proceedings, Part III. Berlin, Heidelberg: Springer-Verlag, 2024, p. 386-399. [Online]. Available: https://doi.org/10.1007/978-3-031-69583-4 27
G. Helbecque, E. Krishnasamy, N. Melab, and P. Bouvry, "GPUAccelerated Tree-Search in Chapel Versus CUDA and HIP," in 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, 05 2024, pp. 872-879.
G. Helbecque, E. Krishnasamy, T. Carneiro, N. Melab, and P. Bouvry, "A Chapel-based Multi-GPU Branch-and-Bound Algorithm," in Proceedings of Euro-Par Workshops, Madrid, Spain, Aug 2024.
E. Taillard, "Benchmarks for basic scheduling problems," European Journal of Operational Research, vol. 64, no. 2, pp. 278-285, 1993, project Management anf Scheduling.
B. J. Lageweg, J. K. Lenstra, and A. H. G. R. Kan, "A General Bounding Scheme for the Permutation Flow-Shop Problem," Operations Research, vol. 26, no. 1, pp. 53-67, 1978.
J. Milthorpe, X. Wang, and A. Azizi, "Performance portability of the chapel language on heterogeneous architectures," in 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024, pp. 6-13.