GPU-Accelerated Tree-Search in Chapel Versus CUDA and HIP

[en] In the context of exascale programming, the PGAS-based Chapel is among the rare languages targeting the holistic handling of high-performance computing issues including the productivity-aware harnessing of Nvidia and AMD GPUs. In this paper, we propose a pioneering proof-of-concept dealing with this latter issue in the context of tree-based exact optimization. Actually, we revisit the design and implementation of a generic multi-pool GPU-accelerated tree-search algorithm using Chapel. This algorithm is instantiated on the backtracking method and experimented on the N-Queens problem. For performance evaluation, the Chapel-based approach is compared to Nvidia CUDA and AMD HIP low-level counterparts. The reported results show that in a single-GPU setting, the high GPU abstraction of Chapel results in a loss of only 8% (resp. 16%) compared to CUDA (resp. HIP). In a multi-GPU setting, up to 80% (resp. 71%) of the baseline speedup is achieved for coarse-grained problem instances on Nvidia (resp. AMD) GPUs.

Disciplines :

Computer science

Author, co-author :

HELBECQUE, Guillaume ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG ; Université de Lille, CNRS/CRIStAL UMR 9189, Centre Inria de l'Université de Lille, France

KRISHNASAMY, Ezhilmathi ; University of Luxembourg > Faculty of Science, Technology and Medicine > HPC Platform > High Level Support Team

MELAB, Nouredine; Université de Lille, CNRS/CRIStAL UMR 9189, Centre Inria de l'Université de Lille, France

BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

External co-authors :

yes

Language :

English

Title :

GPU-Accelerated Tree-Search in Chapel Versus CUDA and HIP

Publication date :

2024

Event name :

14th IEEE Workshop Parallel / Distributed Combinatorics and Optimization

Event place :

San Francisco, United States

Event date :

May 31, 2024

Audience :

International

Main work title :

2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Publisher :

Institute of Electrical and Electronics Engineers Inc.

Pages :

872-879

Peer reviewed :

Peer reviewed

Additional URL :

https://ieeexplore.ieee.org/document/10596398

FnR Project :

FNR17133848 - Ultra-scale Computing For Solving Big Optimization Problems, 2022 (01/01/2023-30/06/2026) - Gregoire Danoy

Funding text :

The second author is supported by FNR CORE (ref. U-AGR-7213-00-V), while the others are supported by the Agence Nationale de la Recherche (ref. ANR-22-CE46-0011) and the Luxembourg National Research Fund (ref. INTER/ANR/22/17133848).

Available on ORBilu :

since 18 November 2024

Statistics

Number of views

110 (0 by Unilu)

Number of downloads

63 (0 by Unilu)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

OpenCitations

OpenAlex citations

WoS citations^™

Bibliography

E. Krishnasamy, "Hybrid CPU-GPU Parallel Simulations of 3D Front Propagation," 2014, Dissertation, Linköping University.
E. Krishnasamy, M. Sourouri, and X. Cai, "Multi-GPU Implementations of Parallel 3D Sweeping Algorithms with Application to Geological Folding," Procedia Computer Science, vol. 51, pp. 1494-1503, 2015.
D. Strigl, K. Kofler, and S. Podlipnig, "Performance and Scalability of GPU-Based Convolutional Neural Networks," in 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010, pp. 317-324.
M. E. Lalami and D. El-Baz, "GPU Implementation of the Branch and Bound Method for Knapsack Problems," in 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012, pp. 1769-1777.
N. Melab, I. Chakroun, M. Mezmaz, and D. Tuyttens, "A GPUaccelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem," in 2012 IEEE International Conference on Cluster Computing, 2012, pp. 10-17.
I. Chakroun, N. Melab, M. Mezmaz, and D. Tuyttens, "Combining multi-core and GPU computing for solving combinatorial optimization problems," Journal of Parallel and Distributed Computing, vol. 73, no. 12, pp. 1563-1577, 2013.
J. Gmys, M. Mezmaz, N. Melab, and D. Tuyttens, "IVM-based parallel branch-and-bound using hierarchical work stealing on multi-GPU systems," Concurrency and Computation: Practice and Experience, vol. 29, no. 9, p. e4019, 2017.
K. Yelick, D. Bonachea, W.-Y. Chen, P. Colella, K. Datta, J. Duell, S. L. Graham, P. Hargrove, P. Hilfinger, P. Husbands, C. Iancu, A. Kamil, R. Nishtala, J. Su, M. Welcome, and T. Wen, "Productivity and performance using partitioned global address space languages," in Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, 2007, p. 24-32.
T. M. Mintz, O. Hernandez, and D. E. Bernholdt, "A Global View Programming Abstraction for Transitioning MPI Codes to PGAS Languages," in OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014, pp. 120-133.
D. Cunningham, R. Bordawekar, and V. Saraswat, "GPU programming in a high level language: Compiling X10 to CUDA," in Proceedings of the 2011 ACM SIGPLAN X10 Workshop, 2011, pp. 1-10.
L. Chen, L. Liu, S. Tang, L. Huang, Z. Jing, S. Xu, D. Zhang, and B. Shou, "Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation," in Languages and Compilers for Parallel Computing, 2011, pp. 151-165.
A. Hayashi, S. R. Paul, and V. Sarkar, "A Multi-Level Platform- Independent GPU API for High-Level Programming Models," in High Performance Computing. ISC High Performance 2022 International Workshops, 2022, pp. 90-107.
B. L. Chamberlain, E. Ronaghan, B. Albrecht, L. Duncan, M. P. Ferguson, B. Harshbarger, D. Iten, D. Keaton, V. Litvinov, P. Sahabu, and G. Titus, "Chapel Comes of Age: Making Scalable Programming Productive," 2018, Cray Inc.
T. Carneiro, N. Melab, A. Hayashi, and V. Sarkar, "Towards Chapelbased Exascale Tree Search Algorithms: Dealing with multiple GPU accelerators," in 18th International Conference on High Performance Computing & Simulation, 2021.
A. Grama and V. Kumar, "Parallel Search Algorithms for Discrete Optimization Problems," ORSA Journal on Computing, vol. 7, no. 4, pp. 365-385, 1995.
B. Gendron and T. G. Crainic, "Parallel Branch-and-Branch Algorithms: Survey and Synthesis," Operations Research, vol. 42, no. 6, pp. 1042-1066, 1994.
T. B. Preußer and M. R. Engelhardt, "Putting Queens in Carry Chains, No27," J Sign Process Syst, vol. 88, pp. 185-201, 2017.
"HIPIFY Documentation," 2024, Advanced Micro Devices Inc., https://rocm.docs.amd.com/projects/HIPIFY/en/latest/.
G. Helbecque, J. Gmys, N. Melab, T. Carneiro, and P. Bouvry, "Parallel distributed productivity-aware tree-search using Chapel," Concurrency and Computation: Practice and Experience, vol. 35, no. 27, p. e7874, 2023.
G. Helbecque, E. Krishnasamy, N. Melab, and P. Bouvry, "GPUaccelerated tree-search in Chapel," Zenodo, version 1.0, 2024.