Depth-first Tree Search; PGAS; Scalable Data Structures; Work Stealing
Résumé :
[en] The design and implementation of algorithms for increasingly large and complex modern supercomputers requires the definition of data structures and workload distribution mechanisms in a productive and scalable way. In this paper, we propose a PGAS (Partitioned Global Address Space) data structure along with a Work-Stealing mechanism for the class of parallel tree-based algorithms that explore unbalanced trees using the depth-first search strategy. The contribution has been implemented and packaged as an open-source module in the Chapel PGAS language. The experimentation of the contribution in a single-node setting using backtracking applied to fine-grained Unbalanced Tree-Search (UTS) benchmark shows that 68% of the linear speed-up can be achieved. In addition, the scalability of the contribution has been evaluated using the Branch-and-Bound algorithm to solve big instances of the Flowshop Scheduling problem on a large cluster. The reported results reveal that 50% of strong scaling efficiency is achieved using 400 computer nodes (51,200 processing cores).
Disciplines :
Sciences informatiques
Auteur, co-auteur :
HELBECQUE, Guillaume ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PCOG ; Université de Lille, CNRS/CRIStAL, Centre Inria de l’Université de Lille, France
CARNEIRO, Tiago; Interuniversity Microelectronics Centre, Belgium
MELAB, Nouredine; Université de Lille, CNRS/CRIStAL, Centre Inria de l’Université de Lille, France
GMYS, Jan; Université de Lille, CNRS/CRIStAL, Centre Inria de l’Université de Lille, France
BOUVRY, Pascal ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Co-auteurs externes :
yes
Langue du document :
Anglais
Titre :
PGAS Data Structure for Unbalanced Tree-Based Algorithms at Scale
Date de publication/diffusion :
29 juin 2024
Nom de la manifestation :
24th International Conference on Computational Science
FNR17133848 - Ultra-scale Computing For Solving Big Optimization Problems, 2022 (01/01/2023-30/06/2026) - Gregoire Danoy
Subventionnement (détails) :
This work is supported by the Agence Nationale de la Recherche (ref. ANR-22-CE46-0011) and the Luxembourg National Research Fund (ref. INTER/ANR/22/17133848), under the UltraBO project.
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)
Carneiro Pessoa, T., Gmys, J., de Carvalho Júnior, F.H., et al.: GPU-accelerated backtracking using CUDA Dynamic parallelism. Concurrency Comput. Pract. Experience 30(9), e4374 (2018)
Chamberlain, B.L., Ronaghan, E., Albrecht, B., et al.: Chapel Comes of Age: Making Scalable Programming Productive (2018)
Cong, G., Kodali, S., Krishnamoorthy, S., et al.: Solving large, irregular graph problems using adaptive work-stealing. In: 37th International Conference on Parallel Processing, pp. 536–545 (2008)
van Dijk, T., van de Pol, J.C.: Lace: non-blocking split deque for work-stealing. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 206–217. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2 18
Dinan, J., Larkins, D.B., Sadayappan, P., et al.: Scalable work stealing. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. Association for Computing Machinery (2009)
Gmys, J.: Exactly solving hard permutation flowshop scheduling problems on peta-scale GPU-accelerated supercomputers. INFORMS J. Comput. 34(5), 2502–2522 (2022)
Gmys, J., Leroy, R., Mezmaz, M., et al.: Work stealing with private integer-vector-matrix data structure for multi-core Branch-and-Bound algorithms. Concurrency Comput. Pract. Experience 28(18), 4463–4484 (2016)
Gmys, J., Mezmaz, M., Melab, N., et al.: A computationally efficient Branch-and-Bound algorithm for the permutation flow-shop scheduling problem. Eur. J. Oper. Res. 284(3), 814–833 (2020)
Grama, A., Kumar, V.: Parallel search algorithms for discrete optimization problems. ORSA J. Comput. 7(4), 365–385 (1995)
Helbecque, G., Gmys, J., Carneiro, T., et al.: Productivity-and performance-aware parallel distributed depth-first search, October 2023. https://github.com/Guillaume-Helbecque/P3D-DFS
Lageweg, B.J., Lenstra, J.K., Rinnooy Kan, A.H.G.: A general bounding scheme for the permutation flow-shop problem. Oper. Res. 26(1), 53–67 (1978)
Machado, R., Lojewski, C., Abreu, S., et al.: Unbalanced tree search on a manycore system using the GPI programming model. Comput. Sci. Res. Dev. 26(3), 229–236 (2011)
Olivier, S., et al.: UTS: an unbalanced tree search benchmark. In: Almási, G., Caşcaval, C., Wu, P. (eds.) LCPC 2006. LNCS, vol. 4382, pp. 235–250. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-72521-3 18
Olivier, S., Prins, J.: Scalable dynamic load balancing using UPC. In: 37th International Conference on Parallel Processing, pp. 123–131 (2008)
Taillard, E.: Benchmarks for basic scheduling problems. Eur. J. Oper. Res. 64(2), 278–285 (1993)
Zhang, W.: Branch-and-Bound search algorithms and their computational complexity. Technical report, Defence Technical Information Center (1996)