Abstract:
[en] During the parallel execution of queries in Non-Uniform Memory Access (NUMA) systems, the Operating System (OS) maps the threads (or processes) of modern database systems to the available cores among the NUMA nodes using the standard node-local policy. However, such naive mapping may result in inefficient memory activity: shared data may be accessed by scattered threads, requiring large data movements, or non-shared data may be allocated to threads sharing the same cache memory, increasing cache conflicts. In this paper, we present a data-distribution-aware and elastic multi-core allocation mechanism to improve the OS mapping of database threads in NUMA systems. Our hypothesis is that data movement can be mitigated by handing out to the OS only the local optimum number of cores in specific nodes. We propose a mechanism based on a rule-condition-action pipeline that uses hardware counters to promptly find the local optimum number of cores. The mechanism uses a priority queue to track the history of the memory address space used by database threads in order to decide on the allocation and release of cores and their distribution among the NUMA nodes, so as to decrease remote memory access. We implemented a prototype of the mechanism and tested it with two popular Volcano-style databases, improving their NUMA affinity. For MonetDB, we show a maximum speedup of 1.53×, due to a consistent reduction in the local/remote per-query data traffic ratio of up to 3.87×, when running 256 concurrent clients on the 1 GB TPC-H database, along with system energy savings of 26.05%. For the NUMA-aware SQL Server, we observed a speedup of up to 1.27× and a reduction in the data traffic ratio of 3.70×.