Internals of Hash-Multi-JoinΒΆ

This document describes the structure and execution of Hash-Multi-Join, implemented as hashMultiJoin function. Hash-Multi-Join is a combination Join algorithm which can perform Hash-Join, Hash-Semi-Join and Hash-Anti-Join with programmable ability. It is based on the framework of Hash-Join-v3 as shown in the picture below:

Hash Join MPU Structure

The number of PU is set to 8, as each PU requires two dedicated bank to temporarily store rows in small table (one for base rows and another for overflow rows). Due to DDR/HBM memory access delay, 4 channels can serve enough data to these PUs. Each PU performs Hash-Join in 3 phases, and the detail is described in Internals of Hash-Join-v3 and Hash-Build-Probe-v3. The control of Multi-Join function is shown below:

Flag Type
JOIN_INNER Hash-Join
JOIN_SEMI Hash-Semi-Join
JOIN_ANTI Hash-Anti-Join

Important

Make sure the size of small table is smaller than the size of HBM buffers. Small table and big table should be fed only ONCE.

Caution

Currently, this primitive expects unique key in small table.

The primitive has only one port for key input and one port for payload input. If your tables are joined by multiple key columns or has multiple columns as payload, please use combineCol to merge the column streams, and use splitCol to split the output to columns.

There is a deep relation in the template parameters of the primitive. In general, the maximum capacity of rows and depth of hash entry is limited by the size of HTB. Each PU has one HTB in this design, and the size of one HTB is equal to the size of one pseudo-channel in HBM. Here is an example of row capacity when PU=8:

HTB Size Key Size Payload Size Row Capacity Hash Entry Max Depth for Hash Entry Overflow Capacity
8x256MB 32 bit 32 bit 64M 1M 63 (base rows take 63M) 1M
8x256MB 128 bit 128 bit 16M 1M 15 (base rows take 15M) 1M

The Number of hash entry is limited by the number of URAM in a single SLR. For example, there are 320 URAMs in a SLR of U280, and 1M hash entry will take 192 URAMs (96 URAMs for base hash counter + 96 URAMs for overflow hash counter). Because the number of hash entry must be the power of 2, 1M hash entry is the maximum for U280 to avoid crossing SLR logic which will lead to bad timing performance of the design.