Hash Semi-Join¶
Hash Semi-Join resides in L1/benchmarks/hash_semi_join
directory.
This project shows FPGA performance of the following query with random-generated data,
implemented with hashSemiJoin primitive.
SELECT SUM(l_extendedprice * (1 - l_discount)) as revenue FROM Lineitem WHERE l_orderkey IN ( SELECT o_orderkey FROM Orders WHERE o_orderdate >= date '1994-01-01' and o_orderdate < date '1995-01-01' ) ;
- Here
Orders
is a self-made table filled with random data, which contains 2 columns namedo_orderkey
ando_orderdate
; Lineitem
is also a table, which contains 3 columns namedl_orderkey
,l_extendedprice
andl_discount
.
Dataset¶
This project uses 32-bit data for numeric fields. To benchmark 64-bit performance, edit host/table_dt.h and make TPCH_INT an int64_t.
Executable Usage¶
- Work Directory(Step 1)
The steps for library download and environment setup can be found in Vitis Database Library. For getting the design,
cd L1/benchmarks/hash_semi_join
- Build kernel(Step 2)
Run the following make command to build your XCLBIN and host binary targeting a specific device. Please be noticed that this process will take a long time, maybe couple of hours.
make run TARGET=hw DEVICE=xilinx_u280_xdma_201920_3 HOST_ARCH=x86
- Run kernel(Step 3)
To get the benchmark results, please run the following command.
./build_dir.hw.xilinx_u280_xdma_201920_3/test_join.exe -xclbin build_dir.hw.xilinx_u280_xdma_201920_3/semi_join.xclbin
Hash Semi-Join Input Arguments:
Usage: test_join.exe -xclbin -xclbin: the kernel name
Note: Default arguments are set in Makefile, you can use other platforms to build and run.
- Example output(Step 4)
------------ TPC-H Q5 Example 2 ------------- Host map buffer has been allocated. Lineitem 6001215 rows Orders 1500000rows Lineitem table has been read from disk Orders table has been read from disk INFO: CPU ref matched 48055 rows, sum = 22623914778545 Found Platform Platform Name: Xilinx Selected Device xilinx_u280_xdma_201920_3 INFO: Importing build_dir.hw.xilinx_u280_xdma_201920_3/semi_join.xclbin Loading: 'build_dir.hw.xilinx_u280_xdma_201920_3/semi_join.xclbin' Kernel has been created DDR buffers have been mapped/copy-and-mapped FPGA result 0: 2262391477.8545 Golden result 0: 2262391477.8545 FPGA execution time of 1 runs: 18914 usec Average execution per run: 18914 usec ---------------------------------------------
Profiling¶
The hash semi-join design is validated on Alveo U280 board at 274 MHz frequency. The hardware resource utilizations are listed in the following table.
Name | LUT | BRAM | URAM | DSP |
Platform | 123976 | 202 | 0 | 4 |
join_kernel | 67562 | 120 | 64 | 3 |
User Budget | 1178744 | 1814 | 960 | 9020 |
Percentage | 5.73% | 6.62% | 6.67% | 0.03% |
- The performance is shown below.
- In above test, table
Lineitem
has 3 columns and 6001215 rows andOrders
does 2 column and 1500000 rows. This means that the design takes 18.914ms to process 80.12MB data, so it achieves 4.14GB/s throughput.