Compound Sort¶
Compound Sort example resides in L1/benchmarks/compound_sort
directory.
This benchmark tests the performance of compoundSort primitive with an array of integer keys. This primitive is named as compound sort, as it combines insertSort and mergeSort, to balance storage and compute resource usage.
The tutorial provides a step-by-step guide that covers commands for building and running kernel.
Executable Usage¶
- Work Directory(Step 1)
The steps for library download and environment setup can be found in Vitis Database Library. For getting the design,
cd L1/benchmarks/compound_sort
- Build kernel(Step 2)
Run the following make command to build your XCLBIN and host binary targeting a specific device. Please be noticed that this process will take a long time, maybe couple of hours.
make run TARGET=hw DEVICE=xilinx_u280_xdma_201920_3
- Run kernel(Step 3)
To get the benchmark results, please run the following command.
./build_dir.hw.xilinx_u280_xdma_201920_3/host.exe -xclbin build_dir.hw.xilinx_u280_xdma_201920_3/SortKernel.xclbin
Compound Sort Input Arguments:
Usage: host.exe -xclbin -xclbin compound sort binary
Note: Default arguments are set in Makefile, you can use other platforms to build and run.
- Example output(Step 4)
-----------Sort Design--------------- key length is 131072 [INFO]Running in hw mode Found Platform Platform Name: Xilinx Found Device=xilinx_u280_xdma_201920_3 INFO: Importing build_dir.hw.xilinx_u280_xdma_201920_3/SortKernel.xclbin Loading: 'build_dir.hw.xilinx_u280_xdma_201920_3/SortKernel.xclbin' kernel has been created kernel start------ PASS! Write DDR Execution time 127.131us Kernel Execution time 1129.78us Read DDR Execution time 83.459us Total Execution time 1340.37us ------------------------------------------------------------
Profiling¶
The compound sort design is validated on Alveo U280 board at 287 MHz frequency. The hardware resource utilizations are listed in the following table.
Name | LUT | BRAM | URAM | DSP |
Platform | 142039 | 285 | 0 | 7 |
SortKernel | 62685 | 18 | 16 | 0 |
User Budget | 1160681 | 1731 | 960 | 9017 |
Percentage | 5.40% | 1.04% | 1.67% | 0 |
- The performance is shown below.
- This design takes 1.130ms to process 0.5MB data, so it achieves 442.56MB/s throughput.