Array Partition
This is a simple example of matrix multiplication (Row x Col) that demonstrates how array partitioning improves performance, using an HLS kernel in the Vitis environment.
KEY CONCEPTS: Kernel Optimization, HLS C Kernel, Array Partition
KEYWORDS: #pragma HLS ARRAY_PARTITION, complete
This example demonstrates how array partitioning in an HLS kernel can improve performance, using matrix multiplication to showcase the benefit. The design contains two kernels: “matmul”, a simple matrix multiplication, and “matmul_partition”, a matrix multiplication implemented with array partitioning.
#pragma HLS ARRAY_PARTITION is used to partition an array into multiple smaller arrays or memories. Arrays can be partitioned in three ways: cyclic, block, and complete. In this example, complete partitioning is applied to the second dimension of the local matrix arrays B and C, as shown below:
int B[MAX_SIZE][MAX_SIZE];
int C[MAX_SIZE][MAX_SIZE];
#pragma HLS ARRAY_PARTITION variable = B dim = 2 complete
#pragma HLS ARRAY_PARTITION variable = C dim = 2 complete
This partitioning lets the design access all elements along the second dimension of both matrix B and matrix C concurrently, which reduces the overall latency.
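As a rough illustration of how the partitioned arrays might be used, the following is a minimal sketch of a partitioned matrix multiply. It is not the code in src/matmul_partition.cpp: the function names `matmul_partition_sketch` and `run_sketch`, the accumulator array `acc` and its partitioning, the placement of the `PIPELINE` pragma, and the value of `MAX_SIZE` are all assumptions made for this sketch. A standard C++ compiler ignores the HLS pragmas, so the same function can be checked on a CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed for illustration; the value used in the example's sources may differ.
constexpr int MAX_SIZE = 16;

// Hypothetical sketch of a partitioned matrix multiply (row-major, size x size).
void matmul_partition_sketch(const int* in1, const int* in2, int* out_r, int size) {
    int A[MAX_SIZE][MAX_SIZE];
    int B[MAX_SIZE][MAX_SIZE];
    int C[MAX_SIZE][MAX_SIZE];
    int acc[MAX_SIZE];  // running sums for one output row (assumption of this sketch)
#pragma HLS ARRAY_PARTITION variable = B dim = 2 complete
#pragma HLS ARRAY_PARTITION variable = C dim = 2 complete
#pragma HLS ARRAY_PARTITION variable = acc dim = 1 complete

    // Copy the inputs into local arrays.
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            A[i][j] = in1[i * size + j];
            B[i][j] = in2[i * size + j];
        }
    }

    for (int row = 0; row < size; row++) {
        for (int k = 0; k < size; k++) {
#pragma HLS PIPELINE
            // Each j indexes a different partition of B, C, and acc, so an HLS
            // tool can unroll this loop and perform the accesses concurrently.
            for (int j = 0; j < MAX_SIZE; j++) {
                int sum = (k == 0) ? 0 : acc[j];
                if (j < size) sum += A[row][k] * B[k][j];
                acc[j] = sum;
                if (k == size - 1) C[row][j] = sum;
            }
        }
    }

    // Copy the result back out.
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            out_r[i * size + j] = C[i][j];
        }
    }
}

// Host-side convenience wrapper (hypothetical) for checking the sketch on a CPU.
std::vector<int> run_sketch(const std::vector<int>& a, const std::vector<int>& b, int size) {
    assert(size > 0 && size <= MAX_SIZE);
    assert(a.size() == static_cast<std::size_t>(size) * size);
    assert(b.size() == a.size());
    std::vector<int> c(a.size(), 0);
    matmul_partition_sketch(a.data(), b.data(), c.data(), size);
    return c;
}
```

Without the partition pragmas, B and C would typically map to memories with a limited number of ports, so the inner loop could not read or write all of its elements in the same cycle. The actual schedule and initiation interval depend on the tool version and the target device.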
To see the benefit of array partitioning, look at the overall latency in the system estimate report.
Latency information for the normal matmul kernel (without partitioning):
Compute Unit  Kernel Name  Module Name  Start Interval  Best (cycles)  Avg (cycles)  Worst (cycles)  Best (absolute)  Avg (absolute)  Worst (absolute)
------------  -----------  -----------  --------------  -------------  ------------  --------------  ---------------  --------------  ----------------
matmul_1      matmul       matmul       2856 ~ 2859     2855           2857          2858            9.516 us         9.522 us        9.526 us
Latency information for the matmul_partition kernel (with partitioning):
Compute Unit        Kernel Name       Module Name       Start Interval  Best (cycles)  Avg (cycles)  Worst (cycles)  Best (absolute)  Avg (absolute)  Worst (absolute)
------------------  ----------------  ----------------  --------------  -------------  ------------  --------------  ---------------  --------------  ----------------
matmul_partition_1  matmul_partition  matmul_partition  1063 ~ 1066     1062           1064          1065            3.540 us         3.546 us        3.550 us
The example generates the following output when run on an Alveo U200 card:
Found Platform
Platform Name: Xilinx
INFO: Reading ./build_dir.hw.xilinx_u200_qdma_201910_1/matmul.xclbin
Loading: './build_dir.hw.xilinx_u200_qdma_201910_1/matmul.xclbin'
|-------------------------+-------------------------|
| Kernel | Wall-Clock Time (ns) |
|-------------------------+-------------------------|
| matmul: | 396685 |
| matmul: partition | 256367 |
|-------------------------+-------------------------|
Note: Wall Clock Time is meaningful for real hardware execution only, not for emulation.
Please refer to profile summary for kernel execution time for hardware emulation.
TEST PASSED
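To put the numbers above side by side, the ratios can be worked out directly. The sketch below is only arithmetic on values copied from the two latency reports and the sample output; the constant and function names are made up for illustration.

```cpp
// Average cycle counts from the two system estimate reports.
constexpr double kAvgCyclesMatmul = 2857.0;
constexpr double kAvgCyclesPartition = 1064.0;

// Wall-clock times (ns) from the sample U200 run.
constexpr double kWallNsMatmul = 396685.0;
constexpr double kWallNsPartition = 256367.0;

// Ratio of baseline to optimized; values above 1 mean the optimized version is faster.
constexpr double speedup(double baseline, double optimized) {
    return baseline / optimized;
}

// Clock frequency in MHz implied by a (cycles, microseconds) pair from a report row.
constexpr double implied_mhz(double cycles, double microseconds) {
    return cycles / microseconds;
}

constexpr double kEstimatedSpeedup = speedup(kAvgCyclesMatmul, kAvgCyclesPartition);  // about 2.69
constexpr double kMeasuredSpeedup = speedup(kWallNsMatmul, kWallNsPartition);         // about 1.55
constexpr double kClockMHz = implied_mhz(2855.0, 9.516);                              // about 300
```

The estimated kernel latency improves by roughly 2.7x, while the measured wall-clock time improves by roughly 1.5x. One plausible reason for the gap is that the wall-clock figure also covers work outside the compute loop, such as data transfer and kernel launch overhead, but the reports shown here do not break this down.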
DESIGN FILES
Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile, and its contents will be filled during compilation. A listing of all the files in this example is shown below:
src/host.cpp
src/matmul.cpp
src/matmul_partition.cpp
COMMAND LINE ARGUMENTS
Once the environment has been configured, the application can be executed with:
./array_partition <matmul XCLBIN>
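The command above passes a single argument, the path to the XCLBIN file. As a minimal sketch of how a host program might validate that argument, the helper below checks the argument count and extracts the path. The function name `parse_xclbin_arg` is hypothetical, and the code is not taken from src/host.cpp.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from src/host.cpp.
// Returns true and stores the path when exactly one argument (the XCLBIN file) is given;
// otherwise prints a usage message and leaves xclbin_path untouched.
bool parse_xclbin_arg(int argc, const char* const* argv, std::string& xclbin_path) {
    if (argc != 2 || argv[1] == nullptr || argv[1][0] == '\0') {
        const char* prog = (argc > 0 && argv[0] != nullptr) ? argv[0] : "array_partition";
        std::fprintf(stderr, "Usage: %s <matmul XCLBIN>\n", prog);
        return false;
    }
    xclbin_path = argv[1];
    return true;
}
```

A `main` function could call this first and return a failure code when it reports `false`, before attempting to load the binary.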