Port Width Widening

This example shows how HLS can resize the kernel interface ports to achieve better resource utilization while maintaining performance.

KEY CONCEPTS: Interface port width auto widening

KEYWORDS: m_axi_max_widen_bitwidth

Vitis HLS lets the user control the maximum width of kernel interface ports, either globally through a Tcl configuration or per bundle through the INTERFACE pragma.

Keep the following rules in mind:

A max_widen_bitwidth value set by the user in a pragma takes priority over the value set in Tcl.
The max_widen_bitwidth value must be in the range [0, 1024] and must be either 0 or a power of 2. Otherwise, the setting is ignored.
If several ports are bundled together, the bundle can have only one max_widen_bitwidth value. Therefore, if ports of the same bundle specify different widths, the maximum width in the bundle is used for every port in that bundle.
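The first two rules can be sketched as a small C++ model. This is a hypothetical illustration, not Vitis HLS source code: the helper names is_valid_max_widen_bitwidth and effective_max_widen_bitwidth are invented here, and the model assumes that an ignored value simply falls back to the next lower-priority setting (Tcl, then a tool default passed in as a parameter).

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the rules above; not Vitis HLS source code.

// Rule 2: the value must be in [0, 1024] and be either 0 or a power of 2.
bool is_valid_max_widen_bitwidth(int v) {
    if (v < 0 || v > 1024) return false;
    return v == 0 || (v & (v - 1)) == 0;  // power of 2 has a single set bit
}

// Rule 1: a valid pragma value wins over a valid Tcl value.
// Assumption: an invalid (ignored) value falls back to the next setting,
// ending at tool_default, which is a parameter of this model only.
int effective_max_widen_bitwidth(std::optional<int> pragma_value,
                                 std::optional<int> tcl_value,
                                 int tool_default) {
    if (pragma_value && is_valid_max_widen_bitwidth(*pragma_value)) {
        return *pragma_value;
    }
    if (tcl_value && is_valid_max_widen_bitwidth(*tcl_value)) {
        return *tcl_value;
    }
    return tool_default;
}
```

For example, a pragma value of 256 overrides a Tcl value of 512, while a pragma value of 300 is not a power of 2 and is ignored, so under this model the Tcl value applies instead.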

A Vitis kernel uses m_axi interfaces to access the global memory buffers that the host application allocates for it. This example contains five kernels, each of which sets the port width in a different way:

KERNEL 1 - Default case (no explicit settings). By default, HLS creates a single M_AXI interface for all pointer arguments (here a, b, and res), and the port width defaults to the widest data type accessed (64 bits here, because of uint64_t).

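#include <cstdint>

extern "C" void dot_product_1(const uint32_t *a, const uint32_t *b, uint64_t *res,
                              const int size, const int reps) {
    // No INTERFACE pragmas: a, b and res share one default M_AXI bundle.
loop_reps:
    for (int i = 0; i < reps; i++) {
    dot_product:
        for (int j = 0; j < size; j++) {
            // The product is computed in 32-bit arithmetic before being stored as uint64_t.
            res[j] = a[j] * b[j];
        }
    }
}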
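The default width for kernel 1 can be written as a compile-time calculation over the element types of the three pointers. This is a hypothetical model of the rule stated above, not the way HLS computes it internally; the constant names are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the default case: a, b and res share one M_AXI bundle
// whose width is the widest element type accessed through it.
constexpr int kBitsA = 8 * sizeof(uint32_t);    // const uint32_t *a
constexpr int kBitsB = 8 * sizeof(uint32_t);    // const uint32_t *b
constexpr int kBitsRes = 8 * sizeof(uint64_t);  // uint64_t *res
constexpr int kDefaultPortBits = std::max({kBitsA, kBitsB, kBitsRes});

static_assert(kDefaultPortBits == 64, "widest element type is uint64_t");
```

The 64-bit result matches the port widths reported for dot_product_1 in the resource table.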

KERNEL 2 - Automatic port width widening for a fixed-bound pipelined loop. HLS automatically widens the port when the pipelined loop has a fixed trip count. Here the pipelined loop dot_product_inner always runs DATA_WIDTH iterations, so HLS widens the M_AXI port to 512 bits (the maximum allowed by the default setting).

#include <cstdint>

#define DATA_WIDTH 16

extern "C" void dot_product_2(const uint32_t *a, const uint32_t *b, uint64_t *res,
                              const int size, const int reps) {
    // Assumes size is a multiple of DATA_WIDTH.
dot_product_outer:
    for (int j = 0; j < size; j += DATA_WIDTH) {
        // Fixed trip count of DATA_WIDTH lets HLS widen the M_AXI port.
    dot_product_inner:
        for (int k = 0; k < DATA_WIDTH; k++) {
            res[j + k] = a[j + k] * b[j + k];
        }
    }
}
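The 512-bit figure follows from simple arithmetic: one iteration of dot_product_outer covers DATA_WIDTH consecutive elements, so the uint32_t inputs need 16 × 32 = 512 bits, while res would need 16 × 64 = 1024 bits and is capped at 512. The sketch below is a simplified model under that assumption only; widened_port_bits is a hypothetical helper, and the real HLS widening decision depends on further conditions this model ignores.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model of auto widening (not the HLS algorithm):
// a fixed-trip-count loop touching trip_count consecutive elements of
// elem_bits each could use a port of trip_count * elem_bits bits, capped at
// max_widen_bitwidth and never narrower than a single element.
int widened_port_bits(int trip_count, int elem_bits, int max_widen_bitwidth) {
    int wanted = trip_count * elem_bits;
    return std::max(elem_bits, std::min(wanted, max_widen_bitwidth));
}
```

With DATA_WIDTH = 16 and a 512-bit cap, both a and res come out at 512 bits in this model, consistent with the port widths reported for dot_product_2.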

KERNEL 3 - INTERFACE pragmas assign the ports to multiple bundles so that HLS infers multiple M_AXI interfaces. Here pointer a (read) and res (write) are assigned to gmem0, and pointer b (read) is assigned to gmem1.

#include <cstdint>

#define DATA_WIDTH 16

extern "C" void dot_product_3(const uint32_t *a, const uint32_t *b, uint64_t *res,
                              const int size, const int reps) {
#pragma HLS INTERFACE m_axi port=a bundle=gmem0
#pragma HLS INTERFACE m_axi port=b bundle=gmem1
#pragma HLS INTERFACE m_axi port=res bundle=gmem0
    // Assumes size is a multiple of DATA_WIDTH.
dot_product_outer:
    for (int j = 0; j < size; j += DATA_WIDTH) {
    dot_product_inner:
        for (int k = 0; k < DATA_WIDTH; k++) {
            res[j + k] = a[j + k] * b[j + k];
        }
    }
}

KERNEL 4 - In addition to the pragmas in the kernel, the user can set the maximum port width explicitly in a Tcl file (hls_config.tcl), as shown below:

config_interface -m_axi_max_widen_bitwidth 512

The interface width setting is specified in the hls_config.tcl file. This Tcl file is referenced from krnl_dot_product_4.cfg, which is passed to v++ with the --config option during kernel compilation to set the m_axi interface width.

The krnl_dot_product_4.cfg file contains the following:

[hls]
pre_tcl=hls_config.tcl

KERNEL 5 - Per-bundle port width set with the INTERFACE pragma. The user can specify the maximum width of each M_AXI bundle directly. Here gmem0 is set to 512 bits and gmem1 to 256 bits. Because a bundle has only one max_widen_bitwidth value, res, which shares gmem0 with a, also uses 512 bits.

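#include <cstdint>

extern "C" void dot_product_5(const uint32_t *a, const uint32_t *b, uint64_t *res,
                              const int size, const int reps) {
#pragma HLS INTERFACE m_axi port=a bundle=gmem0 max_widen_bitwidth=512
#pragma HLS INTERFACE m_axi port=b bundle=gmem1 max_widen_bitwidth=256
#pragma HLS INTERFACE m_axi port=res bundle=gmem0
    // res shares gmem0 with a, so it also gets the 512-bit setting.
    // Loop body not shown here; see src/dot_product_5.cpp.
}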
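The bundle rule from the list above explains why res ends up at 512 bits in kernel 5 even though its pragma sets no width. The sketch below models that rule and feeds it the three kernel 5 pragmas. It is a hypothetical illustration, not Vitis HLS source code; PortPragma and resolve_port_widths are names invented here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the bundle rule (not Vitis HLS source code).
struct PortPragma {
    std::string port;
    std::string bundle;
    std::optional<int> max_widen_bitwidth;  // empty if the pragma omits it
};

// Every port gets the largest max_widen_bitwidth set on any port of its bundle.
std::map<std::string, int> resolve_port_widths(const std::vector<PortPragma>& pragmas) {
    std::map<std::string, int> bundle_width;
    for (const auto& p : pragmas) {
        if (!p.max_widen_bitwidth) continue;
        auto it = bundle_width.find(p.bundle);
        if (it == bundle_width.end() || *p.max_widen_bitwidth > it->second) {
            bundle_width[p.bundle] = *p.max_widen_bitwidth;
        }
    }
    std::map<std::string, int> port_width;
    for (const auto& p : pragmas) {
        auto it = bundle_width.find(p.bundle);
        if (it != bundle_width.end()) port_width[p.port] = it->second;
    }
    return port_width;
}
```

Passing the kernel 5 pragmas ({"a", "gmem0", 512}, {"b", "gmem1", 256}, {"res", "gmem0", std::nullopt}) yields 512 bits for a and res and 256 bits for b, which matches the dot_product_5 row of the resource table.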

The table below lists the resource usage of each design on the U200 platform:

Design	Port width a (bits)	Port width b (bits)	Port width res (bits)	Bundles	BRAM	LUT	DSP
dot_product_1	64	64	64	1	2	2237	3
dot_product_2	512	512	512	1	15	3665	48
dot_product_3	512	512	512	2	23	5319	48
dot_product_4	512	512	512	2	23	5316	48
dot_product_5	512	256	512	2	19	4939	48

The following wall-clock times were measured when running the design on the U200 platform:

Kernel (1000000 iterations)	Wall-clock time (s)
dot_product_1	66.8994
dot_product_2	2.57683
dot_product_3	1.14736
dot_product_4	1.14755
dot_product_5	1.26024
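The two tables can be combined with simple arithmetic to see the trade-off between width, resources, and run time. The sketch below copies the measured values from the tables above and computes each kernel's speedup over dot_product_1 and a few relative costs; the Result struct and helper names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Values copied from the resource and timing tables above (U200 platform).
struct Result {
    std::string kernel;
    int bram;
    int lut;
    double seconds;  // wall-clock time for 1000000 iterations
};

const std::vector<Result> kResults = {
    {"dot_product_1", 2, 2237, 66.8994},
    {"dot_product_2", 15, 3665, 2.57683},
    {"dot_product_3", 23, 5319, 1.14736},
    {"dot_product_4", 23, 5316, 1.14755},
    {"dot_product_5", 19, 4939, 1.26024},
};

// Speedup relative to the default-width baseline, dot_product_1.
double speedup_vs_baseline(const Result& r) {
    return kResults.front().seconds / r.seconds;
}

// Ratio a / b of two measurements (run times, LUTs, BRAMs, ...).
double ratio(double a, double b) { return a / b; }
```

With these numbers, dot_product_3 runs about 58× faster than dot_product_1, while dot_product_5 runs about 10% longer than dot_product_3 but uses about 7% fewer LUTs and 4 fewer BRAMs.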

EXCLUDED PLATFORMS:

All NoDMA platforms (e.g. U50 NoDMA)

Samsung U.2 SmartSSD

Versal VCK190

All ZCU102 Base Platforms

DESIGN FILES

Application code is located in the src directory. Accelerator binary files are compiled to the xclbin directory, which the Makefile requires; its contents are generated during compilation. The files in this example are listed below:

src/dot_product_1.cpp
src/dot_product_2.cpp
src/dot_product_3.cpp
src/dot_product_4.cpp
src/dot_product_5.cpp
src/host.cpp

These files are also available in the GitHub repository.

COMMAND LINE ARGUMENTS

Once the environment has been configured, run the application as follows:

./port_width_widening <krnl_port_widen XCLBIN>