Hello World

This is a simple vector addition example that shows how to use HLS kernels in the Vitis environment.

KEY CONCEPTS: HLS C Kernel, OpenCL Host APIs

KEYWORDS: gmem, bundle, #pragma HLS INTERFACE, m_axi, s_axilite

This hello world example explains the structure of the host and kernel code, using a simple vadd kernel for illustration.

A Vitis kernel can have one s_axilite interface, which the host application uses to configure the kernel. Here bundle=control defines that s_axilite interface, and it is associated with all the arguments (in1, in2, out_r and size). The control interface must also be associated with return.

void vadd(const unsigned int *in1,
          const unsigned int *in2,
          unsigned int *out_r,
          int size)
{
#pragma HLS INTERFACE s_axilite port = in1 bundle = control
#pragma HLS INTERFACE s_axilite port = in2 bundle = control
#pragma HLS INTERFACE s_axilite port = out_r bundle = control
#pragma HLS INTERFACE s_axilite port = size bundle = control
#pragma HLS INTERFACE s_axilite port = return bundle = control
    // ... remaining pragmas and kernel body

All arguments that access global memory are associated with an m_axi (AXI master) interface, as shown below:

#pragma HLS INTERFACE m_axi port=in1 offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=in2 offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out_r offset=slave bundle=gmem

Here all three arguments (in1, in2 and out_r) are associated with the bundle gmem. This means that a single AXI master interface named gmem is created in the kernel, and all three arguments access global memory through it. Multiple interfaces can also be created if required. For example, when several arguments need to access global memory simultaneously, the user can create multiple AXI master interfaces and connect each one to a different argument.
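As a rough illustration of the multiple-interface case, the sketch below gives each pointer argument its own bundle. The kernel name vadd_multi and the bundle names gmem0, gmem1 and gmem2 are hypothetical and are not part of this example's source. A regular host compiler ignores the HLS pragmas, so the function can also be run in software.

```cpp
// Sketch only: vadd_multi, gmem0, gmem1 and gmem2 are hypothetical names
// chosen for illustration; src/vadd.cpp uses a single bundle named gmem.
extern "C" void vadd_multi(const unsigned int *in1,
                           const unsigned int *in2,
                           unsigned int *out_r,
                           int size) {
// One AXI master interface per pointer argument, so the three
// arguments do not share a single port to global memory.
#pragma HLS INTERFACE m_axi port = in1 offset = slave bundle = gmem0
#pragma HLS INTERFACE m_axi port = in2 offset = slave bundle = gmem1
#pragma HLS INTERFACE m_axi port = out_r offset = slave bundle = gmem2
// All arguments and the return still go through the one control interface.
#pragma HLS INTERFACE s_axilite port = in1 bundle = control
#pragma HLS INTERFACE s_axilite port = in2 bundle = control
#pragma HLS INTERFACE s_axilite port = out_r bundle = control
#pragma HLS INTERFACE s_axilite port = size bundle = control
#pragma HLS INTERFACE s_axilite port = return bundle = control

    for (int i = 0; i < size; i++) {
#pragma HLS PIPELINE II = 1
        out_r[i] = in1[i] + in2[i];
    }
}
```

Whether separate bundles actually help depends on how the interfaces are mapped to memory banks on the target platform, which this README does not cover.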

Rather than reading individual items for addition, the kernel creates local buffers in its local memory and reads multiple items in a single burst. This reduces the effective memory access latency and makes efficient use of the bandwidth provided by the m_axi interface.

Similarly, results are stored in a buffer and written to global memory in a burst. The for loops must meet the following requirements for burst reads and writes to be implemented:

  • Pipeline the loop: the loop pipeline must have an II (initiation interval) of 1.

  • Contiguous memory: the memory addresses for reads and writes must be contiguous.

read1: for (int j = 0; j < chunk_size; j++) {
#pragma HLS PIPELINE II=1
    v1_buffer[j] = in1[i + j];
}
write: for (int j = 0; j < chunk_size; j++) {
#pragma HLS PIPELINE II=1
    out_r[i + j] = vout_buffer[j];
}
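To show how these loops might fit together, here is a minimal sketch of a chunked kernel body that burst-reads in1 into a local buffer, adds in2, and burst-writes the result. The function name vadd_burst, the constant BUFFER_SIZE and its value of 256 are assumptions made for illustration, and the interface pragmas are omitted for brevity. The actual code in src/vadd.cpp may be organized differently.

```cpp
// Hypothetical buffer depth; this README does not state the real value.
constexpr int BUFFER_SIZE = 256;

// Sketch only: vadd_burst is an illustrative name, not the kernel in src/vadd.cpp.
extern "C" void vadd_burst(const unsigned int *in1,
                           const unsigned int *in2,
                           unsigned int *out_r,
                           int size) {
    unsigned int v1_buffer[BUFFER_SIZE];   // local copy of a chunk of in1
    unsigned int vout_buffer[BUFFER_SIZE]; // local results for the chunk

    // Process the vectors one chunk at a time.
    for (int i = 0; i < size; i += BUFFER_SIZE) {
        int chunk_size = BUFFER_SIZE;
        // The last chunk may be shorter than BUFFER_SIZE.
        if (i + chunk_size > size) chunk_size = size - i;

        // Contiguous, pipelined read: candidate for a burst read.
    read1: for (int j = 0; j < chunk_size; j++) {
#pragma HLS PIPELINE II = 1
            v1_buffer[j] = in1[i + j];
        }

        // Read in2 contiguously and add it to the buffered in1 values.
    vadd: for (int j = 0; j < chunk_size; j++) {
#pragma HLS PIPELINE II = 1
            vout_buffer[j] = v1_buffer[j] + in2[i + j];
        }

        // Contiguous, pipelined write: candidate for a burst write.
    write: for (int j = 0; j < chunk_size; j++) {
#pragma HLS PIPELINE II = 1
            out_r[i + j] = vout_buffer[j];
        }
    }
}
```

Clamping chunk_size on the final iteration lets the sketch handle a size that is not a multiple of BUFFER_SIZE.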

DESIGN FILES

Application code is located in the src directory. Accelerator binary files are compiled to the xclbin directory, which is required by the Makefile and is populated during compilation. The files in this example are listed below:

src/host.cpp
src/vadd.cpp

COMMAND LINE ARGUMENTS

Once the environment has been configured, the application can be run with:

./host <vadd XCLBIN>