Two Parallel Read/Write on Local Memory (OpenCL Kernel)¶

This is simple example of vector addition to demonstrate how to utilized both ports of Local Memory.

KEY CONCEPTS: Kernel Optimization, 2port BRAM Utilization, two read/write Local Memory

This is a simple example to demonstrate how to utilize both ports of local memory in kernels.

Kernel’s local memory is usually BRAM which has two ports for read/write. In loops where one iteration doesn’t depend on previous iterations, two ports can be used to improve the performance of the kernel.

Two ports can be utilized concurrently by using opencl_unroll_hint. The unroll attribute transforms loops by creating multiples copies of the loop body in the register transfer level (RTL) design, which allows some or all loop iterations to occur in parallel.

__attribute__((opencl_unroll_hint(2)))

Here loop is unrolled by a factor of 2 thus two iterations of the loop are executed concurrently. In this case, two ports of BRAM will be utilized rather than 1 reducing the total loop latency by half approximately.

EXCLUDED PLATFORMS:

All NoDMA Platforms, i.e u50 nodma etc

Samsung U.2 SmartSSD

DESIGN FILES¶

Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile and its contents will be filled during compilation. A listing of all the files in this example is shown below

src/host.cpp
src/vadd.cl

Access these files in the github repo by clicking here.

COMMAND LINE ARGUMENTS¶

Once the environment has been configured, the application can be executed by

./cl_lmem_2rw <vadd XCLBIN>