Streaming Host to Kernel Bandwidth

This is a simple vector increment C kernel design with one stream input and one stream output. It demonstrates how to process an input stream of data for computation in an application, and it serves as a host-to-device streaming bandwidth test.

KEY CONCEPTS: Read/Write Stream, Create/Release Stream

KEYWORDS: cl_stream, CL_STREAM_EOT, CL_STREAM_NONBLOCKING

This example tests the bandwidth of the blocking and non-blocking stream interfaces between host and device.

To measure the maximum bandwidth, a wider (512-bit) stream is used at the kernel level, as shown below:

#include "ap_axi_sdata.h"
#include "hls_stream.h"
typedef qdma_axis<512, 0, 0, 0> pkt;
void krnl_stream_adder1(hls::stream<pkt> &a, hls::stream<pkt> &output) {
    #pragma HLS INTERFACE axis port=a
    #pragma HLS INTERFACE axis port=output
    #pragma HLS INTERFACE s_axilite port=return bundle=control
...
}
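
As a rough illustration of why the wider stream helps, the sketch below counts how many 512-bit transfers a buffer needs. It is plain C++ with no HLS headers. The names `kStreamWidthBits`, `kBytesPerTransfer`, `transfers_for_bytes` and `elements_per_transfer` are hypothetical helpers introduced here, and the 32-bit element type in the usage note is an assumption, not something taken from the example's sources.

```cpp
#include <cstddef>
#include <cstdint>

// Width of the qdma_axis data field used by the kernel above, in bits.
constexpr std::size_t kStreamWidthBits = 512;
// One transfer on the stream therefore carries 64 bytes.
constexpr std::size_t kBytesPerTransfer = kStreamWidthBits / 8;

// Hypothetical helper: number of stream transfers needed to move `bytes`
// bytes, rounding up so a partially filled last transfer is counted.
constexpr std::size_t transfers_for_bytes(std::size_t bytes) {
    return (bytes + kBytesPerTransfer - 1) / kBytesPerTransfer;
}

// Hypothetical helper: how many elements of type T fit in one transfer.
template <typename T>
constexpr std::size_t elements_per_transfer() {
    return kBytesPerTransfer / sizeof(T);
}
```

For example, `transfers_for_bytes(65)` evaluates to 2, and `elements_per_transfer<std::uint32_t>()` evaluates to 16, so each transfer would carry sixteen 32-bit values.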

The kernel is enqueued with an event object to measure the kernel execution time, as shown below:

cl::Event nb_wait_event;
q.enqueueTask(krnl_adder1, NULL, &nb_wait_event);
...
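q.finish(); // wait for all pending enqueued commands
// Profiling info is only available if q was created with CL_QUEUE_PROFILING_ENABLE
unsigned long start, stop;
nb_wait_event.getProfilingInfo<unsigned long>(CL_PROFILING_COMMAND_START, &start);
nb_wait_event.getProfilingInfo<unsigned long>(CL_PROFILING_COMMAND_END, &stop);
unsigned long duration = stop - start; // kernel execution time in nanoseconds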
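
The measured duration can be turned into a throughput figure along the following lines. This is a minimal sketch, not the example's actual host code: the function name `throughput_gb_per_s` is hypothetical, and it assumes decimal gigabytes (1 GB = 1e9 bytes) and that the caller decides which bytes to count (for instance input only, or input plus output). OpenCL profiling timestamps are in nanoseconds, so bytes per nanosecond is numerically equal to GB/s.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a byte count and a pair of OpenCL profiling
// timestamps (nanoseconds) into GB/s, assuming 1 GB = 1e9 bytes.
// Returns 0.0 if the interval is empty or inverted, to avoid dividing by zero.
double throughput_gb_per_s(std::size_t bytes,
                           std::uint64_t start_ns,
                           std::uint64_t stop_ns) {
    if (stop_ns <= start_ns) {
        return 0.0;
    }
    const double duration_ns = static_cast<double>(stop_ns - start_ns);
    // bytes / ns == (bytes * 1e9 / s) / 1e9 == GB/s
    return static_cast<double>(bytes) / duration_ns;
}
```

With the variables from the snippet above, a call might look like `throughput_gb_per_s(total_bytes, start, stop)`, where `total_bytes` is a placeholder for the amount of data streamed during the measured interval.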

The following log is reported when running the design on the U200 platform:

Platform Name: Xilinx
INFO: Reading ./build_dir.hw.xilinx_u200_qdma_201910_1/krnl_stream_adder1.xclbin
Loading: './build_dir.hw.xilinx_u200_qdma_201910_1/krnl_stream_adder1.xclbin'
############################################################
Blocking Stream
############################################################
[ Case: 1 ] -> Throughput = 5.06 GB/s
TEST PASSED
############################################################
Non-Blocking Stream
############################################################
[ Case: 2 ] -> Throughput = 5.25 GB/s
TEST PASSED

EXCLUDED PLATFORMS

Platforms containing the following strings in their names are not supported for this example:

zc
xdma
xilinx_u250_qep
aws
samsung

DESIGN FILES

Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile, and its contents will be filled during compilation. All the files in this example are listed below:

src/host.cpp
src/krnl_stream_adder1.cpp

COMMAND LINE ARGUMENTS

Once the environment has been configured, the application can be executed with:

./vadd_stream <krnl_stream_adder1 XCLBIN>