Streaming Host to Kernel Bandwidth
==================================

This is a simple vector increment C kernel design with one stream input
and one stream output. It demonstrates how to process an input stream of
data in an application and how to test the host-to-device streaming
bandwidth.

**KEY CONCEPTS:** Read/Write Stream, Create/Release Stream

**KEYWORDS:** cl_stream, CL_STREAM_EOT, CL_STREAM_NONBLOCKING

This example tests the bandwidth of the blocking and non-blocking stream
interfaces between the host and the device.

To measure the maximum bandwidth, a wide (512-bit) stream is used at the
kernel level, as shown below:

.. code:: cpp

   #include "ap_axi_sdata.h"
   typedef qdma_axis<512, 0, 0, 0> pkt;
   void krnl_stream_adder1(hls::stream<pkt> &a, hls::stream<pkt> &output) {
       #pragma HLS INTERFACE axis port=a
       #pragma HLS INTERFACE axis port=output
       #pragma HLS INTERFACE s_axilite port=return bundle=control
   ...
   }
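
As a rough illustration of why the wider stream helps, the sketch below
counts how many stream packets are needed to carry a given payload at
different widths. The helper name ``packets_for`` and the buffer sizes are
hypothetical and are not taken from the example sources; only the 512-bit
width comes from the ``qdma_axis<512, 0, 0, 0>`` typedef above.

.. code:: cpp

   #include <cassert>
   #include <cstddef>

   // Hypothetical helper (not part of the example sources): number of
   // stream packets needed to carry `bytes` of payload when each packet
   // is `width_bits` wide. Rounds up for a partial final packet.
   std::size_t packets_for(std::size_t bytes, std::size_t width_bits) {
       assert(width_bits > 0 && width_bits % 8 == 0);
       const std::size_t bytes_per_packet = width_bits / 8; // 512 bits -> 64 bytes
       return (bytes + bytes_per_packet - 1) / bytes_per_packet;
   }

For example, ``packets_for(4096, 512)`` gives 64 packets, while the same
4096 bytes on a 32-bit stream would need ``packets_for(4096, 32)``, that
is 1024 packets.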

The kernel is enqueued with an event object so that its execution time
can be measured, as shown below:

.. code:: cpp

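   cl::Event nb_wait_event;
   // Attach the event to the kernel launch so it can be profiled later
   q.enqueueTask(krnl_adder1, NULL, &nb_wait_event);
   ...
   q.finish(); // wait for all pending enqueue
   unsigned long start, stop;
   // Query the start and end timestamps (in nanoseconds) from the same event
   nb_wait_event.getProfilingInfo<unsigned long>(CL_PROFILING_COMMAND_START, &start);
   nb_wait_event.getProfilingInfo<unsigned long>(CL_PROFILING_COMMAND_END, &stop);
   unsigned long duration = stop - start; // kernel execution time in nanoseconds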
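
OpenCL profiling timestamps are reported in nanoseconds, so ``duration``
above can be turned into a throughput figure as sketched below. This is a
minimal sketch under stated assumptions: the helper name
``throughput_gb_per_s`` is hypothetical, 1 GB is taken as 10^9 bytes, and
which bytes are counted (input only, or input plus output) is left to the
caller; the example's host code may compute its reported figure
differently.

.. code:: cpp

   #include <cassert>
   #include <cstddef>

   // Hypothetical helper (not part of the example sources): convert a byte
   // count and a duration in nanoseconds into GB/s, with 1 GB = 1e9 bytes.
   double throughput_gb_per_s(std::size_t bytes, unsigned long duration_ns) {
       assert(duration_ns > 0);
       // (bytes / 1e9) / (duration_ns / 1e9) reduces to bytes per nanosecond.
       return static_cast<double>(bytes) / static_cast<double>(duration_ns);
   }

A typical call would be ``throughput_gb_per_s(total_bytes, duration)``,
where ``total_bytes`` is an assumed variable holding the number of bytes
streamed during the measured interval.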

The following log is reported when running the design on the U200 platform:

::

   Platform Name: Xilinx
   INFO: Reading ./build_dir.hw.xilinx_u200_qdma_201910_1/krnl_stream_adder1.xclbin
   Loading: './build_dir.hw.xilinx_u200_qdma_201910_1/krnl_stream_adder1.xclbin'
   ############################################################
   Blocking Stream
   ############################################################
   [ Case: 1 ] -> Throughput = 5.06 GB/s
   TEST PASSED
   ############################################################
   Non-Blocking Stream
   ############################################################
   [ Case: 2 ] -> Throughput = 5.25 GB/s
   TEST PASSED

EXCLUDED PLATFORMS
------------------

Platforms whose names contain any of the following strings are not
supported for this example:

::

   zc
   xdma
   xilinx_u250_qep
   aws
   samsung

DESIGN FILES
------------

The application code is located in the src directory. Accelerator binary
files are compiled to the xclbin directory, which is required by the
Makefile and is populated during compilation. The files in this example
are listed below:

::

   src/host.cpp
   src/krnl_stream_adder1.cpp

COMMAND LINE ARGUMENTS
----------------------

Once the environment has been configured, the application can be
executed with:

::

   ./vadd_stream <krnl_stream_adder1 XCLBIN>