Hello World (OpenCL Kernel)
===========================

This example is a simple OpenCL application that highlights the basic flow of an OpenCL application.

**KEY CONCEPTS:** `OpenCL Host APIs <https://docs.xilinx.com/r/en-US/ug1393-vitis-application-acceleration/OpenCL-Programming>`__

This example introduces the basic structure of host and kernel code. The ``vadd`` kernel is an optimized implementation of a simple vector addition. Kernel arguments marked with the ``global`` keyword are the inputs read from global memory and the outputs written back to it.

If the number of loop iterations is unknown, the tool cannot report the loop latency. In these cases, ``xcl_loop_tripcount`` can be used to specify the minimum and maximum number of iterations, which helps the tool analyze how the loop’s latency contributes to the total latency of the application.

.. code:: cpp

   __attribute__((xcl_loop_tripcount(c_len, c_len)))

On a CPU, loop iterations run sequentially, i.e. it takes multiple clock cycles before a new iteration can begin execution. An FPGA can pipeline these iterations to increase the throughput of the application by beginning a new iteration every cycle. The ``xcl_pipeline_loop`` attribute is used to pipeline a loop and to specify the *initiation interval* (the number of clock cycles before a new loop iteration can be started).

.. code:: cpp

   __attribute__((xcl_pipeline_loop(1)))

Rather than reading individual items for addition, buffers are created in the kernel’s local memory and multiple items are read in a single burst. This achieves low memory access latency and makes efficient use of the bandwidth provided by the ``AXI`` interface. Similarly, results are stored in a buffer and written to global memory in a burst. The ``for`` loops must meet the following requirements to implement burst read/write:

- Pipeline the loop: the loop pipeline must have ``II`` (initiation interval) = 1
- Aligned memory: memory addresses for read/write should be contiguous.

.. code:: cpp

   __attribute__((xcl_pipeline_loop(1)))
   readB: for (int j = 0; j < size; j++) {
       arrayB[j] = b[i + j];
   }

**EXCLUDED PLATFORMS:**

- All NoDMA platforms, e.g. u50 nodma
- Alveo U250

DESIGN FILES
------------

Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile and its contents will be filled during compilation. A listing of all the files in this example is shown below

::

   src/host.cpp
   src/vector_addition.cl

Access these files in the GitHub repo by `clicking here <https://github.com/Xilinx/Vitis_Accel_Examples/tree/master/ocl_kernels/cl_helloworld>`__.

COMMAND LINE ARGUMENTS
----------------------

Once the environment has been configured, the application can be executed by

::

   ./cl_helloworld <vector_addition XCLBIN>
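
The command line above passes a single argument, the path to the XCLBIN file. As a minimal sketch of how a host program might validate that argument before loading the binary, the snippet below checks the argument count and extracts the path. The helper name ``parse_xclbin_path`` is hypothetical and is not taken from ``src/host.cpp``; the actual host code may handle its arguments differently.

.. code:: cpp

   #include <string>

   // Hypothetical helper (an assumption, not part of src/host.cpp).
   // Expects argv[0] to be the program name and argv[1] to be the XCLBIN path.
   // Returns true and fills `path` only when exactly one argument is supplied.
   bool parse_xclbin_path(int argc, const char* const argv[], std::string& path) {
       if (argc != 2) {
           return false;  // missing or extra arguments: caller should print usage
       }
       path = argv[1];
       return true;
   }

With the invocation shown above, ``argc`` is 2 and ``path`` receives the XCLBIN file name. Any other argument count returns ``false``, so the caller can print a usage message and exit instead of attempting to load a binary.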