AI Engine Development


# Asynchronous Array RTP Update and Read for AI Engine Kernel

This step demonstrates:

* The Array RTP update for AI Engine kernels
* Asynchronous read of Array RTP for AI Engine kernels
* C++ version of XRT API to control PL kernels
* C++ version of XRT API to control graph execution

The example design is similar to [Asynchronous Update of Scalar RTP for PL inside a Graph, and Array RTP Update for AI Engine Kernel](./step4_async_aie_array.md), except that the AI Engine kernel has an asynchronous output, and the PL kernel that was inside the graph has been moved out of the graph. The example shows how to perform asynchronous reads of an array RTP with the ADF API and with the XRT API.

The PL kernels and graph execution are controlled by the C++ version of the XRT API. This differs from the previous steps, which used the OpenCL API or the C version of the XRT API.

The system to be implemented is as follows:

![missing image](./images/figure9.PNG)

__Note__: The default working directory in this step is "step5", unless explicitly specified otherwise.

### Review Graph and RTP Code

In the AI Engine kernel code (`aie/kernels/hb24.cc`), the interface is declared as:

    void fir24_sym(input_window_cint16 *iwin, output_window_cint16 *owin, const int32 (&coeffs)[12], int32 (&coeffs_readback)[12]);

For the RTP array input, `const` is used for the array reference. From the graph, an RTP port can only be `input` or `inout`. An `inout` port in the graph can be read by the PS program but cannot be written by it. Therefore, a second port, `coeffs_readback`, is defined to read back the coefficients.
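The roles of the two array parameters can be tried off-target with a plain C++ mock. The sketch below is not AI Engine code and is not taken from `hb24.cc`: the function name `fir_with_readback`, the `int32` alias, the use of `std::vector` in place of windows, and the 24-tap symmetric structure (inferred only from the kernel name) are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using int32 = std::int32_t;  // assumption: stand-in for the AI Engine int32 type

// Hypothetical mock of the kernel interface: the coefficients arrive through a
// const array reference (the RTP input), and a second, writable array
// reference echoes them back (the RTP readback).
std::vector<long long> fir_with_readback(const std::vector<int32>& samples,
                                         const int32 (&coeffs)[12],
                                         int32 (&coeffs_readback)[12]) {
    // Copy the coefficients currently in use so the caller can read them back.
    for (std::size_t i = 0; i < 12; ++i) {
        coeffs_readback[i] = coeffs[i];
    }

    // Assumed 24-tap symmetric FIR: taps k and 23-k share coefficient k.
    std::vector<long long> out;
    for (std::size_t n = 0; n + 24 <= samples.size(); ++n) {
        long long acc = 0;
        for (std::size_t k = 0; k < 12; ++k) {
            long long pair = static_cast<long long>(samples[n + k]) + samples[n + 23 - k];
            acc += coeffs[k] * pair;
        }
        out.push_back(acc);
    }
    return out;
}
```

Calling the mock with a 12-element coefficient array and a second 12-element array leaves a copy of the coefficients in the second array, which mirrors what the PS program later obtains by reading the `inout` port.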
In the graph definition (`aie/graph.h`), the RTP declaration and connection are added as follows:

    port<direction::in> coefficients;
    port<direction::inout> coefficients_readback;
    connect< parameter >(coefficients, async(fir24.in[1]));
    connect< parameter >(async(fir24.inout[0]), coefficients_readback);

In `aie/graph.cpp` (for the AI Engine simulator), the RTP update and read commands are:

    gr.update(gr.coefficients, narrow_filter, 12);
    gr.run(16); // start PL kernel & AIE kernel
    gr.read(gr.coefficients_readback, coeffs_readback, 12);
    std::cout << "Coefficients read back are:";
    for (int i = 0; i < 12; i++) std::cout << coeffs_readback[i] << " ";
    ...

In the host code, the PL kernels are controlled through the C++ version of the XRT API:

    ...
    //kernel run
    auto s2mm_run = s2mm(out_bo, nullptr, OUTPUT_SIZE); //1st run for s2mm has started
    auto random_noise_run = random_noise(nullptr, OUTPUT_SIZE);
    ... //About graph control
    // wait for s2mm done
    auto state = s2mm_run.wait();
    std::cout << "s2mm completed with status(" << state << ")\n";
    out_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
    ... //Post-processing

The `adf` API to control graph execution is similar to that used in the previous step. This step also introduces the C++ version of the XRT API to control graph execution. The two can be selected with the user-defined macro `__USE_ADF_API__`. The C++ XRT API code to update and read the array RTP is as follows:

    int narrow_filter[12] = {180, 89, -80, -391, -720, -834, -478, 505, 2063, 3896, 5535, 6504};
    int wide_filter[12] = {-21, -249, 319, -78, -511, 977, -610, -844, 2574, -2754, -1066, 18539};
    std::cout << "size of cofficient read back:" << ...

XD001 | © Copyright 2021 Xilinx, Inc.