AI Engine Development


# Asynchronous Array RTP Update for AI Engine Kernel

This step demonstrates:

* The array RTP update for AI Engine kernels
* The C XRT API to control graph and PL kernel execution

The example is similar to [Asynchronous Update of Array RTP](./step3_async_array.md), except that the random noise generator is not free-running in this example. The system to be implemented is as follows.

![missing image](./images/figure8.PNG)

__Note__: The default working directory for this step is "step4", unless explicitly specified otherwise.

### Review Graph and RTP Code

In the graph definition (`aie/graph.h`), the RTP declaration and connection are added as follows:

    port<input> coefficients;
    connect< parameter >(coefficients, async(fir24.in[1]));

In `aie/graph.cpp` (for AI Engine simulator):

    gr.init();
    // update narrow filter coefficients, run for 16 iterations, wait, update wide filter coefficients, run for 16 iterations
    gr.update(gr.coefficients, narrow_filter, 12);
    gr.run(16); // 16 iterations for AIE kernel
    gr.wait();
    gr.update(gr.coefficients, wide_filter, 12);
    gr.run(16);
    gr.end();

### Run AI Engine Compiler and AI Engine Simulator

Compile the AI Engine graph (`libadf.a`) using the AI Engine compiler:

    make aie

The corresponding AI Engine compiler command is:

    aiecompiler -platform=/xilinx_vck190_es1_base_202110_1/xilinx_vck190_es1_base_202110_1.xpfm -include="./aie" -include="./data" -include="./aie/kernels" -include="./" --pl-axi-lite=true -workdir=./Work aie/graph.cpp

After the AI Engine graph (`libadf.a`) has been generated, verify its correctness using the AI Engine simulator:

    make aiesim

### Run Hardware Cosimulation and Hardware Flow

The PL kernel (`random_noise`) takes an additional parameter, `size`, which the free-running kernel in the [previous step](./step3_async_array.md) does not have. A loop is also added in the kernel to iterate `size` times.
The code in `pl_kernels/random_noise.cpp` is as follows:

    extern "C" void random_noise(hls::stream<...> &out, int size) {
    #pragma HLS INTERFACE axis port=out
    #pragma HLS INTERFACE s_axilite port=return bundle=control
    #pragma HLS INTERFACE ap_ctrl_hs port=return
    #pragma HLS interface s_axilite port=size bundle=control
        for (int i = 0; i < size; i++) {
            ...
        }
    }

The host code uses the XRT API to map the output buffer and to control the PL kernels:

    ...
    std::complex<...> *host_out = (std::complex<...> *)xrtBOMap(out_bohdl);

    // s2mm ip
    xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, uuid, "s2mm"); // Open kernel handle
    xrtRunHandle s2mm_rhdl = xrtRunOpen(s2mm_khdl);
    xrtRunSetArg(s2mm_rhdl, 0, out_bohdl); // set kernel arg
    xrtRunSetArg(s2mm_rhdl, 2, OUTPUT_SIZE); // set kernel arg
    xrtRunStart(s2mm_rhdl); // launch s2mm kernel

    // random_noise ip
    xrtKernelHandle random_noise_khdl = xrtPLKernelOpen(dhdl, uuid, "random_noise");
    xrtRunHandle random_noise_rhdl = xrtRunOpen(random_noise_khdl);
    xrtRunSetArg(random_noise_rhdl, 1, OUTPUT_SIZE);
    xrtRunStart(random_noise_rhdl);
    printf("run random_noise\n");

    // update graph parameters (RTP) & run
    ...

    // wait for s2mm done
    auto state = xrtRunWait(s2mm_rhdl);

    // Transfer data from global memory in device back to host memory
    xrtBOSync(out_bohdl, XCL_BO_SYNC_BO_FROM_DEVICE, output_size_in_bytes, /*OFFSET=*/ 0);

    // Post-processing of data

After post-processing the data, release the allocated objects:

    gr.end();
    xrtRunClose(s2mm_rhdl);
    xrtKernelClose(s2mm_khdl);
    xrtRunClose(random_noise_rhdl);
    xrtKernelClose(random_noise_khdl);
    xrtBOFree(out_bohdl);
    xrtDeviceClose(dhdl);

The XRT API version to control graph execution is as follows:

    auto ghdl = xrtGraphOpen(dhdl, uuid, "gr");
    ret = xrtGraphUpdateRTP(ghdl, "gr.fir24.in[1]", (char*)narrow_filter, 12*sizeof(int));
    ret = xrtGraphRun(ghdl, 16);
    ret = xrtGraphWait(ghdl, 0);
    ret = xrtGraphUpdateRTP(ghdl, "gr.fir24.in[1]", (char*)wide_filter, 12*sizeof(int));
    ret = xrtGraphRun(ghdl, 16);

The XRT API calls to open a graph, run it, wait for it, and update its RTP are `xrtGraphOpen()`, `xrtGraphRun()`, `xrtGraphWait()`, and `xrtGraphUpdateRTP()`.
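
The update, run, wait, update, run sequence above can be modelled in plain C++ to show what the asynchronous array RTP does to the kernel between the two batches of 16 iterations. The following is only a sketch and does not use the `adf` or XRT libraries: `ModelGraph`, `run_and_wait`, `rtp_size_in_bytes`, and `run_two_phases` are hypothetical names, the coefficient values are arbitrary, and the "kernel" is a stand-in that multiplies a counter by the sum of the coefficients. It also illustrates that the `adf` call takes an element count (`12`), while the XRT call takes a byte count (`12*sizeof(int)`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph with one asynchronous array RTP.
// This is NOT the adf or XRT API; it only mimics the call order shown above.
constexpr std::size_t kTaps = 12;

struct ModelGraph {
    std::array<int, kTaps> coefficients{};  // last value written to the RTP
    std::vector<long> outputs;              // one value per kernel iteration
    int next_sample = 1;                    // stand-in input data: 1, 2, 3, ...

    // Mimics gr.update(): the size argument is an element count.
    void update(const int* values, std::size_t count) {
        assert(count <= kTaps);
        for (std::size_t i = 0; i < count; ++i) coefficients[i] = values[i];
    }

    // Mimics gr.run(n) followed by gr.wait(): every iteration uses the
    // coefficients most recently written, which is the asynchronous behaviour.
    void run_and_wait(int iterations) {
        for (int it = 0; it < iterations; ++it) {
            long acc = 0;
            for (int c : coefficients) acc += static_cast<long>(c) * next_sample;
            outputs.push_back(acc);
            ++next_sample;
        }
    }
};

// Byte count for an array of int coefficients, as a byte-oriented
// update call would need it.
constexpr std::size_t rtp_size_in_bytes(std::size_t elements) {
    return elements * sizeof(int);
}

// Runs the two-phase sequence with arbitrary example coefficients.
std::vector<long> run_two_phases() {
    const int narrow_filter[kTaps] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    const int wide_filter[kTaps]   = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2};

    ModelGraph gr;
    gr.update(narrow_filter, kTaps);  // first RTP update
    gr.run_and_wait(16);              // 16 iterations with the narrow filter
    gr.update(wide_filter, kTaps);    // second RTP update, after the wait
    gr.run_and_wait(16);              // 16 iterations with the wide filter
    return gr.outputs;
}
```

Calling `run_two_phases()` from a `main` function returns 32 values: the first 16 are computed with the first coefficient set and the last 16 with the second. Waiting before the second update is what guarantees that no iteration of the first batch sees the new coefficients.
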
For more information about XRT APIs on graphs, see the *Versal ACAP AI Engine Programming Environment User Guide* (UG1076).

Run the following `make` command to build the host executable file:

    make host

Notice that the following link command links the libraries `adf_api_xrt` and `xrt_coreutil`. These are necessary for the `adf` API to work together with the XRT API.

    ${CXX} -o ../host.exe aie_control_xrt.o host.o -ladf_api_xrt -lgcc -lc -lxilinxopencl -lxrt_coreutil -lpthread -lrt -ldl -lcrypt -lstdc++ -L${SDKTARGETSYSROOT}/usr/lib/ --sysroot=${SDKTARGETSYSROOT} -L$(XILINX_VITIS)/aietools/lib/aarch64.o

Run the following `make` command to build all necessary files and launch HW cosimulation:

    make run_hw_emu

In the Linux prompt, run the following commands:

    mount /dev/mmcblk0p1 /mnt
    cd /mnt
    export XILINX_XRT=/usr
    export XCL_EMULATION_MODE=hw_emu
    ./host.exe a.xclbin

To exit QEMU, press Ctrl+A, then x.

For hw mode, run the following `make` command to generate an SD card package:

    make package TARGET=hw

In hardware, after booting Linux from the SD card, run the following commands in the Linux prompt:

    export XILINX_XRT=/usr
    cd /mnt/sd-mmcblk0p1
    ./host.exe a.xclbin

The host code is self-checking: it compares the output data against the golden data. If the output matches the golden data, it prints the following after the run is complete:

    TEST PASSED

### Conclusion

In this step you learned about:

* Asynchronous array RTP
* Launching AI Engine simulator, HW cosimulation, and HW

Next, review [Asynchronous Array RTP Update and Read for AI Engine Kernel](./step5_async_array_update_read.md).