AI Engine Versal Integration for Hardware Emulation and Hardware

Introduction

The Xilinx Versal ACAP is a fully software-programmable, heterogeneous compute platform that combines the PS (Scalar Engines, which include the Arm processors), the PL (Adaptable Engines, which include the FPGA fabric), and the AI Engines (Intelligent Engines).

This tutorial demonstrates how to create a system design that runs on the AI Engine, PS, and PL, and how to validate the design across these heterogeneous domains by running hardware emulation.

This tutorial steps through the hardware emulation and hardware flows in the context of a complete Versal ACAP system integration. A Makefile is provided, which you can modify to suit your own needs in a different context. By default, the Makefile builds for hw_emu. To build for hw, add TARGET=hw to the make commands.

IMPORTANT: Before beginning the tutorial, make sure you have read and followed the Vitis Software Platform Release Notes (v2020.2) for setting up the software and installing the platform. The PLATFORM_REPO_PATHS variable is used to find the platform installation path, so set it appropriately. Also run the environment-setup-aarch64-xilinx-linux script to set up the SDKTARGETSYSROOT environment variable, which defines the SYSROOT location.
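A quick pre-flight check can catch an unset variable before a long build starts. The following is a minimal sketch in POSIX shell, not part of the tutorial's Makefile: the function name check_tutorial_env is hypothetical, and the variable names are the ones that appear in this tutorial's text and commands.

```shell
# Hypothetical helper (not part of the tutorial Makefile): print the name of
# every listed environment variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_tutorial_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup in POSIX sh: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Variables referenced by the commands in this tutorial.
check_tutorial_env PLATFORM_REPO_PATHS SDKTARGETSYSROOT XILINX_VITIS ||
    echo "Set the variables listed above before building." >&2
```

Running the check in the same shell session that will run make ensures that it sees the same environment as the build.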

This tutorial targets the VCK190 ES board (see https://www.xilinx.com/products/boards-and-kits/vck190.html). This board is currently available via early access. If you have already purchased this board, download the necessary files from the lounge and ensure you have the correct licenses installed. If you do not have a board and an ES license, contact your Xilinx sales contact.

Objectives

After completing this tutorial, you should be able to:

Add input/output ports in an ADF dataflow graph and define their names which will be visible during system integration
Compile HLS functions for integration in the Programmable Logic (PL)
Compile ADF graphs
Create a configuration file that describes system connections and use it during the link stage
Create a software application that runs on Linux
Package the design into an easy-to-boot SD card image

Tutorial Overview

Section 1: Compile AI Engine code using the AI Engine compiler and HLS code using v++.

Section 2: Link the AI Engine kernels, and HLS PL kernels with an extensible platform provided.

Section 3: Compile the A72 host code.

Section 4: Create the bootable image.

Section 5: Run the hardware emulation.

Section 6: Build and run on hardware.
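The sections map onto make targets that are introduced one by one later in this tutorial. As a rough sketch, the sequence can be printed for a given TARGET as shown below. The function name print_build_steps is hypothetical, and the sketch assumes that every make target accepts the TARGET variable, as the Makefile description suggests.

```shell
# Hypothetical helper: print the tutorial's make targets in order for a given
# TARGET (hw_emu by default). Nothing is executed; the commands are only printed.
print_build_steps() {
    target=${1:-hw_emu}
    case "$target" in
        hw|hw_emu) ;;
        *) echo "unknown TARGET: $target" >&2; return 1 ;;
    esac
    for step in kernels aie xclbin host package; do
        echo "make $step TARGET=$target"
    done
    # Emulation is launched only for the hw_emu target.
    if [ "$target" = hw_emu ]; then
        echo "make run_emu"
    fi
}

print_build_steps hw_emu
```

Calling print_build_steps hw lists the same steps with TARGET=hw and omits the emulation launch, because the hardware build is run on the board instead.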

The design that will be used is shown in the following figure:

Figure: System Diagram

Section 1: Compile PL Kernels and AI Engine Graph

The first step is to compile your v++ kernels (HLS C) into .xo files and your AI Engine kernels and graph into the libadf.a file. You can compile the kernels and the graph in parallel because they do not depend on each other at this step.

This tutorial design has three AI Engine kernels (weightsum, average, and classifier) and three HLS PL kernels (polar_clip, s2mm, and mm2s).

Compiling HLS Kernels Using v++

To compile the mm2s, s2mm, and polar_clip PL HLS kernels, use the v++ compiler command, which takes an HLS kernel source and produces an .xo file.

To compile the kernels, run the following command:

make kernels

Or

v++ -c --platform $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm --save-temps -g -k s2mm s2mm.cpp -o s2mm.xo
v++ -c --platform $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm --save-temps -g -k mm2s mm2s.cpp -o mm2s.xo
v++ -c --platform $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm --save-temps -g -k polar_clip polar_clip.cpp -o polar_clip.xo
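The three commands above differ only in the kernel name, so they can be generated from a list. The following sketch prints the commands without running them. The function name print_kernel_builds is hypothetical, and the sketch assumes that each kernel named X lives in X.cpp, as in the commands above.

```shell
# Hypothetical helper: print (do not run) the v++ compile command for each
# HLS kernel name given. Assumes kernel <name> is defined in <name>.cpp.
print_kernel_builds() {
    # Single quotes keep $PLATFORM_REPO_PATHS literal in the printed command.
    platform='$PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm'
    for kernel in "$@"; do
        echo "v++ -c --platform $platform --save-temps -g -k $kernel $kernel.cpp -o $kernel.xo"
    done
}

print_kernel_builds s2mm mm2s polar_clip
```

Printing first makes it easy to review the commands. Piping the output to sh would then execute them, with PLATFORM_REPO_PATHS expanded at that point.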

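Looking at the v++ command line, you will see several options. The following table describes each option:

| Option/Flag | Description |
| --- | --- |
| -c | Compiles the kernel source into a Xilinx object (.xo) file. |
| --platform | Specifies the target platform, given here as the full path to the platform .xpfm file. |
| --save-temps | Saves the intermediate files generated during the build. |
| -g | Generates code for debugging the kernel. |
| -k | Specifies the name of the kernel to compile. |
| -o | Specifies the name of the output file. |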

Compiling an AI Engine ADF Graph for V++ Flow

An ADF Graph can be connected to an extensible Vitis platform. That is, the graph I/Os can be connected either to platform ports or to ports on Vitis kernels through the v++ connectivity directives.

An AI Engine ADF C++ graph contains AI Engine kernels only.
All interconnections between AI Engine kernels are defined in the C++ graph (graph.h).
All interconnections to external I/Os are fully specified in the C++ simulation testbench (graph.cpp) that instantiates the C++ ADF graph object. The testbench itself is used only by aiesimulator, which is covered in another tutorial. All platform connections from the graph to the “PLIO” map onto ports on the AI Engine subsystem graph that are connected via v++ connectivity directives.
No dangling ports or implicit “connections” are allowed by v++.
Stream connections are specified through the v++ --sc option, including employment of PL-based data movers, either in the platform or defined outside the ADF graph as Vitis PL kernels.

To compile the graph for use in either hw or hw_emu, run:

make aie

Or

aiecompiler --target=hw -include="$XILINX_VITIS/aietools/include" -include="./aie" -include="./data" -include="./aie/kernels" -include="./" --pl-freq=100 -workdir=./Work  aie/graph.cpp

The aiecompiler output is the Work directory and the libadf.a file. The libadf.a file contains the compiled AI Engine configuration, graph, and kernel .elf files.

Section 2: Use V++ to Link AI Engine and HLS Kernels with the Platform

After the AI Engine kernels, graph, PL kernel, and HLS kernels have been compiled, you can use v++ to link them with the platform to generate an .xclbin.
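Before linking, it can help to confirm that the outputs of Section 1 actually exist. The following is a small sketch under stated assumptions: the function name missing_files is hypothetical, and the file names are the link inputs used in this tutorial.

```shell
# Hypothetical helper: print each listed file that does not exist.
# Returns 0 when every file is present, 1 otherwise.
missing_files() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "$f"
            status=1
        fi
    done
    return "$status"
}

# Inputs consumed by the v++ link step in this tutorial.
missing_files s2mm.xo mm2s.xo polar_clip.xo libadf.a system.cfg ||
    echo "Build the files listed above before linking." >&2
```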

v++ lets you integrate your AI Engine, HLS, and RTL kernels into an existing extensible platform. In this step, the platform is provided by the hardware designer (or you can use one of the extensible base platforms provided by Xilinx), and v++ builds the hardware design for you while integrating the AI Engine and PL kernels into it.

You have a number of kernels at your disposal, but you need to tell the linker how you want to connect them together (from the AI Engine array to PL and vice versa). These connections are described in a configuration file: system.cfg in this tutorial.

[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.DataOut1:s2mm.s

| Option/Flag | Description |
| --- | --- |
| nk | Specifies the kernel and how many instances of it to create. For example, nk=mm2s:1:mm2s creates one instance of the mm2s kernel, named mm2s. |
| stream_connect/sc | Specifies the streaming connections to make between PL and AIE or between PL kernels. A connection always runs from the output of one kernel to the input of another. |

NOTE: The v++ command-line can get unruly, and using the system.cfg file can help contain it.

For ai_engine_0, the names are provided in graph.cpp when instantiating a PLIO object. In this design, for example, the line PLIO *in0 = new PLIO("DataIn1", adf::plio_32_bits,"data/input.txt"); gives the interface the name DataIn1.

Notice that the polar_clip kernel is not specified in the system.cfg file. This is because the generated graph (libadf.a) contains the kernel information and knows how to connect it up to the AI Engine.
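To cross-check the connection names against the PLIO names in graph.cpp, the stream connections can be listed from the configuration file. The following is a minimal sketch using awk. The function name list_stream_connections is hypothetical, and the sketch assumes one key=value entry per line with no leading whitespace, as in the system.cfg shown above.

```shell
# Hypothetical helper: list every stream connection declared in a v++ config
# file as "source -> destination", one per line.
list_stream_connections() {
    # Split each line on "=", keep stream_connect (or its short form sc),
    # then split the value on ":" into source and destination.
    awk -F= '$1 == "stream_connect" || $1 == "sc" {
        split($2, ends, ":")
        print ends[1] " -> " ends[2]
    }' "$1"
}

# Demo on a copy of the tutorial's connectivity section.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
stream_connect=mm2s.s:ai_engine_0.DataIn1
stream_connect=ai_engine_0.DataOut1:s2mm.s
EOF
list_stream_connections "$cfg"
rm -f "$cfg"
```

For the configuration above, this prints mm2s.s -> ai_engine_0.DataIn1 followed by ai_engine_0.DataOut1 -> s2mm.s. The names after ai_engine_0. should match the names passed to the PLIO constructors.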

You can see the v++ switches in more detail in the Vitis Unified Software Platform Documentation.

To build the design, run the following command:

make xclbin

Or

v++ -l --platform $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm s2mm.xo mm2s.xo polar_clip.xo libadf.a -t hw_emu --save-temps -g --config system.cfg -o tutorial.xclbin 

Now you have a generated .xclbin that will be used to execute your design on the platform.

Section 3: Compile the A72 Host Application

After all the new AI Engine outputs are created, you can compile your host application by following the typical cross-compilation flow for the Cortex-A72. The host code uses XRT (Xilinx Run Time) as the API to talk to the AI Engine and PL kernels, so the link step uses the libraries -ladf_api_xrt and -lxrt_coreutil.

Open sw/main.cpp and familiarize yourself with the contents. Pay close attention to API calls and the comments provided.

Note that XRT is used in the host application. This API layer is used to communicate with the programmable logic, specifically the PLIO kernels for reading and writing data. To understand how to use this API in an AI Engine application, refer to “Programming the PS Host Application”.

Open the Makefile and familiarize yourself with the contents. Take note of GCC_FLAGS and GCC_INCLUDES:
1. GCC_FLAGS: Shows that this code is compiled as C++14. More explanation is provided in the packaging step.
2. GCC_INCLUDES: Lists all the necessary include paths from the SYSROOT as well as the AI Engine tools.

Close the Makefile, and run the command:

make host

Or

cd ./sw 
aarch64-linux-gnu-g++ -Wall -c -std=c++14 -Wno-int-to-pointer-cast --sysroot=$SYSROOT -I$SYSROOT/usr/include/xrt -I$SYSROOT/usr/include -I./ -I../aie -I$XILINX_VITIS/aietools/include -I$XILINX_VITIS/include -o aie_control_xrt.o ../Work/ps/c_rts/aie_control_xrt.cpp
aarch64-linux-gnu-g++ -Wall -c -std=c++14 -Wno-int-to-pointer-cast --sysroot=$SYSROOT -I$SYSROOT/usr/include/xrt -I$SYSROOT/usr/include -I./ -I../aie -I$XILINX_VITIS/aietools/include -I$XILINX_VITIS/include -o main.o main.cpp
aarch64-linux-gnu-g++ main.o aie_control_xrt.o -ladf_api_xrt -lxrt_coreutil -L$SYSROOT/usr/lib --sysroot=$SYSROOT -L$XILINX_VITIS/aietools/lib/aarch64.o -o host.exe 
cd ..

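The following table describes some of the GCC options being used:

| Option/Flag | Description |
| --- | --- |
| -Wall | Enables most compiler warnings. |
| -c | Compiles the source to an object file without linking. |
| -std=c++14 | Selects the C++14 language standard. |
| -Wno-int-to-pointer-cast | Suppresses warnings about casting an integer to a pointer of a different size. |
| --sysroot | Uses the given directory as the root for header and library lookup. |
| -I | Adds a directory to the include search path. |
| -L | Adds a directory to the library search path. |
| -l | Links against the named library. |
| -o | Specifies the name of the output file. |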

Section 4: Package the Design

With all the AI Engine outputs and the new platform created, you can now generate the Programmable Device Image (PDI) and a package to be used on an SD card. The PDI contains all executables, bitstreams, and configurations of every element of the device. The packaged SD card directory contains everything needed to boot Linux, along with your generated application and .xclbin.

To package the design, run the following command:

make package

Or

cd ./sw
v++ --package -t hw_emu \
	-f $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm \
	--package.rootfs=$PLATFORM_REPO_PATHS/sw/versal/xilinx-versal-common-v2020.2/rootfs.ext4 \
	--package.image_format=ext4 \
	--package.boot_mode=sd \
	--package.kernel_image=$PLATFORM_REPO_PATHS/sw/versal/xilinx-versal-common-v2020.2/Image \
	--package.defer_aie_run \
	--package.sd_file host.exe ../tutorial.xclbin ../libadf.a
cd ..

NOTE: By default, the --package flow creates an a.xclbin automatically if the -o switch is not set.

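The following table describes the packager options:

| Option/Flag | Description |
| --- | --- |
| --package.rootfs | Path to the Linux root file system image. |
| --package.image_format | Format of the output image, ext4 in this tutorial. |
| --package.boot_mode | Boot mode for the application, sd in this tutorial. |
| --package.kernel_image | Path to the Linux kernel Image file. |
| --package.defer_aie_run | Loads the AI Engine application but defers running it until the host application starts the graph. |
| --package.sd_file | File to include in the SD card image. |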

Section 5: Run Hardware Emulation

After packaging, everything is set to run emulation or hardware.

To run emulation use the following command:

make run_emu

Or

cd ./sw
./launch_hw_emu.sh
cd ..

When launched, use the Linux prompt presented to run the design.

Execute the following commands when the emulated Linux prompt displays:

cd /mnt/sd-mmcblk0p1
export XILINX_XRT=/usr
dmesg -n 4 && echo "Hide DRM messages..."

This will set up the design to run emulation. Run the design using the following command:

./host.exe a.xclbin

You should see an output displaying TEST PASSED. When this is shown, run the keyboard command: Ctrl+A x to end the QEMU instance.
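When the run is scripted, the pass/fail check can be automated by searching the program output for the TEST PASSED marker. The following is a minimal sketch in POSIX shell. The function name run_and_check is hypothetical, and the sketch assumes the host application prints TEST PASSED on success, as described above.

```shell
# Hypothetical helper: run a command, echo its output, and succeed only if
# the output contains the marker "TEST PASSED".
run_and_check() {
    # Capture stdout and stderr; ignore the exit code and judge by the marker.
    output=$("$@" 2>&1) || true
    printf '%s\n' "$output"
    case "$output" in
        *"TEST PASSED"*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the emulated Linux prompt this could be invoked as:
#   run_and_check ./host.exe a.xclbin
```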

Section 6: Build and Run on Hardware

To build for hardware run the following command:

make xclbin TARGET=hw

Or

v++ -l --platform $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm s2mm.xo mm2s.xo polar_clip.xo libadf.a -t hw --save-temps -g --config system.cfg -o tutorial.xclbin 

Then re-run the packaging step with:

make package TARGET=hw

Or

cd ./sw
v++ --package -t hw \
	-f $PLATFORM_REPO_PATHS/xilinx_vck190_es1_base_202020_1/xilinx_vck190_es1_base_202020_1.xpfm \
	--package.rootfs=$PLATFORM_REPO_PATHS/sw/versal/xilinx-versal-common-v2020.2/rootfs.ext4 \
	--package.image_format=ext4 \
	--package.boot_mode=sd \
	--package.kernel_image=$PLATFORM_REPO_PATHS/sw/versal/xilinx-versal-common-v2020.2/Image \
	--package.defer_aie_run \
	--package.sd_file host.exe ../tutorial.xclbin ../libadf.a
cd ..

When you run on hardware, ensure you have a supported SD card. Write the sw/sd_card.img image to the SD card. Then plug the SD card into the board and power it up.

When a Linux prompt appears, run the following commands:

dmesg -n 4 && echo "Hide DRM messages..."
cd /mnt/sd-mmcblk0p1
export XILINX_XRT=/usr
./host.exe a.xclbin

You should see TEST PASSED. You have successfully run your design on hardware.

IMPORTANT: To rerun the application you need to power cycle the board.

Summary

In this tutorial you learned the following:

How to compile PLIO and PL Kernels using v++ -c
How to link the libadf.a, PLIO, and PL kernels to the xilinx_vck190_es1_base_202020_1 platform
How to package your host code, and the generated xclbin and libadf.a into an SD card directory
How to execute the design on the board
How to execute the design for hardware emulation

To read more about the use of Vitis in the AI Engine flow see: UG1076: Versal ACAP AI Engine Programming Environment Chapter 13: Integrating the Application Using the Vitis Tool Flow.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD002