Python and C++ External Traffic Generators for AI Engine Simulation and Emulation Flows¶
Version: Vitis 2022.1
Versal™ adaptive compute acceleration platforms (ACAPs) combine Scalar Engines, Adaptable Engines, and Intelligent Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. In a bottom-up approach, each part of the system is simulated independently before being integrated into a more complete simulation. A heterogeneous device such as the Versal ACAP allows a dataflow to pass through a variety of engines before it completes.
This tutorial develops a case in which the dataflow goes back and forth multiple times between the programmable logic (PL) and the AI Engine array. Some PL blocks are only source or destination kernels, whereas others are processing kernels within the dataflow. This tutorial demonstrates how to create external traffic generators as Python scripts or C++ applications to exercise the AI Engine kernels in the x86 simulator (
x86simulator), AI Engine simulator (
aiesimulator), and in hardware emulation (
After completing the tutorial, you will be able to do the following:
Create a Python script or C++ application.
Pass data between the traffic generator and the AI Engine through specific ports.
Capture, send, and display data using Python.
Capture and send data using C++.
Compile and simulate a design.
Understand the necessary code changes in the graph and host to make the design work seamlessly between
Bring up results in Vitis Analyzer.
Before You Begin¶
This tutorial uses Python. In addition to having the Xilinx tools installed, you also need a valid installation of Python 3.
Python Environment Setup¶
The external traffic generators use Python and require non-standard packages to be installed. Perform the following steps to install these packages.
Make sure you are using the latest version of Python 3. This tutorial has been developed with version 3.6.5. Run the following command to check the version:
Install the appropriate packages using
pip install -r requirements.txt
This file contains the following packages:
numpy multiprocessing struct matplotlib
Validate your environment by running the following command. If errors are reported during import, rerun
pip and install the packages manually.
python3 -c 'import numpy, matplotlib, struct, multiprocessing'
Add the provided packages for external traffic generators to the
PYTHONPATH. Run the following command:
Xilinx Tools Initialization¶
Before starting this tutorial, run the following steps.
Set up the following paths in the script
export XILINX_TOOLS_LOCATION=<Path to Vitis Build - Directory>/Vitis/2022.1
export PLATFORM_REPO_PATHS=<YOUR-PLATFORMS-DIRECTORY>
export XILINX_VERSAL_SW=<Path to xilinx-versal-common-v2022.1 - Directory>
export XILINX_XRT=/<user-path>/opt/xilinx/xrt
export PYTHON3_LOCATION=<user-path>
The script sets the necessary paths to run the tutorial:
SDKTARGETSYSROOT for host software compilation.
LIBRARY_PATH to handle external traffic generator handles.
PYTHONPATH for the Python-based external traffic generator.
This tutorial targets the VCK190 board, which is currently available on the Xilinx website. Because this tutorial is about simulation, you do not need to have the real hardware.
Run the source env_setup.sh script in the console.
This tutorial is based on a basic design, as shown below. This design contains two AI Engine kernels with an intermediate kernel in the PL. The overall system is fed and flushed from kernels that are also in the PL.
In a standard simulation scheme, you would have to perform several steps:
aiesimulator, you would have to replace
mm2s and the output of
polar_clip with text test vectors and verify that the input of
s2mm is as intended.
hw_emu, you would have to build the three PL kernels
s2mm either from RTL code or HLS, link them with Vitis compiler to create the XCLBIN file, create a complete host application that also verifies the output of the system, and then simulate using
This tutorial shows you how to write Python scripts and C++ traffic generators to replace the text files that you would otherwise have to create offline. It also lets you simulate the design in several ways:
With text files as test vectors
With external traffic generators in Python or C++
A standard simulation with text files would be represented as shown below:
When using external traffic generators, communication with the simulator is achieved through Unix sockets, as shown below:
Step 1: ADF Graph Modifications¶
To use external traffic generators for any kind of simulation, you need to make modifications to the graph code, specifically the
graph.h file. This file contains the PLIO constructors, which are used to connect the graph to the programmable logic.
Notice that the
#ifdef EXTERNAL_IO is used and the lines of code under it do not have the data file in the PLIO constructors. This is needed for the external traffic generator to work properly, because the data files (seen on lines 85 to 88), when specified, take precedence.
Take note of the names (first argument) of the PLIO constructors. These will be used to hook up the external traffic generators.
Note: Code guarding this is optional. It is used in this instance to show the changes needed. These modifications are simple. Remove the filenames of the test vectors, and the simulator (AI Engine or x86) automatically takes responsibility for creating ports to connect Unix sockets.
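To give a feel for what exchanging data over a socket looks like from the traffic generator side, here is a minimal sketch using only the Python standard library. It does not use the XTLM utilities or the simulator's actual protocol: the function names `fake_port`, `recv_exact`, and `exchange` are hypothetical, and the "simulator" end is a stand-in thread that simply echoes the payload twice.

```python
import socket
import threading


def recv_exact(conn, count):
    """Read exactly `count` bytes from a connected socket."""
    chunks = bytearray()
    while len(chunks) < count:
        chunk = conn.recv(count - len(chunks))
        if not chunk:
            raise ConnectionError("socket closed before all bytes arrived")
        chunks += chunk
    return bytes(chunks)


def fake_port(conn, count):
    """Hypothetical stand-in for a simulator port: echo the payload twice."""
    with conn:
        data = recv_exact(conn, count)
        conn.sendall(data + data)


def exchange(payload):
    """Send `payload` to the stand-in port and return its reply."""
    # socketpair() returns two already-connected sockets
    # (Unix-domain sockets on Linux).
    tg_side, sim_side = socket.socketpair()
    worker = threading.Thread(target=fake_port, args=(sim_side, len(payload)))
    worker.start()
    with tg_side:
        tg_side.sendall(payload)
        reply = recv_exact(tg_side, 2 * len(payload))
    worker.join()
    return reply


reply = exchange(b"\x01\x02")
```

In the real flow the socket endpoints are created by the simulator and matched by PLIO name; the sketch only illustrates the byte-stream nature of the link.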
Step 3: External Traffic Generators¶
The overall goal of the external traffic generator is to send or receive data to or from the AI Engine array through a specific port. The sender can generate data on the fly or read it from a file. The receiver can keep data and save it somewhere, or process it in a function. When the external traffic generator takes the place of a PL kernel that performs processing, it can use a Python/C++ model of the functionality or even use the original HLS function.
To test this design, you can use a Python script as an external traffic generator. Each kernel of the AI Engine is tested separately to show that multiple traffic generators can be run in parallel. The following image shows the general flow of the Python to be used in
A general overview of what the script does is as follows:
Read in two input files (
Convert the data from the input files into byte arrays.
Transmit the byte arrays using AXI transactions from the XTLM Python package.
Read the data from AXI transactions and convert it to
numpy arrays for plotting.
Save output data and plot the input/output data to see the data transformation.
The script contains a class
ExternalTraffic that contains functions that communicate with the XTLM utilities objects.
Instantiating the XTLM Utilities¶
TrafficGenerator/Python and open the file
Scroll to lines 155-159. This is where you instantiate the master/slave utilities for communicating with the simulator. To allow for direct connections to be made between the functions that will be generating the data and the simulator, the name of the utility being instantiated is the same as the name used in the following files:
system_etg.cfg for hardware emulation
graph.h with the PLIO constructors for AI Engine simulation
These utility objects contain the functions to transport and receive packets of data for processing, and require the data to be converted into a byte array.
Transmitting Data through the ipc_axis_master_util Utility¶
xtg_aie.py, go to line 35, the
mm2s function. This function performs the following operations:
1. It reads the input text file (`mm2s.txt`) into the variable `L`.
2. It transforms the data into a byte array by using two's complement and OR-ing the real and imaginary values together.
3. It sends the data in 128-sample packets to the utility using the `b_transport()` function.
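The conversion and packetization steps above can be sketched as follows. This is an illustration under stated assumptions, not the code from `xtg_aie.py`: it assumes 16-bit real and imaginary parts, with the real part in the lower half of a little-endian 32-bit word, and the helper names `pack_cint16` and `split_packets` are hypothetical. The actual call to `b_transport()` is left out.

```python
import struct


def pack_cint16(samples):
    """Pack (real, imag) integer pairs into little-endian 32-bit words.

    Assumption: real part in bits 0-15, imaginary part in bits 16-31.
    """
    out = bytearray()
    for real, imag in samples:
        # Masking with 0xFFFF gives the 16-bit two's-complement pattern,
        # so negative values are encoded correctly before OR-ing.
        word = ((imag & 0xFFFF) << 16) | (real & 0xFFFF)
        out += struct.pack("<I", word)
    return bytes(out)


def split_packets(payload, samples_per_packet=128, bytes_per_sample=4):
    """Cut a byte string into packets of `samples_per_packet` samples."""
    size = samples_per_packet * bytes_per_sample
    return [payload[i:i + size] for i in range(0, len(payload), size)]


data = pack_cint16([(1, -1), (-2, 3)])
packet_sizes = [len(p) for p in split_packets(pack_cint16([(0, 0)] * 300))]
```

Each element returned by `split_packets` would then be handed to the master utility as one transaction; the last packet is shorter when the sample count is not a multiple of 128.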
Receiving Data through the ipc_axis_slave_util Utility¶
Navigate to line 69, the
s2mm function. This function performs the following operations:
1. It receives data using the `sample_transaction` function. This is a blocking function, so it will not continue until it sees data from the utility.
2. It parses the data, which is still a byte array, and transports it back to the `run` function using the `self.child1.send()` function.
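The parse-and-forward step can be sketched as below. The sketch assumes the same hypothetical sample layout as a 16-bit real part followed by a 16-bit imaginary part, little-endian; the helper name `unpack_cint16` is not from the tutorial script, and a literal byte string stands in for the data that `sample_transaction` would return. The pipe ends mirror the role of `self.child1` in the script.

```python
import struct
from multiprocessing import Pipe


def unpack_cint16(payload):
    """Decode little-endian (real, imag) int16 pairs into complex numbers."""
    # "<hh" reads two signed 16-bit integers per 4-byte sample.
    return [complex(re, im) for re, im in struct.iter_unpack("<hh", payload)]


# One end of the pipe is kept by the receiver, the other by the caller.
parent_end, child_end = Pipe()

# Stand-in for bytes obtained from the slave utility.
raw = b"\x01\x00\xff\xff\xfe\xff\x03\x00"
child_end.send(unpack_cint16(raw))

# The caller collects the decoded samples, for example for plotting.
received = parent_end.recv()
```

Sending decoded Python objects through the pipe keeps the plotting code independent of the byte layout used on the socket.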
A C++ application has also been created to test this feature. On top of reading and writing files, this application uses the original HLS code of the
polar_clip function to test it in situ. The following image shows the general flow of the C++ application to be used in
main.cpp file, the function
main() is defined along with the three classes which are instantiated in it:
These three classes inherit from their three counterpart classes:
Depending on the type of simulation (AI Engine only or complete system simulation), the instances of
s2mm_impl must target different names (hence the separation into two sections in the
main.cpp as in the Python counterpart). While the socket names are different, the application itself is exactly the same.
run method is called for the three instances, which creates a thread with the data handler:
m_thread = std::thread(&mm2s::sock_data_handler, this);
m_thread = std::thread(&s2mm::sock_data_handler, this);
polar_clip (a thread is created for the input and the output ports):
m_thread = std::thread(&polar_clip::in_data_handler, this);
m_thread_1 = std::thread(&polar_clip::out_data_handler, this);
data_handler method calls the send/receive data functions (defined in the implementation class) and then uses the transaction method implemented in the sockets to communicate between the threads. The class
mm2s reads data from an array initialized in
s2mm writes the data in the file
polar_clip is different. It reads and writes to a different socket, and in the middle it processes the data using the core processing function of the HLS IP of
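The thread structure described above (a source, a processing stage, and a sink, each running in its own thread) can be sketched in Python for comparison. This is a simplified analogue, not a translation of `main.cpp`: queues stand in for the sockets, the AI Engine kernels that sit between the stages in the real design are omitted, the processing stage uses one thread rather than separate input and output threads, and `func` is a placeholder rather than the HLS `polar_clip` function. All names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def source(out_q, values):
    """Plays the role of mm2s: push every value, then signal the end."""
    for value in values:
        out_q.put(value)
    out_q.put(STOP)


def process(in_q, out_q, func):
    """Plays the role of the processing kernel: read, transform, write."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        out_q.put(func(item))


def sink(in_q, results):
    """Plays the role of s2mm: collect everything that arrives."""
    while True:
        item = in_q.get()
        if item is STOP:
            return
        results.append(item)


def run_pipeline(values, func):
    """Start the three stages, wait for them, and return the collected data."""
    q_in, q_out, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=source, args=(q_in, values)),
        threading.Thread(target=process, args=(q_in, q_out, func)),
        threading.Thread(target=sink, args=(q_out, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


clipped = run_pipeline([1, 2, 3], lambda x: min(x, 2))
```

Because each stage blocks on its input queue, the stages can be started in any order, which is similar in spirit to handlers waiting on their sockets.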
Step 4: Testing the Functionalities¶
For simplicity, all the tests can be conducted using variables defined in the command line of the Makefile. The three variables are as follows:
_sw_emu_ allows you to run
_hw_emu_ allows you to run
_false_: the input and output files are defined in the AI Engine design. The
aiesimulator are run using these predefined test vectors.
hw_emu runs after having synthesized
polar_clip in the PL.
_true_: external traffic generators are hooked up to the various simulators.
_Python_: a Python external traffic generator is used if necessary (
_Cpp_: a C++ external traffic generator is used if necessary.
The rules defined in the Makefile are as follows:
clean: Clean up all subdirectories.
aie: Compile the AI Engine graph (the
EXTERNAL_IO macro value is defined depending on the
aiesim: Simulate at the AI Engine array level (
aiesimulator) and check results.
polar_clip into XO files, or copy
sim_ipc_slave_32.xo into the kernels directory.
xclbin: Create the XCLBIN files with the kernels defined by the XOs in the kernels directory.
host: Create the host application. No test is defined in this application because the verification is done on the output of the external traffic generator.
package: Prepare for the
traffic_gen: Compile the C++ traffic generator if it has been selected.
hw_emu and checks results.
Some simulations you can try are shown in the following examples.
make TARGET=sw_emu EXTIO=false clean aie aiesim --> Runs x86simulator without external TG.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=PYTHON clean aie aiesim --> Runs aiesimulator with a Python external TG. You have to close the two plots to finish the simulation.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=CPP clean aie traffic_gen aiesim --> Same but with a C++ external TG.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=PYTHON clean aie xclbin host package hw_emu --> Runs hw_emu with Python external TG. You have to get out from Qemu (CTRL A + X) and close the 2 plots to finish the simulation.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=CPP traffic_gen hw_emu --> Runs hw_emu with C++ external TG. Here, you take advantage of the already compiled xclbin and created package.
In this tutorial, you have learned about the following:
The required modifications to enable external traffic generators in the graph code.
The format and layout of the provided Python script and C++ application to use.
The changes needed to the Vitis compiler configuration to run external traffic generators in
The advantages of using external traffic generators are as follows:
You can use Python script and design between
You can use discrete standalone input files, or use data generated within the Python script.
You do not need to write PL kernels to communicate with the AI Engine (only reference specific Simulator IPC IPs provided by Xilinx for hardware emulation).
You can mimic the flow of a PL kernel that is needed by the AI Engine (for example,
GitHub issues will be used for tracking requests and bugs. For questions go to forums.xilinx.com.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD136 | © Copyright 2022 Xilinx, Inc.