Python and C++ External Traffic Generators for AI Engine Simulation and Emulation Flows
Version: Vitis 2022.1
Introduction
Versal™ adaptive compute acceleration platforms (ACAPs) combine Scalar Engines, Adaptable Engines, and Intelligent Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. In a bottom-up approach, each part of the system is simulated independently before being integrated into a more complete simulation. A heterogeneous device such as the Versal ACAP allows a dataflow to pass through a variety of engines before it completes.
This tutorial develops a case in which the dataflow goes back and forth multiple times between the programmable logic (PL) and the AI Engine array. Some PL blocks are only source or destination kernels, whereas others are processing kernels within the dataflow. This tutorial demonstrates how to create external traffic generators as Python scripts or C++ applications to exercise the AI Engine kernels in the x86 simulator (x86simulator), AI Engine simulator (aiesimulator), and in hardware emulation (hw_emu).
Objectives
After completing the tutorial, you will be able to do the following:
Create a Python script or C++ application.
Pass data between the traffic generator and the AI Engine through specific ports.
Capture and send data and display using Python.
Capture and send data using C++.
Compile and simulate a design.
Understand the necessary code changes in the graph and host to make the design work seamlessly between x86simulator, aiesimulator, and hw_emu.
Bring up results in Vitis Analyzer.
Before You Begin
This tutorial uses Python. In addition to having the Xilinx tools installed, you also need a valid installation of Python 3.
Python Environment Setup
The external traffic generators use Python and require non-standard packages to be installed. Perform the following steps to install these packages.
Make sure you are using the latest version of Python 3. This tutorial has been developed with version 3.6.5. Run the following command to check the version:
python3 --version
Install the appropriate packages using pip:
pip install -r requirements.txt
This file contains the following packages:
numpy
multiprocessing
struct
matplotlib
Validate your environment by running the following command. If errors are reported during import, rerun pip and install the packages manually.
python3 -c 'import numpy, matplotlib, struct, multiprocessing'
Add the provided packages for external traffic generators to the PYTHONPATH. Run the following command:
export PYTHONPATH=${XILINX_VIVADO}/data/emulation/hw_em/lib/python/:${XILINX_VIVADO}/data/emulation/ip_utils/xtlm_ipc/xtlm_ipc_v1_0/python/:${PYTHONPATH}
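The import check from the validation step can also be scripted so that it reports exactly which modules are missing. The following is a minimal sketch, not part of the tutorial files; the function name missing_modules is hypothetical, and the module list is the one given for requirements.txt above.

```python
import importlib

# Modules listed for requirements.txt in this tutorial.
REQUIRED = ["numpy", "matplotlib", "struct", "multiprocessing"]


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules can be imported.")
```

Any module reported as missing is a candidate for a manual pip installation.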
Xilinx Tools Initialization
Before starting this tutorial, run the following steps.
Set up the following paths in the script env_setup.sh:
export XILINX_TOOLS_LOCATION=<Path to Vitis Build - Directory>/Vitis/2022.1
export PLATFORM_REPO_PATHS=<YOUR-PLATFORMS-DIRECTORY>
export XILINX_VERSAL_SW=<Path to xilinx-versal-common-v2022.1 - Directory>
export XILINX_XRT=/<user-path>/opt/xilinx/xrt
export PYTHON3_LOCATION=<user-path>
The script sets the necessary paths to run the tutorial:
ROOTFS, IMAGE, SYSROOT, CXX, and SDKTARGETSYSROOT for host software compilation.
LIBRARY_PATH to handle external traffic generator handles.
PATH and PYTHONPATH for the Python-based external traffic generator.
This tutorial targets the VCK190 board, which is currently available on the Xilinx website. Because this tutorial is about simulation, you do not need to have the real hardware.
Run source env_setup.sh in the console.
Design Overview
This tutorial is based on a basic design, as shown below. This design contains two AI Engine kernels with an intermediate kernel in the PL. The overall system is fed and flushed from kernels that are also in the PL.
In a standard simulation scheme, you would have to perform several steps:
In x86simulator and aiesimulator, you would have to replace mm2s and the output of polar_clip with text test vectors and verify that the input of polar_clip and s2mm is as intended.
In hw_emu, you would have to build the three PL kernels mm2s, polar_clip, and s2mm either from RTL code or HLS, link them with the Vitis compiler to create the XCLBIN file, create a complete host application that also verifies the output of the system, and then simulate using hw_emu.
This tutorial shows you how to write Python scripts and C++ traffic generators to replace the text files that you would otherwise have to create offline. It also lets you simulate the design in several ways:
In x86simulator, aiesimulator, and hw_emu
With text files as test vectors
With external traffic generators in Python or C++
A standard simulation with text files would be represented as shown below:
When using external traffic generators, communication with the simulator is achieved through Unix sockets, as shown below:
Tutorial Steps
Step 1: ADF Graph Modifications
To use external traffic generators for any kind of simulation, you need to make modifications to the graph code, specifically the graph.h file. This file contains the PLIO constructors, which are used to connect the graph to the programmable logic.
Navigate to aie/src and open graph.h.
Notice that #ifdef EXTERNAL_IO is used and that the lines of code under it do not have the data file in the PLIO constructors. This is needed for the external traffic generator to work properly, because the data files (seen on lines 85 to 88) take precedence.
Take note of the names (first argument) of the PLIO constructors. These will be used to hook up the external traffic generators.
Close graph.h.
Note: Guarding this code with a macro is optional. It is done here to highlight the changes needed. The modifications are simple: remove the filenames of the test vectors from the PLIO constructors, and the simulator (AI Engine or x86) automatically takes responsibility for creating ports to connect Unix sockets.
Step 2: Linking for Hardware Emulation
In a standard simulation, the various XO files (mm2s, polar_clip, and s2mm) would be created and linked to the AI Engine array, leaving no room for the external traffic generators. A configuration file for such a simulation can be seen in system.cfg:
[connectivity]
nk=mm2s:1:mm2s
nk=s2mm:1:s2mm
nk=polar_clip:1:polar_clip
sc=mm2s.s:ai_engine_0.DataIn1
sc=ai_engine_0.clip_in:polar_clip.sin
sc=polar_clip.sout:ai_engine_0.clip_out
sc=ai_engine_0.DataOut1:s2mm.s
To work with external traffic generators in hardware emulation, introduce hooks in the PL. For that purpose, Xilinx provides a complete set of XO files with various bitwidths in $XILINX_VITIS/data/emulation/XO/:
sim_ipc_master_NNN.xo with NNN in 8, 16, 32, 64, 128, 256, 512
sim_ipc_slave_NNN.xo with NNN in 8, 16, 32, 64, 128, 256, 512
This tutorial uses 32-bit interfaces. Copy the files sim_ipc_master_32.xo and sim_ipc_slave_32.xo into the pl_kernels directory. The corresponding configuration file is system_etg.cfg:
[connectivity]
nk=sim_ipc_axis_master_32:2:mm2s.polar_clip_out
nk=sim_ipc_axis_slave_32:2:s2mm.polar_clip_in
sc=mm2s.M00_AXIS:ai_engine_0.DataIn1
sc=ai_engine_0.clip_in:polar_clip_in.S00_AXIS
sc=polar_clip_out.M00_AXIS:ai_engine_0.clip_out
sc=ai_engine_0.DataOut1:s2mm.S00_AXIS
Step 3: External Traffic Generators
The overall goal of the external traffic generator is to send or receive data to or from the AI Engine array through a specific port. The sender can generate data on the fly or read it from a file. The receiver can keep data and save it somewhere, or process it in a function. When the external traffic generator takes the place of a PL kernel that performs processing, it can use a Python/C++ model of the functionality or even use the original HLS function.
Python Scripts
To test this design, you can use a Python script as an external traffic generator. Each kernel of the AI Engine is tested separately to show that multiple traffic generators can be run in parallel. The following image shows the general flow of the Python script to be used in aiesimulator, x86simulator, and hw_emu:
Script Analysis
In overview, the script does the following:
Read in two input files (mm2s.txt, polar_clip_out.txt).
Convert the data from the input files into byte arrays.
Transmit the byte arrays using AXI transactions from the XTLM Python package.
Read the data from AXI transactions and convert it to numpy arrays for plotting.
Save output data and plot the input/output data to see the data transformation.
The script defines a class ExternalTraffic whose functions communicate with the XTLM utility objects.
Instantiating the XTLM Utilities
Navigate to TrafficGenerator/Python and open the file xtg_aie.py.
Scroll to lines 155-159. This is where you instantiate the master/slave utilities for communicating with the simulator. To allow for direct connections to be made between the functions that will be generating the data and the simulator, the name of the utility being instantiated is the same as the name used in the following files:
system_etg.cfg for hardware emulation
graph.h with the PLIO constructors for AI Engine simulation
These utility objects contain the functions to transport and receive packets of data for processing, and require the data to be converted into a byte array.
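Because the utility names must match the instance names in system_etg.cfg, it can help to list those names directly from the configuration. The following is a minimal sketch under the assumption that each nk= line follows the kernel:count:name1.name2 layout shown in Step 2; the function instance_names is hypothetical and is not part of the tutorial files or of the XTLM package.

```python
def instance_names(cfg_text):
    """Return the kernel instance names declared by the nk= lines."""
    names = []
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("nk="):
            continue
        # Layout assumed from this tutorial: nk=<kernel>:<count>:<name1>.<name2>
        _kernel, _count, instances = line[len("nk="):].split(":")
        names.extend(instances.split("."))
    return names


# The nk= lines of system_etg.cfg as listed in Step 2.
SYSTEM_ETG_CFG = """\
[connectivity]
nk=sim_ipc_axis_master_32:2:mm2s.polar_clip_out
nk=sim_ipc_axis_slave_32:2:s2mm.polar_clip_in
"""

print(instance_names(SYSTEM_ETG_CFG))
# prints ['mm2s', 'polar_clip_out', 's2mm', 'polar_clip_in']
```

Each name in the printed list is one that a traffic generator endpoint would be expected to use in hardware emulation.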
Transmitting Data through the ipc_axis_master_util Utility
In xtg_aie.py, go to line 35, the mm2s function. This function performs the following operations:
1. It reads the input text file (`mm2s.txt`) into the variable `L`.
2. It transforms the data into a byte array by using two's complement and OR-ing the real and imaginary values together.
3. It sends the data in 128-sample packets to the utility using the `b_transport()` function.
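The conversion and packetization steps above can be illustrated without the simulator. The following is a minimal sketch, not the code of xtg_aie.py: it assumes each sample is a pair of signed 16-bit integers, with the real part in the lower half and the imaginary part in the upper half of a little-endian 32-bit word. The function names pack_cint16 and split_packets are hypothetical, and the actual call to `b_transport()` is left out.

```python
import struct


def pack_cint16(samples):
    """Pack (real, imag) int16 pairs into little-endian 32-bit words."""
    words = []
    for real, imag in samples:
        # Masking with 0xFFFF gives the 16-bit two's-complement encoding;
        # the two halves are then OR-ed into one 32-bit word.
        words.append((real & 0xFFFF) | ((imag & 0xFFFF) << 16))
    return struct.pack("<%dI" % len(words), *words)


def split_packets(payload, samples_per_packet=128, bytes_per_sample=4):
    """Split a byte string into packets of at most samples_per_packet samples."""
    size = samples_per_packet * bytes_per_sample
    return [payload[i:i + size] for i in range(0, len(payload), size)]


payload = pack_cint16([(1, -1), (-2, 3)])
print(payload.hex())  # prints 0100fffffeff0300

packets = split_packets(pack_cint16([(0, 0)] * 300))
print([len(p) for p in packets])  # prints [512, 512, 176]
```

In the real script, each packet would be handed to the master utility instead of being printed.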
Receiving Data through the ipc_axis_slave_util Utility
Navigate to line 69, the s2mm function. This function performs the following operations:
1. It receives data using the `sample_transaction` function. This is a blocking function, so it will not continue until it sees data from the utility.
2. It parses the data, which is still a byte array, and transports it back to the `run` function using the `self.child1.send()` function.
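The parsing and hand-off steps above can be sketched as follows. This is an illustration under assumptions, not the code of xtg_aie.py: the byte layout (little-endian, real part first, then imaginary part, 16 bits each) is assumed, the function unpack_cint16 is hypothetical, and a literal byte string stands in for the result of `sample_transaction`. The pipe mirrors the `self.child1.send()` call mentioned above.

```python
import struct
from multiprocessing import Pipe


def unpack_cint16(payload):
    """Decode little-endian 32-bit words into (real, imag) int16 pairs."""
    count = len(payload) // 4
    # Each word holds two signed 16-bit fields: real (low half), then imag.
    flat = struct.unpack("<%dh" % (2 * count), payload[:4 * count])
    return list(zip(flat[0::2], flat[1::2]))


# Stand-in for the bytes returned by the slave utility.
received_bytes = b"\x01\x00\xff\xff\xfe\xff\x03\x00"

# One end of the pipe plays the role of self.child1, the other of the run function.
parent_end, child_end = Pipe()
child_end.send(unpack_cint16(received_bytes))
received = parent_end.recv()
print(received)  # prints [(1, -1), (-2, 3)]
```

A list of pairs like this converts directly to a numpy array for plotting.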
C++ Application
A C++ application has also been created to test this feature. In addition to reading and writing files, this application uses the original HLS code of the polar_clip function to test it in situ. The following image shows the general flow of the C++ application to be used in aiesimulator, x86simulator, and hw_emu:
In the main.cpp file, the function main() is defined along with the three classes that are instantiated in it:
mm2s_impl
polar_clip_impl
s2mm_impl
These three classes inherit from their three counterpart classes:
mm2s
polar_clip
s2mm
Depending on the type of simulation (AI Engine only or complete system simulation), the instances of mm2s_impl, polar_clip_impl, and s2mm_impl must target different names (hence the separation into two sections in main.cpp, as in the Python counterpart). While the socket names are different, the application itself is exactly the same.
The run method is called for the three instances, which creates a thread with the data handler:
mm2s and s2mm:
m_thread = std::thread(&mm2s::sock_data_handler, this);
m_thread = std::thread(&s2mm::sock_data_handler, this);
polar_clip (a thread is created for the input and the output ports):
m_thread = std::thread(&polar_clip::in_data_handler, this);
m_thread_1 = std::thread(&polar_clip::out_data_handler, this);
The data_handler method calls the send/receive data functions (defined in the implementation class) and then uses the transaction method implemented in the sockets to communicate between the threads. The class mm2s reads data from an array initialized in input.h, and s2mm writes the data to the file DataOut1.txt. The polar_clip class is different. It reads from one socket and writes to another, and in between it processes the data using the core processing function of the HLS IP of polar_clip.
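The two-thread structure described for polar_clip can be illustrated with a Python analogue. This is a sketch only and does not reproduce main.cpp: queue.Queue objects stand in for the sockets, the class PolarClipModel is hypothetical, and the function clip is a simple scalar clamp used as a placeholder for the real polar_clip HLS function.

```python
import queue
import threading


def clip(sample, limit=100):
    """Placeholder for the processing function (not the real polar_clip)."""
    return max(-limit, min(limit, sample))


class PolarClipModel:
    """One thread per port: read and process on input, forward on output."""

    def __init__(self, in_port, out_port):
        self.in_port = in_port        # queue standing in for the input socket
        self.out_port = out_port      # queue standing in for the output socket
        self._pending = queue.Queue()
        self._threads = []

    def in_data_handler(self):
        while True:
            item = self.in_port.get()
            if item is None:          # None marks the end of the stream
                self._pending.put(None)
                break
            self._pending.put(clip(item))

    def out_data_handler(self):
        while True:
            item = self._pending.get()
            self.out_port.put(item)
            if item is None:
                break

    def run(self):
        self._threads = [
            threading.Thread(target=self.in_data_handler),
            threading.Thread(target=self.out_data_handler),
        ]
        for thread in self._threads:
            thread.start()

    def join(self):
        for thread in self._threads:
            thread.join()


source, sink = queue.Queue(), queue.Queue()
model = PolarClipModel(source, sink)
model.run()
for value in [5, 250, -300, None]:
    source.put(value)
model.join()

results = []
while True:
    item = sink.get()
    if item is None:
        break
    results.append(item)
print(results)  # prints [5, 100, -100]
```

Keeping one handler per port lets the input side keep reading while the output side is still sending, which is the reason given above for creating two threads.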
Step 4: Testing the Functionalities
For simplicity, all the tests can be conducted using variables defined in the command line of the Makefile. The three variables are as follows:
TARGET
sw_emu: allows you to run x86simulator.
hw_emu: allows you to run aiesimulator or hw_emu.
EXTIO
false: the input and output files are defined in the AI Engine design. The x86simulator and aiesimulator are run using these predefined test vectors. hw_emu runs after having synthesized mm2s, s2mm, and polar_clip in the PL.
true: external traffic generators are hooked up to the various simulators.
TRAFFIC_GEN
PYTHON: a Python external traffic generator is used if necessary (EXTIO=true).
CPP: a C++ external traffic generator is used if necessary.
The rules defined in the Makefile are as follows:
clean: Clean up all subdirectories.
aie: Compile the AI Engine graph (the EXTERNAL_IO macro value is defined depending on the EXTIO variable value).
aiesim: Simulate at the AI Engine array level (x86simulator or aiesimulator) and check results.
kernels: Compile s2mm, mm2s, and polar_clip into XO files, or copy sim_ipc_master_32.xo and sim_ipc_slave_32.xo into the kernels directory.
xclbin: Create the XCLBIN files with the kernels defined by the XOs in the kernels directory.
host: Create the host application. No test is defined in this application because the verification is done on the output of the external traffic generator.
package: Prepare for hw_emu.
traffic_gen: Compile the C++ traffic generator if it has been selected.
run_emu: Run hw_emu and check results.
Some simulations you can try are shown in the following examples.
make TARGET=sw_emu EXTIO=false clean aie aiesim
--> Runs x86simulator without external TG.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=PYTHON clean aie aiesim
--> Runs aiesimulator with a Python external TG. You have to close the two plots to finish the simulation.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=CPP clean aie traffic_gen aiesim
--> Same but with a C++ external TG
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=PYTHON clean aie xclbin host package hw_emu
--> Runs hw_emu with a Python external TG. You have to exit QEMU (Ctrl+A, then X) and close the two plots to finish the simulation.
make TARGET=hw_emu EXTIO=true TRAFFIC_GEN=CPP traffic_gen hw_emu
--> Runs hw_emu with C++ external TG. Here, you take advantage of the already compiled xclbin and created package.
Summary
In this tutorial, you have learned about the following:
The required modifications to enable external traffic generators in the graph code.
The format and layout of the provided Python script and C++ application to use.
The changes needed to the Vitis compiler configuration to run external traffic generators in hw_emu.
The advantages of using external traffic generators are as follows:
You can use the same Python script and design in both aiesimulator and hw_emu.
You can use discrete standalone input files, or use data generated within the Python script.
You do not need to write PL kernels to communicate with the AI Engine (only reference specific Simulator IPC IPs provided by Xilinx for hardware emulation).
You can mimic the flow of a PL kernel that is needed by the AI Engine (for example, polar_clip).
Support
GitHub issues will be used for tracking requests and bugs. For questions go to forums.xilinx.com.
License
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD136 | © Copyright 2022 Xilinx, Inc.