2021.1 Versal™ AI Engine LeNet Tutorial

LeNet Tutorial

Table of Contents

Introduction

Before You Begin

Building the LeNet Design

Hardware Design Details

Software Design Details

Throughput Measurement Details

References

Introduction

The Xilinx® Versal ACAP is a fully software-programmable, heterogeneous compute platform that combines the Processing System (PS) (Scalar Engines that include the Arm® processors), Programmable Logic (PL) (Adaptable Engines that include the programmable logic blocks and memory), and AI Engines, which belong to the Intelligent Engine category.

This tutorial uses the LeNet algorithm to implement a system-level design that performs image classification using the AI Engine and the PL, including block RAM (BRAM). The design demonstrates functional partitioning between the AI Engine and PL. It also highlights memory partitioning and hierarchy among DDR memory, PL (BRAM), and AI Engine memory.

The tutorial takes you through hardware emulation and hardware flow in the context of a complete Versal ACAP system integration. A Makefile is provided that you can modify to suit your own needs in a different context.

Objectives

After completing the tutorial, you should be able to:

Build a complete system design by working through the steps of the Vitis™ unified software platform flow: creating the AI Engine Adaptive Data Flow (ADF) API graph, compiling the A72 host application, compiling the PL kernels, using the Vitis compiler (V++) to link the AI Engine and HLS kernels with the platform, and packaging the design. You will also be able to run the design through the hardware emulation flow, which uses a mixed SystemC/RTL cycle-accurate/QEMU-based simulator, and through the hardware flow.
Develop an understanding of CNN (Convolutional Neural Network) layer details using the LeNet algorithm and how the layers are mapped into data processing and compute blocks.
Develop an understanding of the kernels in the design: AI Engine kernels that process the convolutional and fully connected layers, and PL kernels that perform the input rearrange function and the max pool and rearrange functions.
Develop an understanding of the AI Engine IP interface using the AXI4-Stream interface.
Develop an understanding of memory hierarchy in a system-level design involving DDR memory, PL BRAM, and AI Engine memory.
Develop an understanding of graph control APIs to enable run-time updates using the run-time parameter (RTP) interface.
Develop an understanding of performance measurement and functional/throughput debug at the application level.

Tutorial Overview

In this application tutorial, the LeNet algorithm performs image classification on an input image using five AI Engine tiles and PL resources, including block RAM. A top-level block diagram is shown in the following figure.

An image is loaded from DDR memory through the NoC to block RAM and then to the AI Engine. The PL input pre-processing unit receives the input image and sends its output to the first AI Engine tile, which performs matrix multiplication. The output from the first AI Engine tile goes to a PL unit that performs the first level of max pooling and data rearrangement (M1R1). That output is fed to the second AI Engine tile, and the output from that tile is sent to the PL for the second level of max pooling and data rearrangement (M2R2). The output is then sent to a fully connected layer (FC1), which is implemented in two AI Engine tiles and uses a rectified linear unit (ReLU) as its activation function. The outputs from those two AI Engine tiles are fed into a second fully connected layer implemented in the fifth AI Engine tile. The final output is sent to a data conversion unit in the PL and then to DDR memory through the NoC.

Between the AI Engine and PL units is a datamover module (refer to the LeNet Controller in the following figure) that contains the following kernels:

mm2s: a memory-mapped-to-stream kernel that feeds data from DDR memory through the NoC to the AI Engine array
s2mm: a stream-to-memory-mapped kernel that feeds data from the AI Engine array through the NoC to DDR memory

Image of LeNet Block Diagram

The design has two major PL kernels. The input pre-processing unit, M1R1, and M2R2 are contained in the lenet_kernel RTL kernel, which has already been packaged as a Xilinx object (XO) file with the .xo extension. The datamover kernel dma_hls provides the interface between the AI Engine and DDR memory. The five AI Engine kernels all implement matrix multiplication. The matrix dimensions depend on the image dimension, weight dimension, and number of features.
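To make the dependence of the matrix dimensions on image size, weight size, and feature count concrete, the following is a minimal sketch of one common way to write a convolution layer as a matrix multiplication. The helper names `window_out` and `conv_as_matmul` are hypothetical, and the example numbers (a 28x28 single-channel image, a 5x5 weight window, 6 features, 2x2 pooling) are classic LeNet-style values chosen for illustration; they are assumptions and are not taken from this design, whose actual dimensions and data layout may differ.

```shell
# Illustrative sketch only; names and example values are assumptions.

# Output size of an unpadded sliding window along one dimension.
# args: input_size window_size stride
window_out() {
    echo $(( ($1 - $2) / $3 + 1 ))
}

# Shapes of A (M x K) and B (K x N) when a stride-1, unpadded convolution
# is expressed as a matrix multiplication A x B.
# args: in_height in_width in_channels window_size features
conv_as_matmul() {
    cm_oh=$(window_out "$1" "$4" 1)   # output rows
    cm_ow=$(window_out "$2" "$4" 1)   # output columns
    # M: one row per output pixel; K: one column per weight; N: one per feature
    echo "M=$(( cm_oh * cm_ow )) K=$(( $4 * $4 * $3 )) N=$5"
}

conv_as_matmul 28 28 1 5 6    # hypothetical first layer
window_out 24 2 2             # hypothetical 2x2 max pool with stride 2
```

Under these assumed values, each output pixel becomes one row of the left-hand matrix, and each feature becomes one column of the weight matrix.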

Directory Structure

lenet
|____design......................contains AI Engine kernel, HLS kernel source files, and input data files
|    |___aie_src
|    |   |___data
|    |___pl_src
|___images......................contains images that appear in the README.md
|___Makefile
|___system.cfg...................configuration (.cfg) file
|___xrt.ini

Before You Begin

Note: This tutorial targets the VCK190 ES board (see https://www.xilinx.com/products/boards-and-kits/vck190.html). This board is currently available via early access. If you have already purchased this board, download the necessary files from the lounge and ensure you have the correct licenses installed. If you do not have a board and an ES license, please contact your Xilinx sales contact.

Documentation: Explore AI Engine Architecture

Tools: Installing the Tools

Tools Documentation:

To build and run the LeNet tutorial, complete the following setup steps:

Install the Vitis Software Platform 2021.1
Obtain a license to enable Beta Devices in Xilinx tools (to use the xilinx_vck190_base_202110_1 platform)
Obtain licenses for AI Engine tools
Follow the instructions in Installing Xilinx Runtime and Platforms (XRT)
Download and set up the VCK190 Vitis Platform for 2021.1

Environment: Setting Up the Shell Environment

When the elements of the Vitis software platform are installed, update the shell environment script. Set the environment variables to your system-specific paths.

Edit the env_setup.sh script with your file paths:

export XILINX_XRT=<XRT-LOCATION>
export PLATFORM_REPO_PATHS=<YOUR-PLATFORM-DIRECTORY> 

source <XILINX-TOOLS-LOCATION>/Vitis/<TOOLS-BUILD>/settings64.sh
source $XILINX_XRT/setup.sh

Then source the environment script:

source env_setup.sh
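After sourcing the script, it can help to confirm that the two exported variables are populated before starting a build. The following is a minimal sketch of such a check. The function name `check_env` is hypothetical and is not part of the tutorial sources, and the sketch assumes that each variable holds a single directory path.

```shell
# Hypothetical helper, not part of the tutorial sources.
# Assumption: each variable holds a single directory path.
check_env() {
    ce_status=0
    for ce_name in XILINX_XRT PLATFORM_REPO_PATHS; do
        # Read the value of the variable whose name is stored in ce_name.
        eval "ce_value=\${$ce_name:-}"
        if [ -z "$ce_value" ]; then
            echo "MISSING: $ce_name is not set"
            ce_status=1
        elif [ ! -d "$ce_value" ]; then
            echo "BAD PATH: $ce_name=$ce_value is not a directory"
            ce_status=1
        else
            echo "OK: $ce_name=$ce_value"
        fi
    done
    # Non-zero when any variable is unset or does not name a directory.
    return "$ce_status"
}
```

Calling `check_env` in the same shell prints one line per variable and returns a non-zero status if either one is unset or does not point at a directory.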

Validation: Confirming Tool Installation

which vitis
which aiecompiler
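The two `which` commands above can be folded into a small check that reports every tool that is not on the `PATH`. This is a minimal sketch. The function name `missing_tools` is hypothetical, and the tool list passed to it is simply the set of commands this section uses.

```shell
# Hypothetical helper: print the name of each listed tool that is not on PATH.
missing_tools() {
    for mt_tool in "$@"; do
        command -v "$mt_tool" >/dev/null 2>&1 || echo "$mt_tool"
    done
}

# Check the commands used in this section.
missing=$(missing_tools vitis aiecompiler platforminfo)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All tools found"
fi
```

If any tool is reported as missing, re-check the paths in env_setup.sh and source it again in the current shell.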

Confirm you have the VCK190 Production Base Platform.

platforminfo --list | grep -m 1 -A 9 vck190_base

The output of this command should look like the following:

"baseName": "xilinx_vck190_base_202110_1",
            "version": "1.0",
            "type": "sdsoc",
            "dataCenter": "false",
            "embedded": "true",
            "externalHost": "false",
            "serverManaged": "false",
            "platformState": "pre_synth",
            "usesPR": "false",

| Output Objects | Description |
|---|---|
| build/libadf.a | Compiled AI Engine design graph. |
| build/Work/ | Directory that contains all outputs of the AI Engine compiler. |

| Switch | Description |
|---|---|
| --platform \| -f | Specifies the name of a supported acceleration platform as specified by the $PLATFORM_REPO_PATHS environment variable or the full path to the platform XPFM file. |
| --save-temps \| -s | Directs the `v++` command to save intermediate files/directories created during the compilation and link process. Use the `--temp_dir` option to specify a location to write the intermediate files to. |
| --temp_dir | This allows you to manage the location where the tool writes temporary files created during the build process. The temporary results are written by the Vitis compiler, and then removed, unless the `--save-temps` option is also specified. |
| --verbose | Display verbose/debug information. |
| --config | Specifies a configuration file containing `v++` switches. |
| --output \| -o | Specifies the name of the output file generated by the `v++` command. In this design the outputs of the DMA HLS kernels and the PL kernels interfacing with the AI Engine are in XO files. |

| Switch | Comment |
|---|---|
| --connectivity.nk | Number of kernels. `mm2s:2:mm2s_0.mm2s_1` means that the Vitis compiler should instantiate two MM2S kernels and name those instances `mm2s_0` and `mm2s_1`. |
| --connectivity.stream_connect | How the kernels will connect to IPs, platforms, or other kernels. The output of the AI Engine compiler tells you the interfaces that need to be connected. `mm2s_0.s:ai_engine_0.lte_0` means that the Vitis compiler should connect the port `s` of `mm2s_0` to the port `lte_0` of `ai_engine_0`. The name of the AI Engine port is the one defined in the PLIO instantiation in `graph.cpp`. |
| param=compiler.addOutputTypes=hw_export | This option tells the Vitis compiler that, besides creating an XCLBIN file, it should also output an XSA file, which is needed to create a post-Vivado fixed platform for Vitis software development. |
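The `nk` and `stream_connect` value formats described in the table above can be illustrated by splitting the two example strings into their parts. This is a minimal sketch for reading the syntax only. The function names `parse_nk` and `parse_stream_connect` are hypothetical, the code is not part of the Vitis tools, and it assumes values shaped exactly like the two examples in the table.

```shell
# Illustrative sketch only; not part of the Vitis tools.

# Split an nk value of the form <kernel>:<count>:<instance>.<instance>...
parse_nk() {
    pn_kernel=${1%%:*}          # text before the first ':'
    pn_rest=${1#*:}             # text after the first ':'
    pn_count=${pn_rest%%:*}     # number of instances
    pn_names=${pn_rest#*:}      # dot-separated instance names
    echo "kernel=$pn_kernel count=$pn_count"
    echo "$pn_names" | tr '.' '\n' | sed 's/^/instance=/'
}

# Split a stream_connect value of the form <instance>.<port>:<instance>.<port>
parse_stream_connect() {
    ps_src=${1%%:*}             # source endpoint, before the ':'
    ps_dst=${1#*:}              # sink endpoint, after the ':'
    echo "source instance=${ps_src%%.*} port=${ps_src#*.}"
    echo "sink instance=${ps_dst%%.*} port=${ps_dst#*.}"
}

parse_nk "mm2s:2:mm2s_0.mm2s_1"
parse_stream_connect "mm2s_0.s:ai_engine_0.lte_0"
```

Reading the output side by side with the table shows which field names the kernel, which fields name its instances, and which side of the colon is the source of a stream connection.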

| Input Sources | Description |
|---|---|
| design/aie_src/main.cpp | Source application file for the `lenet_xrt.elf` that will run on an A72 processor. |
| build/Work/ps/c_rts/aie_control_xrt.cpp | Generated AI Engine control code that implements the graph APIs for the LeNet graph. |

| Output Objects | Description |
|---|---|
| build/lenet_xrt.elf | The executable that will run on an A72 processor. |
