2021.1 Versal™ AI Engine LeNet Tutorial

LeNet Tutorial

Table of Contents

Introduction

Before You Begin

Building the Lenet Design

Hardware Design Details

Software Design Details

Throughput Measurement Details

References

Introduction

The Xilinx® Versal ACAP is a fully software-programmable, heterogeneous compute platform that combines the Processor System (PS) (Scalar Engines that include the Arm® processors), Programmable Logic (PL) (Adaptable Engines that include the programmable logic blocks and memory) and AI Engines which belong in the Intelligent Engine category.

This tutorial uses the LeNet algorithm to implement a system-level design to perform image classification using the AI Engine and PL logic, including block RAM (BRAM). The design demonstrates functional partitioning between the AI Engine and PL. It also highlights memory partitioning and hierarchy among DDR memory, PL (BRAM) and AI Engine memory.

The tutorial takes you through hardware emulation and hardware flow in the context of a complete Versal ACAP system integration. A Makefile is provided that you can modify to suit your own needs in a different context.

Objectives

Objectives

After completing the tutorial, you should be able to:

  • Build a complete system design by going through the various steps in the Vitis™ unified software platform flow, including creating the AI Engine Adaptive Data Flow API (ADF) graph, compiling the A72 host application and compiling PL kernels, using the Vitis compiler (V++) to link the AI Engine and HLS kernels with the platform, and packaging the design. You will also be able to run the design through the hardware emulation and hardware flow in a mixed System C/RTL cycle-accurate/QEMU-based simulator.

  • Develop an understanding of CNN (Convolutional Neural Network) layer details using the LeNet algorithm and how the layers are mapped into data processing and compute blocks.

  • Develop an understanding of the kernels developed in the design - AI Engine kernels to process fully connected convolutional layers and PL kernels to process the input rearrange and max pool and rearrange functions.

  • Develop an understanding of the AI Engine IP interface using the AXI4-Stream interface.

  • Develop an understanding of memory hierarchy in a system-level design involving DDR memory, PL BRAM, and AI Engine memory.

  • Develop an understanding of graph control APIs to enable run-time updates using the run-time parameter (RTP) interface.

  • Develop an understanding of performance measurement and functional/throughput debug at the application level.

Tutorial Overview

Tutorial Overview

In this application tutorial, the LeNet algorithm is used to perform image classification on an input image using five AI Engine tiles and PL resources including block RAM. A top level block diagram is shown in the following figure. An image is loaded from DDR memory through the NoC to block RAM and then to the AI Engine. The PL input pre-processing unit receives the input image and sends the output to the first AI Engine tile to perform matrix multiplication. The output from the first AI Engine tile goes to a PL unit to perform the first level of max pool and data rearrangement (M1R1). The output is fed to the second AI Engine tile and the output from that tile is sent to the PL to perform the second level max pooling and data rearrangement (M2R2). The output is then sent to a fully connected layer (FC1) implemented in two AI Engine tiles and uses the rectified linear unit layer (ReLu) as an activation function. The outputs from the two AI Engine tiles are then fed into a second fully connected layer implemented in the fifth AI Engine tile. The output is sent to a data conversion unit in the PL and then to the DDR memory through the NoC. In between the AI Engine and PL units is a datamover module (refer to the Lenet Controller in the following figure) that contains the following kernels:

  • mm2s: a memory mapped to stream kernel to feed data from DDR memory through the NoC to the AI Engine Array

  • s2mm: a stream to memory mapped kernel to feed data from the AI Engine Array through NoC to DDR memory

Image of LeNet Block Diagram

In the design, there are two major PL kernels. The input pre-processing unit, M1R1 and M2R2 are contained in the lenet_kernel RTL kernel which has already been packaged as a Xilinx object .xo (XO) file. The datamover kernel dma_hls provides the interface between the AI Engine and DDR memory. The five AI Engine kernels all implement matrix multiplication. The matrix dimensions depend on the image dimension, weight dimension, and number of features.

Directory Structure

Directory Structure

lenet
|____design......................contains AI Engine kernel, HLS kernel source files, and input data files
|    |___aie_src
|    |   |___data
|    |___pl_src
|___images......................contains images that appear in the README.md
|___Makefile
|___system.cfg...................configuration (.cfg) file
|___xrt.ini

Before You Begin

Note: This tutorial targets the VCK190 ES board (see https://www.xilinx.com/products/boards-and-kits/vck190.html). This board is currently available via early access. If you have already purchased this board, download the necessary files from the lounge and ensure you have the correct licenses installed. If you do not have a board and ES license please contact your Xilinx sales contact.

Documentation: Explore AI Engine Architecture

Documentation: Explore AI Engine Architecture

Tools: Installing the Tools

Tools: Installing the Tools

Tools Documentation:

To build and run the Lenet tutorial, you will need the following tools downloaded/installed:

Environment: Setting Up the Shell Environment

Environment: Setting Up the Shell Environment

When the elements of the Vitis software platform are installed, update the shell environment script. Set the environment variables to your system specific paths.

Edit env_setup.sh script with your file paths:

export XILINX_XRT=<XRT-LOCATION>
export PLATFORM_REPO_PATHS=<YOUR-PLATFORM-DIRECTORY> 

source <XILNX-TOOLS-LOCATION>/Vitis/<TOOLS-BUILD>/settings64.sh
source $XILINX_XRT/setup.sh

Then source the environment script:

source env_setup.sh
Validation: Confirming Tool Installation

Validation: Confirming Tool Installation

which vitis
which aiecompiler

Confirm you have the VCK190 Production Base Platform.

platforminfo --list | grep -m 1 -A 9 vck190_base

Output of the above command should be as follows:

"baseName": "xilinx_vck190_base_202110_1",
            "version": "1.0",
            "type": "sdsoc",
            "dataCenter": "false",
            "embedded": "true",
            "externalHost": "false",
            "serverManaged": "false",
            "platformState": "pre_synth",
            "usesPR": "false",