AI Engine Development

See Vitis™ Development Environment on xilinx.com
See Vitis-AI™ Development Environment on xilinx.com

Beamforming Tutorial¶

Introduction¶

The Versal™ adaptive compute acceleration platform (ACAP) is a fully software programmable, heterogeneous compute platform that combines the processing system (PS) (Scalar Engines that include Arm™ processors), programmable logic (PL) (Adaptable Engines that include the programmable logic), and AI Engines which belong in the Intelligent Engine category.

This tutorial demonstrates the creation of a beamforming system running on the AI Engine, PL, and PS, and the validation of the design running on this heterogeneous domain.

The tutorial has been divided into modules. It takes you through creating a custom embedded platform, a bare-metal host application, and a custom PetaLinux-based Linux host application, as well as hardware emulation and hardware build flows in the context of a complete Versal ACAP system integration. Each module uses a Makefile to build the relevant aspect of the design.

Tutorial Overview¶

This beamforming tutorial is a system-level design that uses the AI Engine, PL, and PS resources. This design showcases the following features:

A high utilization of PL and AI Engine resources which require advanced timing closure techniques
Custom platform creation
An AI Engine graph that implements matrix multiplication functions for uplink and downlink beamforming
RTL kernels that interface with the AI Engine and operate at 400 MHz
A scalable architecture that only needs a small number of kernels to be developed and can be copied multiple times to extend compute power
Bare-metal and PetaLinux PS host application development process
Timing closure methods for a high utilization design
Hardware emulation and VCK190 board flows
A hierarchical Makefile structure to highlight dependencies between build steps and showcase a way for multiple developers to work on the same repository at the same time (AI Engine developers, RTL designers, and software developers)

Assumptions¶

To fully grasp the design, it is assumed that you have the following knowledge and resources:

Ability to read Tcl scripts
Ability to read C++ based source code to understand the AI Engine kernels and host application source code (bare metal and Linux)
Ability to read Verilog RTL to understand the Vivado® projects created for the RTL PL kernels
A base bootable design (that is to say, you have brought up your board, and have a working hardware and board through a simple Vivado design)

Before You Begin¶

This tutorial targets the VCK190 ES board. This board is currently available through early access. If you have already purchased this board, download the necessary files from the lounge and ensure you have the correct licenses installed. If you do not have a board and ES license, get in touch with your Xilinx sales contact.

Documentation: Explore AI Engine Architecture¶

Tools: Installing the Tools¶

Obtain a license to enable beta devices in Xilinx tools (to use the VCK190 platform).
Obtain licenses for AI Engine tools.
Follow the instructions in Installing Xilinx Runtime and Platforms (XRT).
Follow the instructions to install PetaLinux tools in the PetaLinux Tools Documentation (UG1144).
Download the VCK190 PetaLinux 2021.1 BSP from the Versal AI Core Series VCK190 HeadStart Early Access Site .

To build and run the Beamforming tutorial, you will also need to have downloaded and installed the following tools:

Environment: Setting Up Your Shell Environment¶

When the elements of the Vitis software platform are installed, update the shell environment script. Set the necessary environment variables to your system specific paths.

Edit the env_setup.sh script with your file paths:

export XILINX_XRT=/opt/xilinx/xrt
export PATH_TO_BSP=<path-to-bsps> #(the folder that contains xilinx-vck190-v2021.1-final.bsp)

source <XILINX-INSTALL-LOCATION>/Vitis/2021.1/settings64.sh
source $XILINX_XRT/setup.sh
source <path-to-installed-PetaLinux>/settings.sh

Source the environment script:

source env_setup.sh

Validation: Confirming Tool Installation¶

Make sure you are using the 2021.1 version of the Xilinx tools.

which vitis
which aiecompiler

System Design Overview¶

This tutorial showcases a beamforming system with 32 layers and 64 antennas implemented on an XCVC1902 Versal ACAP device in the VCK190 board. The beamforming system consists of a downlink subsystem which contains the DL64A32L AI Engine subgraph and the dlbf_data, dlbf_coeff, and dlbf_slave PL RTL kernels. The beamforming system also consists of the uplink subsystem, which contains the UL64A32L AI Engine subgraph and the ulbf_data, ulbf_coeff, and ulbf_slave PL RTL kernels. Together, the downlink and uplink subsystems implement the uplink and downlink matrix multiplication equations for M=32 layers and N=64 antennas and compute sample data. The results are compared to reference downlink and uplink result data for verification. The entire beamforming system is copied three times to make full use of the available AI Engine and PL resources.

Block Diagram¶

Beamforming Block Diagram

Modules¶

Module 01 - Custom Platform¶

The module shows when to create a custom platform rather than a base platform. It also shows how to create a custom platform, using a beamforming platform as an example.

Module 02 - AI Engine Design¶

Teaches AI Engine developers how to:
- Map beamforming functions to AI Engine kernels.
- Design AI Engine graphs with beamforming source code as example.
- Use the AI Engine compilers (and understand why unique options are used for this design).
- Use the AI Simulator to test against reference output data.

Module 03 - PL Design¶

This module shows RTL designers how to:

Map data storage and data capture functions to Custom RTL PL kernels which will connect to the AI Engine and custom platform.
Design PL kernels with the beamforming PL source RTL as an example.
Package RTL PL kernels in to XO files.

Module 04 - AI Engine and PL Integration¶

This module shows developers how to:

Combine an AI Engine graph (libadf.a) and *.xo PL kernels into an XCLBIN.
Guide the Vivado tool to close timing on a high utilization design.

Module 05 - Bare-Metal PS Host Application¶

This module shows software developers how to create a bare-metal application for beamforming.

Module 06 - System Integration - Bare Metal¶

This module shows developers how to:

Package their design using the Vitis compiler for hardware or hardware emulation.
Run hardware emulation.
Run their bare metal design on hardware (VCK190 board).

Module 07 - PetaLinux¶

This module shows developers how to:

Build a custom PetaLinux software platform.
Package the linked XSA and custom Petalinux software platform into a new Versal Custom Platform (.xpfm).

Module 08 - Linux SW Application¶

This module shows developers how to create a Linux PS host application for functional and performance tests.

Module 09 - System Integration - Linux¶

This module shows developers how to:

Package their design using the Vitis compiler for a hardware run with a Linux PS host application.
Run their design on hardware (VCK190 board).

Beamforming Introduction¶

This tutorial shows efficient implementation of beamforming functionality on AI Engine arrays in the Xilinx Versal AI Engine devices. The design methodology is applicable to many use cases needing high throughput matrix multiplication, such as 5G wireless communication. The following figure shows an example illustration of how matrix multiplication is used in the beamforming of an orthogonal frequency division multiplex (OFDM) system with four layers and six antennas.

Beamforming General Image

Downlink Beamforming¶

A single symbol of an OFDM system contains a frequency component and time component allocated to a single user (X_0,0). Multiple symbols in different layers of an OFDM system (X_0,0,X_0,1,X_0,2,X_0,3) are multiplied by a specific set of complex weights (H_0,0,H_1,0,H_2,0,H_3,0) so the data between layers becomes “orthogonal” to each other. This orthogonality allows the layers to be summed together into a single signal (Y_0,0) which is sent to an antenna. A second antenna signal (X_0,1) can be created by multiplying another set of weights (H_0,1,H_1,1,H_2,1,H_3,1) to each layer (X_0,0,X_0,1,X_0,2,X_0,3). The same is done to create the rest of the antenna signals (Y_0,2,Y_0,3,Y_0,4, and Y_0,5).

Downlink Beamforming Formulation¶

The example and generalized downlink matrix multiplication formulas are given below.

Example Equations for a Single Subcarrier¶

Downlink Example Formulas

Generalized Downlink Beamforming Equations¶

Downlink Generalized Formulas

Uplink Beamforming¶

At the receiving end, the antenna data (Y_0,0-Y_0,5) can be demultiplexed back into their original layers (X_0,0-Y_0,3) because of their orthogonal feature.

Generalized Uplink Beamforming Equations¶

Uplink Generalized Formulas

AI Engine Development

Beamforming Tutorial¶

Introduction¶

Tutorial Overview¶

Assumptions¶

Before You Begin¶

Documentation: Explore AI Engine Architecture¶

Tools: Installing the Tools¶

Environment: Setting Up Your Shell Environment¶

Validation: Confirming Tool Installation¶

Other Tutorials: Learn Basic V++ and AI Engine Concepts¶

System Design Overview¶

Block Diagram¶

Modules¶

Module 01 - Custom Platform¶

Module 02 - AI Engine Design¶

Module 03 - PL Design¶

Module 04 - AI Engine and PL Integration¶

Module 05 - Bare-Metal PS Host Application¶

Module 06 - System Integration - Bare Metal¶

Module 07 - PetaLinux¶

Module 08 - Linux SW Application¶

Module 09 - System Integration - Linux¶

Beamforming Introduction¶

Downlink Beamforming¶

Downlink Beamforming Formulation¶

Example Equations for a Single Subcarrier¶

Generalized Downlink Beamforming Equations¶

Uplink Beamforming¶

Generalized Uplink Beamforming Equations¶