2021.1 Versal AI Engine/HLS FIR Filter Tutorial

Table of Contents¶

Introduction

Before You Begin

Design Implementations

Choosing between AI Engine and HLS Implementations

AI Engine Specific Design Considerations

Measuring Resources, Throughput, Latency, and Power

Conclusion

Revision History

Introduction¶

The Xilinx® Versal™ adaptive compute acceleration platform (ACAP) is a fully software-programmable, heterogeneous compute platform that combines the processor system (PS) (Scalar Engines, which include the Arm® processors), programmable logic (PL) (Adaptable Engines, which include the programmable logic blocks and memory), and the Intelligent Engines, which comprise both the AI Engines and DSP Engines.

This tutorial is one of several that build two implementations of the same system-level design: one using AI Engines, and one using HLS with DSP Engines plus PL resources, including LUTs, flip-flops (FFs), and block RAMs. In each implementation, the tutorial takes you through the hardware emulation and hardware flows in the context of a complete Versal ACAP system design. A makefile is provided, which you can modify to suit your needs in a different context.

An important goal of this tutorial is to use C++ based kernels for the AI Engine and HLS library kernels for the DSP Engines and data movers. The Vitis™ application acceleration development flow and library kernels are used throughout the tutorial to demonstrate the ease of kernel integration and scalability in a system design. In this flow, the Vitis HLS tool automates many of the code modifications required to implement and optimize C/C++ code in the PL, which also simplifies coding the data mover kernels. Vitis HLS in the application acceleration flow is built on inferring the pragmas needed to produce the right interface for your function arguments and to pipeline loops and functions. Vitis HLS also supports customizing your code to implement different interface standards or specific optimizations to achieve design objectives, enable scaling, and leverage automation.

Note: Design methods other than Vitis HLS may increase PL-based performance. For example, using the LogiCORE™ FIR Compiler IP and RTL-based data movers could increase raw performance but will increase dynamic power and design time.

A frequently asked question is whether AI Engines, HLS, or RTL targeting DSPs produces the best implementation. The answer depends on the design objectives, complexity, and characteristics of each individual design. One section of this tutorial discusses the trade-offs and provides guidance to help you determine the best choice for your design. Another section discusses AI Engine specific design considerations, because AI Engines are a relatively new technology compared to the mature FPGA fabric, or PL with DSPs.

Objectives¶

After completing the tutorial, you should be able to:

Develop a system-level design (FIR filter in this case) by identifying the algorithm and deploying the same algorithm on AI Engine and DSP Engines using Vitis HLS.
Build a complete system design by going through the various steps in the Vitis unified software platform flow, including creating the AI Engine adaptive data flow (ADF) API graph, compiling the A72 host application, compiling PL kernels, using the Vitis compiler (v++) to link the AI Engine and HLS kernels with the platform, and packaging the design. You will also be able to run the design through the hardware emulation and hardware flows in a mixed SystemC/RTL cycle-accurate/QEMU-based simulator.
Develop a consistent harness so that the data mover kernels maintain a similar interface with the AI Engine/HLS kernels (AXI4-Stream) and DDR memory (memory-mapped AXI4).
Develop an understanding of graph control APIs to enable run-time updates using the run-time parameter (RTP) interface for the AI Engine implementation, and of HLS APIs for controlling HLS/PL kernels.
Develop an understanding of the various factors that influence the performance, resources, latency, and power of AI Engine and HLS using DSP implementations, so that an informed choice can be made between the two implementations.

Overview¶

This tutorial implements a FIR filter chain twice: once targeting AI Engines and once targeting DSP Engines using Vitis HLS.

FIR filters provide a large design space to explore. For the purposes of this tutorial, the following parameters are held fixed/constant:

Data Type: cint16
Coefficient type: int16
Symmetric FIR
Fixed (i.e., non-reloadable) coefficients

The number of filter taps in the filters and the number of cascaded filters in the chain can be specified as parameters in the build process. Each filter in the chain consists of an identical number of taps with identical coefficients. While this is not necessarily a realistic design situation, it provides a simple means for generating, scaling and managing the filter chain. One further simplification is the use of a triangular window for the filter coefficients, allowing the taps to be generated simply through linear interpolation. (See https://www.recordingblogs.com/wiki/triangular-window or https://en.wikipedia.org/wiki/Window_function#Triangular_window)
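As a rough illustration of the triangular-window simplification, the following Python sketch generates a symmetric set of int16 taps by linear interpolation. It is not the tutorial's generator: the function name `triangular_taps`, the `peak` scaling value, and the choice of the window variant with non-zero end points are all assumptions made for this example.

```python
def triangular_taps(n_taps: int, peak: int = 2**15 - 1) -> list[int]:
    """Return n_taps symmetric integer coefficients shaped as a triangular window.

    `peak` is a hypothetical scaling choice (largest int16 value); a real
    design would pick the scaling to control filter gain.
    """
    if n_taps < 1:
        raise ValueError("n_taps must be positive")
    center = (n_taps - 1) / 2      # position of the window peak (may be fractional)
    half_width = (n_taps + 1) / 2  # distance from the peak to the (excluded) zero point
    taps = []
    for n in range(n_taps):
        weight = 1.0 - abs(n - center) / half_width  # linear ramp up, then down
        taps.append(round(peak * weight))
    return taps


if __name__ == "__main__":
    taps = triangular_taps(15)
    print(taps)
    # A symmetric filter only needs the first half (plus the center tap).
    print(taps[: (len(taps) + 1) // 2])
```

Because the ramp is mirrored around the center, the result reads the same forwards and backwards, which is the property a symmetric FIR implementation relies on to share multipliers between mirrored taps.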

The same filter chain is deployed in the two implementations using AI and DSP Engines. The build compiles the design through v++, creates a PetaLinux-based platform via a script, and generates the PDI and host application.
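When the same chain is deployed twice, a small software model can help check that both implementations agree. The following Python sketch models a chain of identical FIR filters on cint16-style data (pairs of integers) with real integer taps. It is only a sketch under stated assumptions: `saturate16`, `fir_cint16`, `fir_chain`, and the `shift` value are hypothetical, and the rounding and saturation behaviour of the actual AI Engine and HLS kernels may differ.

```python
def saturate16(x: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def fir_cint16(samples, taps, shift=15):
    """Filter (re, im) integer pairs with real integer taps.

    Assumes a zero initial state and a plain arithmetic right shift
    (hypothetical choice) before saturating back to 16 bits.
    """
    out = []
    for n in range(len(samples)):
        acc_re = acc_im = 0
        for k, c in enumerate(taps):
            if n - k >= 0:
                re, im = samples[n - k]
                acc_re += c * re
                acc_im += c * im
        out.append((saturate16(acc_re >> shift), saturate16(acc_im >> shift)))
    return out


def fir_chain(samples, taps, n_filters, shift=15):
    """Apply the same filter n_filters times in series."""
    for _ in range(n_filters):
        samples = fir_cint16(samples, taps, shift)
    return samples


if __name__ == "__main__":
    impulse = [(1000, -2000), (0, 0), (0, 0), (0, 0)]
    print(fir_chain(impulse, [16384, 16384], n_filters=2))
```

Every stage uses the same taps, mirroring the tutorial's simplification that each filter in the chain is identical.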

The makefile-based build process can be directed to build chains of different lengths with a specified number of taps. A similar set of harnesses is developed and maintained for the two implementations to store input/output vectors in DDR memory and to use the data mover kernels to move data to and from the AI Engine and HLS FIR kernels. In both cases, XRT running on the A72 controls data flow in the compute and data mover kernels (graph control APIs control the AI Engine kernels, and HLS APIs control the HLS/PL kernels).

Directory Structure¶

filter_AIEvsHLS
+-- AIE .........................contains AI Engine implementation
|   +-- build ...................created and contains subfolders from design build
|   +-- design ..................contains source and include files
|   |   +-- aie_src .............AI Engine source code
|   |   +-- app_src .............A72 application source code
|   |   +-- pl_src ..............PL (HLS) source code
|   +-- run_dir .................contains bootable image files to run HW flow
+-- HLS .........................contains HLS FIR implementation, targeting DSP Engines
|   +-- build ...................created and contains subfolders from design build
|   +-- design ..................contains source and include files
|   |   +-- app_src .............A72 application source code
|   |   +-- pl_src ..............PL (HLS) source code
|   +-- run_dir .................contains bootable image files to run HW flow
+-- report_dir ..................contains the generated resource and power utilization reports for both AI Engine and DSP implementations

Before You Begin¶

Documentation: Explore AI Engine Architecture¶

Tools: Installing the Tools¶

Tools Documentation:

To build and run the FIR filter tutorial (AI Engine and DSP implementations), you will need the following tools downloaded/installed:

Install the Vitis Software Platform 2021.1
Obtain licenses for AI Engine tools
Follow the instructions in Installing Xilinx Runtime and Platforms (XRT)
Download and set up the VCK190 Vitis Platform for 2021.1
DSP Library (DSPLib) Documentation
Download the DSP Library

Environment: Setting Up the Shell Environment¶

When the elements of the Vitis software platform are installed, update the shell environment script. Set the environment variables to your system specific paths.

Edit the env_setup.sh script with your file paths:

export XILINX_XRT=<XRT-LOCATION>
export PLATFORM_REPO_PATHS=<YOUR-PLATFORM-DIRECTORY>
export DSPLIB_ROOT=<PATH-TO-DSP-LIBRARY>

source <XILINX-TOOLS-LOCATION>/Vitis/<TOOLS-BUILD>/settings64.sh
source $XILINX_XRT/setup.sh

Then source the environment script:

source env_setup.sh

Validation: Confirming Tool Installation¶

which vitis
which aiecompiler

Confirm that you have the VCK190 production base platform.

platforminfo --list | grep -m 1 -A 9 vck190_base

Output of the previous command should be as follows:

"baseName": "xilinx_vck190_base_202110_1",
"version": "1.0",
"type": "sdsoc",
"dataCenter": "false",
"embedded": "true",
"externalHost": "false",
"serverManaged": "false",
"platformState": "pre_synth",
"usesPR": "false",
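The environment and tool checks above can also be scripted. The following Python sketch reports which of the environment variables set in env_setup.sh are missing and which of the command-line tools used in this section are not on the PATH. The names `check_setup`, `REQUIRED_ENV`, and `REQUIRED_TOOLS` are hypothetical and introduced only for this example; the script does not verify tool versions or the installed platform.

```python
import os
import shutil

# Variables exported by env_setup.sh and tools invoked in this section.
REQUIRED_ENV = ("XILINX_XRT", "PLATFORM_REPO_PATHS", "DSPLIB_ROOT")
REQUIRED_TOOLS = ("vitis", "aiecompiler", "platforminfo")


def check_setup(env=None, which=shutil.which):
    """Return (missing_env_vars, missing_tools) as two lists of names.

    `env` and `which` are injectable so the check can be exercised
    without a real tool installation.
    """
    env = os.environ if env is None else env
    missing_env = [name for name in REQUIRED_ENV if not env.get(name)]
    missing_tools = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    return missing_env, missing_tools


if __name__ == "__main__":
    missing_env, missing_tools = check_setup()
    print("Missing environment variables:", missing_env or "none")
    print("Tools not found on PATH:", missing_tools or "none")
```

Running the script after sourcing env_setup.sh should report nothing missing if the setup steps were completed.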

Taps	Throughput
15	986.1 MSPS(*)
64	266.3 MSPS
129	171.4 MSPS
240	105.9 MSPS

Impl	Filters	Taps	Param	Throughput	LUTS	Flops	DSP	AIE
AIE	1	64	win=256	266.3 MSPS	213	586	0	1
HLS	1	64	ck_per_sam=1	299.8 MSPS	1025	4912	64	0
AIE	10	64	win=256	266.3 MSPS	211	586	0	10
HLS	10	64	ck_per_sam=1	299.8 MSPS	8787	46995	640	0
AIE	1	240	win=256	112.6 MSPS	217	586	0	1
HLS	1	240	ck_per_sam=4	75.0 MSPS	1616	6243	64	0
AIE	10	240	win=256	112.6 MSPS	213	586	0	10
HLS	10	240	ck_per_sam=4	74.9 MSPS	14760	60209	640	0

Taps	Throughput (CASC_LEN=1)	Throughput (CASC_LEN=2)	Throughput (CASC_LEN=4)
15	986.1 MSPS(*)	Too small to cascade	Too small to cascade
64	266.3 MSPS	352.6 MSPS	450.0 MSPS
129	171.4 MSPS	254.8 MSPS	324.1 MSPS
240	105.9 MSPS	179.8 MSPS	234.4 MSPS

Impl	Filters	Taps	Window Size	Throughput	Latency
AIE	1	64	64	200.0 MSPS	0.453 us
AIE	1	64	256	266.3 MSPS	1.287 us
AIE	1	64	1024	297.8 MSPS	4.533 us
