Features and Capabilities

The Alveo V70 accelerator card is the first AMD Alveo production card to use the AMD XDNA™ architecture with AI Engines. It is designed for efficient AI inference and is tuned for video analytics and natural language processing applications. With low power consumption and a small form factor, the V70 helps reduce cost per AI channel and provides high channel density for video applications, allowing you to meet demanding AI performance requirements.

The Alveo V70 comes pre-equipped with the best-in-class Versal high-throughput DPU (DPUCV2DX8G), implemented on next-generation AI Engines, a powerful Video Decoder Unit (VDU), and other hardware accelerators, such as the ABR Scaler, that are used for media processing.

Specification of Video Decoder Unit (VDU)

The Xilinx® LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) is a hard IP block in the Alveo V70. The VDU supports multiple decoder core instances (up to four), and the V70 solution uses two of them.

The features of each instance of the decoder core in VDU are as follows:

  • Multi-standard decoding support, including:

    • ISO MPEG-4 Part 10: Advanced Video Coding (AVC)/ITU H.264

    • ISO MPEG-H Part 2: High Efficiency Video Coding (HEVC)/ITU H.265

  • Resolution: up to 4K (3840x2160)

  • Frame rate: up to 60 Hz

  • Decoder output in semi-planar formats of YCbCr 4:2:0 (NV12)

  • Supports 8 bits per color channel

  • Supports simultaneous decoding of up to 8 streams of 1080p30.

  • Progressive support for H.264 and H.265

  • Profiles:

    • HEVC: Main, Main Intra up to Level 5.1 High Tier

    • AVC: Baseline, Main, High up to Level 5.2
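
Because the decoder writes 8-bit NV12 output, the size of a decoded frame follows directly from the resolution. The sketch below is only a sizing aid: it assumes tightly packed planes with no row stride or alignment padding, which real buffers may add, and the function name is illustrative rather than part of any V70 API.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Size in bytes of one 8-bit NV12 frame, assuming no row padding."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")
    # Plane 1: full-resolution luma (Y), one byte per pixel.
    luma = width * height
    # Plane 2: interleaved Cb/Cr, subsampled by 2 in each direction,
    # two bytes (one Cb, one Cr) per 2x2 block of pixels.
    chroma = (width // 2) * (height // 2) * 2
    return luma + chroma


print(nv12_frame_bytes(3840, 2160))  # 12441600
print(nv12_frame_bytes(1920, 1080))  # 3110400
```

A 4K frame therefore occupies about 12.4 MB, or 1.5 bytes per pixel.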

NOTE:

  1. The V70 solution uses two VDU instances and can therefore support up to 16 streams of 1080p30.
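
The stream counts above are consistent with the per-instance peak of 4K at 60 Hz. A minimal sketch of the arithmetic follows, under the assumption that decode capacity scales with pixel rate (the source states only the resulting figures, not the scaling rule).

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second for a stream of the given resolution and frame rate."""
    return width * height * fps


# Peak per decoder instance, taken from the feature list: 4K at 60 Hz.
peak_rate = pixel_rate(3840, 2160, 60)
# One 1080p30 stream.
stream_rate = pixel_rate(1920, 1080, 30)

# Assumption: capacity divides evenly by pixel rate.
streams_per_instance = peak_rate // stream_rate
vdu_instances = 2  # per the note above

print(streams_per_instance)                  # 8
print(streams_per_instance * vdu_instances)  # 16
```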

Specification of Image Processing Accelerator

The image processing accelerator kernel is based on the Xilinx® LogiCORE™ IP Video Multi-Scaler core. It provides resizing, color space conversion, mean subtraction, and normalization, and is used as a pre-processing block before inference in AI/ML use cases.

The features of the image processing accelerator are as follows:

  • Supports spatial resolutions from 64 × 64 up to 3184 × 2160

  • Supports 4 pixels per clock (PPC = 4)

  • Supports RGB, BGR and NV12

  • Supports 8 bits per color component on the memory interface

  • Dynamically configurable source and destination buffer addresses

  • Supports 2 taps in both the horizontal (H) and vertical (V) domains

  • Supports the BILINEAR scale mode

  • Supports cropping

  • Supports pre-processing (mean subtraction and mean scale)
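
As a software reference for the two numeric operations listed above, the sketch below shows 2-tap bilinear sampling and per-channel mean subtraction with scaling. It is a floating-point model for illustration only. The hardware kernel's fixed-point behavior, rounding, and configuration interface are not described here, and the function names and sample values are hypothetical.

```python
def bilinear_sample(img, x, y):
    """Sample a single-channel image (list of rows) at fractional (x, y).

    Uses two taps per axis: the two nearest pixels horizontally and the
    two nearest rows vertically, clamped at the image border.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def preprocess_pixel(pixel, mean, scale):
    """Apply (value - mean) * scale independently to each channel."""
    return tuple((v - m) * s for v, m, s in zip(pixel, mean, scale))


print(bilinear_sample([[0, 10], [20, 30]], 0.5, 0.5))  # 15.0
print(preprocess_pixel((128, 64, 255), (128.0, 64.0, 127.0), (0.5, 0.5, 0.5)))
# (0.0, 0.0, 64.0)
```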

Specification of Deep Learning Processing Unit (DPU)

The Xilinx® Versal® Deep Learning Processing Unit (DPUCV2DX8G) is a configurable computation engine optimized for convolutional neural networks in Versal ACAP devices with AI Engines. The DPUCV2DX8G is targeted specifically for Versal devices that leverage the AI Engine-ML version of the AI Engine. The degree of parallelism used in the engine is a design parameter and can be selected according to the target device and application. The DPU supports a set of highly optimized instructions and most convolutional neural networks, such as VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN, and others.

The DPUCV2DX8G has the following features:

  • One AXI4-Lite slave interface for accessing configuration and status registers

  • One AXI4 master interface for DPU instruction fetch

  • Two configurations, each with 20 AI Engines per batch handler: BATCH_N = 1 and BATCH_N = 14

The following list highlights key supported operators for the DPUCV2DX8G:

  • Convolution and transposed convolution

  • Depthwise convolution and depthwise transposed convolution

  • Max pooling

  • Average pooling

  • ReLU, ReLU6, Leaky ReLU, Hard Sigmoid, and Hard Swish

  • Elementwise-Sum and Elementwise-Multiply

  • Dilation

  • Reorg

  • Fully connected layer

  • Concat, Batch Normalization
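
For reference, the activation functions in the list above are commonly defined as in the sketch below. These are the widely used floating-point forms (for example, the ReLU6-based Hard Sigmoid and Hard Swish from MobileNetV3). The DPU operates on quantized data and may approximate them, and the Leaky ReLU slope `alpha=0.1` is an illustrative default, not a documented DPU parameter.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def relu6(x: float) -> float:
    # ReLU clipped at 6.
    return min(max(0.0, x), 6.0)


def leaky_relu(x: float, alpha: float = 0.1) -> float:
    # alpha is an assumed, illustrative slope for negative inputs.
    return x if x >= 0 else alpha * x


def hard_sigmoid(x: float) -> float:
    # Piecewise-linear approximation of the sigmoid: relu6(x + 3) / 6.
    return relu6(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    # x * hard_sigmoid(x).
    return x * hard_sigmoid(x)


print(relu6(7.5))         # 6.0
print(hard_sigmoid(0.0))  # 0.5
print(hard_swish(3.0))    # 3.0
```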

The DPU is driven by instructions generated by the Vitis AI compiler. When the target neural network (NN) or DPU hardware architecture is changed, the related .xmodel file that contains these instructions must be regenerated with the updated arch.json file.
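
One way to apply this rule in a build flow is to record a digest of the arch.json used at compile time and compare it before deployment. The helper below is a hypothetical sketch, not part of Vitis AI. The function names and the sample arch.json content are assumptions, and it detects only changes to arch.json, not changes to the network itself.

```python
import hashlib
import json


def arch_digest(arch_json_text: str) -> str:
    """Digest of arch.json content, insensitive to key order and whitespace."""
    canonical = json.dumps(
        json.loads(arch_json_text), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def xmodel_is_stale(arch_json_text: str, recorded_digest: str) -> bool:
    """True if the .xmodel was compiled against a different arch.json."""
    return arch_digest(arch_json_text) != recorded_digest


# Hypothetical arch.json content, for illustration only.
compiled_against = '{"target": "EXAMPLE_TARGET_A"}'
recorded = arch_digest(compiled_against)

print(xmodel_is_stale('{ "target": "EXAMPLE_TARGET_A" }', recorded))  # False
print(xmodel_is_stale('{"target": "EXAMPLE_TARGET_B"}', recorded))    # True
```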