Features and Capabilities

The Alveo V70 accelerator card is the first AMD Alveo production card leveraging the AMD XDNA™ architecture with AI Engines. It is designed for AI inference efficiency and is tuned for video analytics and natural language processing applications. With low power consumption and a small form factor, the V70 helps reduce cost per AI channel and provides high channel density for video applications, allowing you to meet demanding AI performance requirements.

The Alveo V70 comes pre-equipped with a best-in-class Versal high-throughput DPU (DPUCV2DX8G) implemented using next-generation AI Engines, a powerful Video Decoder Unit (VDU), and other capable hardware accelerators, such as an ABR Scaler, used for media processing.

Specification of the Video Decoder Unit (VDU)

The Xilinx® LogiCORE™ IP H.264/H.265 Video Decode Unit (VDU) is a hard IP block in the Alveo V70. The VDU contains multiple decoder cores (up to four), and the V70 solution uses two instances of the decoder.

The features of each instance of the decoder core in VDU are as follows:

  • Multi-standard decoding support, including:

    • ISO MPEG-4 Part 10: Advanced Video Coding (AVC)/ITU H.264

    • ISO MPEG-H Part 2: High Efficiency Video Coding (HEVC)/ITU H.265

  • Resolution: up to 4K (3840x2160)

  • Frame rate: up to 60 Hz

  • Decoder output in semi-planar formats of YCbCr 4:2:0 (NV12)

  • Supports 8 bits per color channel

  • Supports simultaneous decoding of up to 8 streams of 1080p30

  • Progressive support for H.264 and H.265

  • Profiles:

    • HEVC: Main, Main Intra up to Level 5.1 High Tier

    • AVC: Baseline, Main, High up to Level 5.2


  1. The V70 solution uses two instances of the VDU and thus supports up to 16 streams of 1080p30.
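The per-core limit above can be cross-checked with a quick pixel-rate calculation: one 3840x2160 stream at 60 Hz carries the same luma sample rate as eight 1080p30 streams, which matches the stated 8-stream limit per decoder core. This is a rough sketch; real decoder budgeting also depends on bitrate and codec tool usage.

```python
# Rough pixel-rate sanity check for the VDU figures quoted above.
def pixel_rate(width, height, fps):
    """Luma samples decoded per second for one stream."""
    return width * height * fps

rate_4k60 = pixel_rate(3840, 2160, 60)      # one 4K stream at 60 Hz
rate_1080p30 = pixel_rate(1920, 1080, 30)   # one 1080p stream at 30 Hz

# One decoder core handles up to eight 1080p30 streams -- the same
# sample rate as a single 4K60 stream.
print(rate_4k60 // rate_1080p30)   # → 8

# The V70 uses two VDU instances, so the aggregate budget is
# sixteen 1080p30 streams.
print(2 * 8)                       # → 16
```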

Specification of the Image Processing Accelerator

The image processing accelerator kernel is based on the Xilinx® LogiCORE™ IP Video Multi-Scaler core. It provides resize, color space conversion, mean subtraction, and normalization functionality, and is used as a pre-processing block before inference in AI/ML use cases.

The features of image processing accelerator are as follows:

  • Supports spatial resolutions from 64 × 64 up to 3840 × 2160

  • Supports 4 pixels per clock (PPC)

  • Supports RGB, BGR and NV12

  • Supports 8-bit per color component on memory interface

  • Dynamically configurable source and destination buffer addresses

  • Supports 2 filter taps in both the horizontal and vertical dimensions

  • Supports bilinear scale mode

  • Supports cropping

  • Supports pre-processing (mean subtraction and mean scale)
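As a sketch of the pre-processing step named in the last bullet, the following NumPy snippet applies per-channel mean subtraction followed by a mean-scale factor to an 8-bit RGB frame. The mean and scale values here are illustrative placeholders, not fixed properties of the accelerator; in practice they are model-specific and programmed at configuration time.

```python
import numpy as np

# Illustrative per-channel parameters; the actual values are model-specific
# and are configured on the accelerator, not hard-coded.
MEAN = np.array([104.0, 117.0, 123.0], dtype=np.float32)   # per-channel mean
SCALE = np.array([0.5, 0.5, 0.5], dtype=np.float32)        # per-channel scale

def preprocess(frame):
    """Mean subtraction followed by mean scale, applied after
    resize/color-space conversion: out = (pixel - mean) * scale."""
    return (frame.astype(np.float32) - MEAN) * SCALE

# 8-bit RGB frame, height x width x channels
frame = np.full((64, 64, 3), 200, dtype=np.uint8)
out = preprocess(frame)
print(out[0, 0])   # per-channel result: 48.0, 41.5, 38.5
```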

Specification of the Deep Learning Processing Unit (DPU)

The Xilinx® Versal® Deep Learning Processing Unit (DPUCV2DX8G) is a configurable computation engine optimized for convolutional neural networks in Versal ACAP devices with AI Engines. The DPUCV2DX8G targets specifically those Versal devices that use the AI Engine-ML version of the AI Engine. The degree of parallelism used in the engine is a design parameter and can be selected according to the target device and application. The DPU supports a set of highly optimized instructions and supports most convolutional neural networks, such as VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, FPN, and others.

The DPUCV2DX8G has the following features:

  • One AXI4-Lite slave interface for accessing configuration and status registers

  • One AXI4 master interface for DPU instruction fetch

  • Two configurations, each with 20 AI Engines per batch handler, supporting BATCH_N = {1, 14}
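A back-of-envelope reading of the last bullet, assuming one batch handler per batch (an interpretation, not a stated spec), gives the AI Engine count for each configuration:

```python
# AI Engine usage for the two DPUCV2DX8G configurations, assuming
# one batch handler per batch (an interpretation of the spec above).
ENGINES_PER_BATCH_HANDLER = 20

for batch_n in (1, 14):
    engines = ENGINES_PER_BATCH_HANDLER * batch_n
    print(f"BATCH_N={batch_n}: {engines} AI Engines")
```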

The following list highlights key supported operators for the DPUCV2DX8G:

  • Convolution and transposed convolution

  • Depthwise convolution and depthwise transposed convolution

  • Max pooling

  • Average pooling

  • ReLU, ReLU6, Leaky ReLU, Hard Sigmoid, and Hard Swish

  • Elementwise-Sum and Elementwise-Multiply

  • Dilation

  • Reorg

  • Fully connected layer

  • Concat, Batch Normalization

The DPU is driven by instructions generated by the Vitis AI compiler. When the target neural network (NN) or DPU hardware architecture is changed, the related .xmodel file that contains these instructions must be regenerated with the updated arch.json file.
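As a sketch of that regeneration step, a quantized model is typically recompiled with the Vitis AI compiler, pointing it at the arch.json for the new DPU configuration. All file and network names below are placeholders for your own paths.

```shell
# Recompile a quantized model when the target DPU architecture changes.
#   --xmodel     : quantized input model from the Vitis AI quantizer
#   --arch       : arch.json describing the target DPU (e.g. the DPUCV2DX8G)
#   --output_dir : where the compiled .xmodel is written
#   --net_name   : name for the generated model
vai_c_xir \
    --xmodel quantized_model.xmodel \
    --arch arch.json \
    --output_dir compiled/ \
    --net_name my_network
```

The compiled .xmodel embeds the DPU instruction stream, so it must be regenerated whenever either the network or the arch.json changes.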