Kria™ KV260 SOM Starter Kit NLP SmartVision Tutorial

Hardware Architecture of the Accelerator

Preprocessing IPs and DPU

The Vitis™ software platform overlay includes the DPU, as shown in the following figure.

Figure: IP overlay and connection block diagram

The DPU IP can be configured, and for this design, the following features should be enabled:

  • Channel augmentation

  • Depth-wise convolution

  • Average pooling

  • ReLU, LeakyReLU, and ReLU6

  • UltraRAM enable
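
The feature list above can be checked mechanically against a DPU configuration file. The following is a minimal sketch, assuming the design is configured through a Verilog header of `define lines, as in the dpu_conf.vh file shipped with the DPU reference design. The macro names in REQUIRED_DEFINES are assumptions modeled on that file and should be confirmed against the version used in your project. SAMPLE_CONF is a made-up fragment for illustration, not the configuration of this design.

```python
import re

# Assumed macro names, modeled on the dpu_conf.vh file of the DPU reference
# design. Confirm them against the file in your own project before relying
# on this check.
REQUIRED_DEFINES = {
    "Channel augmentation": "CHANNEL_AUGMENTATION_ENABLE",
    "Depth-wise convolution": "DWCV_ENABLE",
    "Average pooling": "POOL_AVG_ENABLE",
    "ReLU, LeakyReLU, and ReLU6": "RELU_LEAKYRELU_RELU6",
    "UltraRAM enable": "URAM_ENABLE",
}

# Matches an active `define line. A line that starts with // does not match.
DEFINE_RE = re.compile(r"^\s*`define\s+(\w+)")


def active_defines(conf_text):
    """Return the set of macros defined on lines that are not commented out."""
    found = set()
    for line in conf_text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def missing_features(conf_text):
    """List the required features whose macro is not defined in conf_text."""
    defined = active_defines(conf_text)
    return [name for name, macro in REQUIRED_DEFINES.items()
            if macro not in defined]


# Hypothetical configuration fragment with average pooling commented out.
SAMPLE_CONF = """
`define B3136
`define URAM_ENABLE
`define CHANNEL_AUGMENTATION_ENABLE
`define DWCV_ENABLE
//`define POOL_AVG_ENABLE
`define RELU_LEAKYRELU_RELU6
"""

print(missing_features(SAMPLE_CONF))
```

With the sample fragment, the check reports average pooling as missing because its line is commented out.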

To learn more about the DPU, refer to the DPUCZDX8G for Zynq UltraScale+ MPSoCs Product Guide (PG338).

The DPU is integrated into the nlp_smartvision platform. Resource utilization was analyzed and the overall hardware design was optimized; the following table shows the estimated results.

Resource Usage of Current Design (Estimated)
                LUTs     BRAM    DSP     URAM
K26 Resource    117120   144     1248    64
Platform (4K)   14410    43.5    47      1
DPU B3136       43366    67      548     44
Total used      49.3%    76.7%   47.7%   70.3%
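
The "Total used" row can be recomputed from the rows above it as a cross-check. The following is a minimal sketch that sums the platform and DPU figures for each resource and divides by the K26 total. The dictionary and function names are illustrative only, and the numbers are copied from the table.

```python
# Figures copied from the resource usage table (all estimated).
K26_AVAILABLE = {"LUTs": 117120, "BRAM": 144, "DSP": 1248, "URAM": 64}
PLATFORM_4K = {"LUTs": 14410, "BRAM": 43.5, "DSP": 47, "URAM": 1}
DPU_B3136 = {"LUTs": 43366, "BRAM": 67, "DSP": 548, "URAM": 44}


def utilization_percent(available, *blocks):
    """Return used/available as a percentage for each resource type."""
    return {
        resource: 100.0 * sum(block[resource] for block in blocks) / total
        for resource, total in available.items()
    }


usage = utilization_percent(K26_AVAILABLE, PLATFORM_4K, DPU_B3136)
for resource, percent in usage.items():
    print(f"{resource}: {percent:.1f}%")
```

The same function can be reused to see how much headroom remains if another IP block is added: pass its resource dictionary as an extra argument.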

The following table gives the estimated DPU performance and the estimated overall power of the K26 (including all other IPs). The DPU is assumed to run at 300 MHz.

DPU Performance and Power (Estimated)
         TOPS (Peak)   TOPS (DenseBox) [1]   Power (Overall) [2]
B3136    0.94          0.25                  7.9 W

NOTE:

  1. The DenseBox_640x360 model is used to estimate the real-world performance of the DPU. This model requires 1.1 GOPs.

  2. The overall power of the K26 (including the DPU and other IPs) is an estimate only.
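
The DenseBox figure can be turned into a rough frame rate. The following is a minimal sketch, assuming that the 1.1 GOPs workload in note 1 is the cost of one inference and that the DPU sustains the estimated DenseBox throughput from the table. It ignores preprocessing, memory stalls, and software overhead, so the result is an upper bound implied by the estimates rather than a measured value. The function and constant names are illustrative.

```python
# Estimated values taken from the table and note 1 above.
DENSEBOX_TOPS = 0.25            # effective DPU throughput on DenseBox, in TOPS
DENSEBOX_GOPS_PER_FRAME = 1.1   # assumed workload of one DenseBox_640x360 inference


def frames_per_second(effective_tops, gops_per_frame):
    """Convert sustained throughput and per-frame workload into frames per second."""
    ops_per_second = effective_tops * 1e12
    ops_per_frame = gops_per_frame * 1e9
    return ops_per_second / ops_per_frame


fps = frames_per_second(DENSEBOX_TOPS, DENSEBOX_GOPS_PER_FRAME)
print(f"Estimated upper bound: {fps:.0f} frames per second")
```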

The memory bandwidth requirements of the DPU B3136 are shown in Table 3.

Table 3 – DPU B3136 Bandwidth Requirements
Operation      Peak    Average
Write (MB/s)   1300    440
Read (MB/s)    6200    2600
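
The figures in Table 3 can be compared with the bandwidth of a memory interface. The following is a minimal sketch that totals the read and write requirements and expresses them as a share of a theoretical DDR bandwidth. The 64-bit, 2400 MT/s interface is an illustrative assumption, not a figure from this tutorial; substitute the values for your own board. Adding the read and write peaks is conservative, because the two peaks do not necessarily occur at the same time.

```python
# Bandwidth requirements copied from Table 3, in MB/s.
BANDWIDTH_MB_S = {
    "write": {"peak": 1300, "average": 440},
    "read": {"peak": 6200, "average": 2600},
}


def total_bandwidth(kind):
    """Sum read and write requirements for 'peak' or 'average'."""
    return sum(operation[kind] for operation in BANDWIDTH_MB_S.values())


def theoretical_ddr_mb_s(bus_bits, mega_transfers_per_s):
    """Theoretical bandwidth: bytes per transfer times transfers per second."""
    return bus_bits / 8 * mega_transfers_per_s


# Hypothetical memory interface, used only to illustrate the calculation.
ASSUMED_DDR_MB_S = theoretical_ddr_mb_s(bus_bits=64, mega_transfers_per_s=2400)

for kind in ("peak", "average"):
    total = total_bandwidth(kind)
    share = 100.0 * total / ASSUMED_DDR_MB_S
    print(f"{kind}: {total} MB/s, {share:.1f}% of the assumed interface")
```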

References

DPU

Copyright © 2021-2024 Advanced Micro Devices, Inc
