Kria™ K260 SOM Starter Kit NLP SmartVision Tutorial

Hardware Architecture of the Accelerator

Preprocessing IPs and DPU

The Vitis™ software platform overlay includes the DPU, as shown in the following figure.


The DPU IP is configurable. For this design, enable the following features:

  • Channel augmentation

  • Depth-wise convolution

  • Average pooling

  • ReLU, LeakyReLU, and ReLU6

  • URAM enable

To learn more about the DPU, refer to PG338.

As shown in the following table, the DPU is integrated into the nlp_smartvision platform. We analyzed the resource utilization of the whole hardware design and applied some optimizations.

Resource usage of current design (estimated)
Resource         LUT      BRAM    DSP     URAM
K26 Resource     117120   144     1248    64
Platform (4K)    14410    43.5    47      1
DPU B3136        43366    67      548     44
Total used       44%      76.7%   47.6%   70.3%
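
The "Total used" row can be cross-checked by summing the platform and DPU figures and dividing by the K26 totals. The following is a minimal sketch of that arithmetic, taking the four columns as LUT, BRAM, DSP, and URAM in that order; the dictionary names and the `utilization` helper are hypothetical and are not part of the design sources.

```python
# Figures copied from the resource table; the column order
# (LUT, BRAM, DSP, URAM) is an assumption based on the K26 totals.
AVAILABLE = {"LUT": 117120, "BRAM": 144, "DSP": 1248, "URAM": 64}
PLATFORM = {"LUT": 14410, "BRAM": 43.5, "DSP": 47, "URAM": 1}
DPU_B3136 = {"LUT": 43366, "BRAM": 67, "DSP": 548, "URAM": 44}


def utilization(available, *blocks):
    """Return the percentage of each resource consumed by the given blocks."""
    return {
        name: 100.0 * sum(block[name] for block in blocks) / total
        for name, total in available.items()
    }


if __name__ == "__main__":
    for name, pct in utilization(AVAILABLE, PLATFORM, DPU_B3136).items():
        print(f"{name}: {pct:.1f}%")
```

Computed this way, the BRAM, DSP, and URAM percentages agree with the table to within rounding. The LUT figure comes out at about 49.3% rather than the listed 44%; the table does not say how the 44% was derived, so treat the LUT entry as approximate.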

The following table shows the estimated DPU performance and the overall power of the K26 chip (including all the other IPs). The DPU is assumed to run at 300 MHz.

DPU performance and power (estimated)
DPU     TOPS (Peak)   TOPS (DenseBox) [1]   Power (Overall) [2]
B3136   0.92          0.25                  7.9 W


  1. The DenseBox_640x360 model, which has a workload of 1.1 GOPs, is used to estimate the real performance of the DPU.

  2. Only the overall power of the K26 (including the DPU and other IPs) can be estimated.
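
To put the two TOPS figures in context, the sketch below derives the fraction of peak compute reached on DenseBox and the frame rate that the sustained figure would imply. It assumes that the 1.1 GOPs workload is per input frame and that the DenseBox TOPS value is sustained throughput on that model. The function names are hypothetical, and the resulting frame rate is a derived estimate, not a measured result from this design.

```python
# Values copied from the performance table and its first note.
PEAK_TOPS = 0.92                 # estimated peak throughput of the B3136
DENSEBOX_TOPS = 0.25             # estimated throughput on DenseBox_640x360
DENSEBOX_GOPS_PER_FRAME = 1.1    # assumption: workload is per input frame


def compute_efficiency(sustained_tops, peak_tops):
    """Fraction of peak compute achieved on the model."""
    return sustained_tops / peak_tops


def implied_fps(sustained_tops, gops_per_frame):
    """Frames per second implied by sustained throughput and per-frame work."""
    return (sustained_tops * 1e12) / (gops_per_frame * 1e9)


if __name__ == "__main__":
    eff = compute_efficiency(DENSEBOX_TOPS, PEAK_TOPS)
    fps = implied_fps(DENSEBOX_TOPS, DENSEBOX_GOPS_PER_FRAME)
    print(f"Efficiency on DenseBox: {eff:.1%}")
    print(f"Implied frame rate: {fps:.0f} fps")
```

Under these assumptions, DenseBox reaches roughly 27% of peak, which corresponds to about 227 frames per second of DPU compute alone, excluding preprocessing and data movement.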

Table 3 shows the bandwidth requirements of the DPU B3136.

Table 3 – DPU B3136 bandwidth requirements
Operation      Peak    Average
Write (MB/s)   1300    440
Read (MB/s)    6200    2600
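
When budgeting memory bandwidth for the rest of the design, it can help to combine the read and write figures. The following is a minimal sketch using only the values from Table 3. The helper names are hypothetical, and the summed peak is an upper bound that assumes the read and write peaks coincide, which the table does not state.

```python
# Values copied from Table 3, in MB/s.
BANDWIDTH_MBPS = {
    "write": {"peak": 1300, "average": 440},
    "read": {"peak": 6200, "average": 2600},
}


def total_bandwidth(kind):
    """Sum of read and write bandwidth for 'peak' or 'average'."""
    return sum(op[kind] for op in BANDWIDTH_MBPS.values())


def read_share(kind):
    """Fraction of the combined traffic that is read traffic."""
    return BANDWIDTH_MBPS["read"][kind] / total_bandwidth(kind)


if __name__ == "__main__":
    for kind in ("peak", "average"):
        print(f"{kind}: {total_bandwidth(kind)} MB/s total, "
              f"{read_share(kind):.0%} reads")
```

The traffic is dominated by reads in both cases, so read latency and read bandwidth on the memory interface are the figures to watch when adding other IPs.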




Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright © 2021 Xilinx