Benchmark

1. Performance

  • Kernel execution time covers only the time the kernel spends running on the FPGA device.
  • API execution time is the kernel execution time plus the time spent copying memory between the host and the kernel.
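Since the API time is defined as kernel time plus copy time, the copy overhead of any row in the tables below can be recovered by subtraction. The following is a minimal sketch of that arithmetic; the function name `copy_overhead` is hypothetical and not part of the library, and the sample values are the first gemv row (M=512, N=256) from section 1.1.

```python
def copy_overhead(kernel_time, api_time):
    """Return (copy_time, kernel_fraction) for one benchmark row.

    Both inputs must be in the same unit (seconds or milliseconds).
    copy_time is the host <-> kernel memory copy time implied by
    api_time = kernel_time + copy_time.
    """
    if api_time <= 0 or kernel_time < 0 or kernel_time > api_time:
        raise ValueError("expected 0 <= kernel_time <= api_time and api_time > 0")
    copy_time = api_time - kernel_time
    kernel_fraction = kernel_time / api_time
    return copy_time, kernel_fraction


# gemv, M=512, N=256: times in seconds, taken from the table in section 1.1
copy_time, fraction = copy_overhead(1.4316e-05, 0.00330468)
print(f"copy time: {copy_time:.6e} s, kernel share of API time: {fraction:.2%}")
```

For small problem sizes the copy time dominates the API time, which is why the two columns differ by orders of magnitude in the first rows and converge as M and N grow.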

1.1 gemv

This benchmark performs matrix-vector multiplication. M is the number of rows of the matrix and N is the number of columns.
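To make the roles of M and N concrete, here is a plain-Python reference for the operation being timed. It is only an illustrative sketch of the mathematics (y = A * x), not the FPGA kernel or the library's API; the function name `gemv` and the list-of-rows layout are assumptions made for this example.

```python
def gemv(a, x):
    """Return y = A * x for an M x N matrix `a` (list of M rows) and a length-N vector `x`."""
    n = len(x)
    if any(len(row) != n for row in a):
        raise ValueError("every row of A must have N = len(x) entries")
    # One dot product per row: y[i] = sum over j of a[i][j] * x[j]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


a = [[1, 2, 3],
     [4, 5, 6]]      # M = 2 rows, N = 3 columns
x = [1, 0, -1]       # length N
y = gemv(a, x)       # length M
print(y)
```

The result has M entries, one per matrix row, and each entry costs N multiply-add steps.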

gemv with OpenCL in u280

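| M    | N    | Kernel execution time [s] | API execution time [s] | Efficiency [%] |
|------|------|---------------------------|------------------------|----------------|
| 512  | 256  | 1.4316e-05                | 0.00330468             | 42.9173        |
| 512  | 512  | 1.9998e-05                | 0.00337302             | 61.4461        |
| 1024 | 1024 | 6.5904e-05                | 0.0035207              | 74.5812        |
| 2048 | 2048 | 0.000235251               | 0.00365028             | 83.5737        |
| 4096 | 4096 | 0.000939699               | 0.00452506             | 83.6898        |
| 8192 | 8192 | 0.00332612                | 0.0105467              | 94.5764        |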

For more details on this benchmark, see:

1.2 gemm

This benchmark performs matrix-matrix multiplication (A * B = C). M is the number of rows of matrices A and C, K is the number of columns of A (equal to the number of rows of B), and N is the number of columns of B and C.
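The dimension conventions above can be illustrated with a small plain-Python reference. This is a sketch of the mathematical operation only (C = A * B, with A of size M x K and B of size K x N), not the library kernel; the function name `gemm` and the list-of-rows layout are assumptions for illustration.

```python
def gemm(a, b):
    """Return C = A * B, where A is M x K and B is K x N (both lists of rows)."""
    m, k = len(a), len(a[0])
    if any(len(row) != k for row in a):
        raise ValueError("all rows of A must have K entries")
    if len(b) != k:
        raise ValueError("B must have K rows, matching the columns of A")
    n = len(b[0])
    if any(len(row) != n for row in b):
        raise ValueError("all rows of B must have N entries")

    c = [[0] * n for _ in range(m)]      # C is M x N
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


a = [[1, 2, 3],
     [4, 5, 6]]          # M = 2, K = 3
b = [[1, 0],
     [0, 1],
     [1, 1]]             # K = 3, N = 2
c = gemm(a, b)           # M x N = 2 x 2
print(c)
```

The triple loop performs M * N * K multiply-add steps, which is why the execution times in the tables grow roughly eightfold each time all three dimensions double.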

gemm with OpenCL in u250

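| M    | N    | K    | Kernel execution time [ms] | API execution time [ms] | Kernel Eff [%] |
|------|------|------|----------------------------|-------------------------|----------------|
| 64   | 64   | 64   | 0.010905                   | 1.750123                | 38.802577      |
| 128  | 128  | 128  | 0.048517                   | 13.802416               | 69.772592      |
| 256  | 256  | 256  | 0.328314                   | 14.645931               | 82.485022      |
| 512  | 512  | 512  | 3.213388                   | 18.199255               | 67.420400      |
| 1024 | 1024 | 1024 | 24.113855                  | 45.519852               | 71.875005      |
| 2048 | 2048 | 2048 | 186.688153                 | 264.195138              | 74.270743      |
| 4096 | 4096 | 4096 | 1469.773731                | 1708.938204             | 75.469945      |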

For more details on this benchmark, see:

gemm with XRT in u250

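| M    | N    | K    | API execution time [ms] | API Eff [%] | PerfApiTops |
|------|------|------|-------------------------|-------------|-------------|
| 256  | 256  | 256  | 2.295277                | 11.798572   | 0.058818    |
| 512  | 512  | 512  | 7.185994                | 30.148638   | 0.149859    |
| 1024 | 1024 | 1024 | 33.357721               | 51.957490   | 0.257887    |
| 2048 | 2048 | 2048 | 218.662946              | 63.410230   | 0.314501    |
| 4096 | 4096 | 4096 | 1594.648667             | 69.559988   | 0.344877    |
| 8192 | 8192 | 8192 | 12695.637510            | 69.897233   | 0.346485    |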

gemm with XRT (one CU, streaming Kernel) in u250

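| M    | N    | K    | API execution time [ms] | API Eff [%] | PerfApiTops |
|------|------|------|-------------------------|-------------|-------------|
| 256  | 256  | 256  | 1.370527                | 19.127241   | 0.024626    |
| 512  | 512  | 512  | 4.517989                | 46.417820   | 0.059589    |
| 1024 | 1024 | 1024 | 29.500145               | 56.871639   | 0.072902    |
| 2048 | 2048 | 2048 | 217.555482              | 61.693563   | 0.079026    |
| 4096 | 4096 | 4096 | 1685.337895             | 63.710774   | 0.081580    |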

For more details on the benchmarks, see:

2. Benchmark Test Overview

These are benchmarks of the Vitis BLAS library built in the Vitis environment, which supports software and hardware emulation as well as running hardware accelerators on the Alveo U250.

2.1 Prerequisites

2.1.1 Vitis BLAS Library

2.2 Building

2.2.1 Download code

These BLAS benchmarks can be downloaded from the master branch of [Vitis Libraries](https://github.com/Xilinx/Vitis_Libraries.git).

git clone https://github.com/Xilinx/Vitis_Libraries.git
cd Vitis_Libraries
git checkout master
cd blas

2.2.2 Setup environment

Set up the build environment using the Vitis and XRT scripts:

source <install path>/Vitis/2021.1/settings64.sh
source /opt/xilinx/xrt/setup.sh
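After sourcing both scripts, it can be useful to confirm that the environment is actually populated before starting a long build. The sketch below assumes that the Vitis script exports `XILINX_VITIS` and the XRT script exports `XILINX_XRT`; those variable names are an assumption about the scripts rather than something stated here, and the helper `missing_vars` is hypothetical.

```python
import os

# Assumed to be exported by settings64.sh and setup.sh respectively.
REQUIRED_VARS = ("XILINX_VITIS", "XILINX_XRT")


def missing_vars(env=None):
    """Return the names in REQUIRED_VARS that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars()
if missing:
    print("Not set (source the scripts first?): " + ", ".join(missing))
else:
    print("Vitis and XRT environment variables are set.")
```

Run it from the same shell in which the two `source` commands were executed, since the variables only exist in that shell session.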