..
   Copyright 2019 Xilinx, Inc.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

************
Benchmark
************

.. _performance:

Performance
################

Conjugate Gradient Algorithm
****************************

This section benchmarks the Vitis HPC Library in the Vitis environment and compares results across several FPGA and CPU platforms. The benchmarks support software emulation, hardware emulation, and hardware runs on the Alveo U250, U280, or U50 accelerator cards.

GEMV-based CG
^^^^^^^^^^^^^^^^^^^^

The following tables list the resource utilization and benchmark results of the GEMV-based CG kernel, which stores the matrix in 16 HBM channels.

.. table:: Resource Utilization on U50
    :align: center

    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name                       | LUT              | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +============================+==================+==================+===================+================+===============+================+
    | User Budget                | 699619 [100.00%] | 369603 [100.00%] | 1447189 [100.00%] | 1112 [100.00%] | 640 [100.00%] | 5936 [100.00%] |
    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources             | 186448 [ 26.65%] |  17334 [  4.69%] |  325149 [ 22.47%] |  128 [ 11.51%] |   0 [  0.00%] | 1262 [ 21.26%] |
    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+

.. table:: Benchmark Results on U50
    :align: center

    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | Vector Size | Time per Iteration [ms] | U50 Performance [GFLOPS]  | U50 Energy Efficiency [GFLOPS/W] | CPU Performance [GFLOPS] | Acceleration Ratio |
    +=============+=========================+===========================+==================================+==========================+====================+
    | 1024        | 0.073                   | 26.938                    | 0.723                            | 12.996                   | 2.073              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 2048        | 0.2557                  | 30.658                    | 0.766                            | 27.469                   | 1.116              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 4096        | 0.9202                  | 34.018                    | 0.812                            | 7.776                    | 4.375              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 8192        | 3.405                   | 36.742                    | 0.839                            | 8.226                    | 4.467              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+

SPMV-based CG
^^^^^^^^^^^^^^^^^^^^^^^

The following tables list the resource utilization and benchmark results of the SPMV-based CG kernel.

.. table:: Resource Utilization on U280
    :align: center

    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name                       | LUT               | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +============================+===================+==================+===================+================+===============+================+
    | User Budget                | 1104369 [100.00%] | 552814 [100.00%] | 2217989 [100.00%] | 1693 [100.00%] | 896 [100.00%] | 9020 [100.00%] |
    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources             |  285372 [ 25.84%] |  36605 [  6.62%] |  442368 [ 19.94%] |  267 [ 15.77%] |  64 [  7.14%] | 1192 [ 13.22%] |
    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+

.. table:: Benchmark Results on U280
    :align: center

    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | Matrix Name    | Rows/Cols | NNZs    | Padded Rows/Cols | Padded NNZs | Padding Ratio | No. iterations | Time per Iter [ms] | Time per Iter on CPU [ms] | Acceleration Ratio |
    +================+===========+=========+==================+=============+===============+================+====================+===========================+====================+
    | nasa2910       | 2910      | 174296  | 2912             | 297952      | 1.70946       | 1777           | 0.0511172          | 0.0692836                 | 1.36               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ex9            | 3363      | 99471   | 3364             | 199328      | 2.00388       | 5000           | 0.0497677          | 0.0559332                 | 1.12               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk24       | 3562      | 159910  | 3564             | 222656      | 1.39238       | 5000           | 0.0598962          | 0.0581827                 | 0.97               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk15       | 3948      | 117816  | 3948             | 267488      | 2.27039       | 658            | 0.0927269          | 0.125615                  | 1.35               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk28       | 4410      | 219024  | 4412             | 319264      | 1.45767       | 4878           | 0.0586356          | 6.92198                   | 118.05             |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | s3rmt3m3       | 5357      | 207695  | 5360             | 330624      | 1.59187       | 5000           | 0.0744822          | 6.55229                   | 87.97              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | s2rmq4m1       | 5489      | 281111  | 5492             | 427648      | 1.52128       | 1779           | 0.084562           | 6.75384                   | 79.87              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | nd3k           | 9000      | 3279690 | 9000             | 4277792     | 1.30433       | 5000           | 0.363479           | 4.66861                   | 12.84              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ted_B          | 10605     | 144579  | 10608            | 548416      | 3.79319       | 30             | 0.984467           | 6.53108                   | 6.63               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ted_B_unscaled | 10605     | 144579  | 10608            | 548416      | 3.79319       | 16             | 1.75354            | 8.59891                   | 4.90               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | msc10848       | 10848     | 1229778 | 10848            | 2050720     | 1.66755       | 5000           | 0.230942           | 5.43921                   | 23.55              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | cbuckle        | 13681     | 676515  | 13684            | 924832      | 1.36705       | 1282           | 0.16427            | 5.48588                   | 33.40              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | olafu          | 16146     | 1015156 | 16148            | 1452320     | 1.43064       | 5000           | 0.169174           | 5.05108                   | 29.86              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | gyro_k         | 17361     | 1021159 | 17364            | 1932384     | 1.89234       | 5000           | 0.254172           | 4.85938                   | 19.12              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bodyy4         | 17546     | 121938  | 17548            | 710112      | 5.82355       | 230            | 0.174435           | 4.73164                   | 27.13              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | nd6k           | 18000     | 6897316 | 18000            | 9415552     | 1.3651        | 5000           | 0.809868           | 4.25772                   | 5.26               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | raefsky4       | 19779     | 1328611 | 19780            | 2268704     | 1.70758       | 5000           | 0.268956           | 4.22843                   | 15.72              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk36       | 23052     | 1143140 | 23052            | 1833056     | 1.60353       | 5000           | 0.253049           | 3.9882                    | 15.76              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+

Detailed benchmark results and usage steps are available on the following pages.

.. toctree::
   :maxdepth: 1

   user_guide/L2/benchmark/cg_gemv_jacobi.rst
   user_guide/L2/benchmark/cg_spmv_jacobi.rst

Benchmark Overview
###################

.. _l2_vitis_hpc:

Vitis HPC Library
*****************

* **Download code**

The HPC benchmarks can be downloaded from the ``master`` branch of `Vitis Libraries <https://github.com/Xilinx/Vitis_Libraries.git>`_.

.. code-block:: bash

    git clone https://github.com/Xilinx/Vitis_Libraries.git
    cd Vitis_Libraries
    git checkout master
    cd hpc

* **Setup environment**

Specify the Vitis and XRT installations and the path to the platform repository by running the following commands. Set up the Python environment by following the :doc:`Python environment setup guide <../pyenvguide>`.

.. code-block:: bash

    source /installs/lin64/Vitis/2021.1/settings64.sh
    source /opt/xilinx/xrt/setup.sh
    export PLATFORM_REPO_PATHS=/opt/xilinx/platforms
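
The environment can be sanity-checked before building. The following is a minimal sketch rather than part of the library: ``check_env`` is a hypothetical helper name, and it assumes that ``v++`` is on ``PATH`` after sourcing ``settings64.sh``, that the XRT setup script exports ``XILINX_XRT``, and that ``PLATFORM_REPO_PATHS`` holds a single directory as in the commands above.

.. code-block:: bash

    # Hypothetical helper (not provided by the library): report which parts
    # of the environment still look unset after sourcing the setup scripts.
    check_env() {
        local missing=0
        # Assumption: the Vitis settings script puts the v++ compiler on PATH.
        if ! command -v v++ >/dev/null 2>&1; then
            echo "missing: v++ not found on PATH"
            missing=1
        fi
        # Assumption: the XRT setup script exports XILINX_XRT.
        if [ -z "${XILINX_XRT:-}" ]; then
            echo "missing: XILINX_XRT is not set"
            missing=1
        fi
        # Assumption: PLATFORM_REPO_PATHS is a single existing directory.
        if [ ! -d "${PLATFORM_REPO_PATHS:-}" ]; then
            echo "missing: PLATFORM_REPO_PATHS is not a directory"
            missing=1
        fi
        return "$missing"
    }

After sourcing the setup scripts, run ``check_env``. It prints one line per missing item and returns a non-zero status if anything is missing.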