Singular Value Decomposition for symmetric matrix (GESVDJ)

The hardware resources and performance of the double and float GESVDJ implementations are listed in Table 2 and Table 3, respectively.

Table 2: GESVDJ performance, double type

Matrix Size  Unroll  URAM  BRAM  DSP  Register  LUT     Kernel time (ms)  Frequency (MHz)
8x8          4       20    6     216  46245     39365   0.0711            250
512x512      8       128   333   408  120837    115121  2100.833          208.3

Table 3: GESVDJ performance, float type

Matrix Size  Unroll  URAM  BRAM  DSP  Register  LUT     Kernel time (ms)  Frequency (MHz)
8x8          4       20    4     114  23647     18529   0.0533            250
512x512      8       128   307   210  65569     65003   1687.274          208.3

Note

Board: Xilinx Alveo U250 Data Center Accelerator Card

The accuracy of the GESVDJ implementation has been verified against the LAPACK dgesvd (QR-based SVD) and dgesvj (Jacobi SVD) functions.
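As an illustration of what such an accuracy check can look like, the following is a minimal sketch in plain Python. It is not the verification code used for the library. The function names `reconstruction_error` and `max_singular_value_diff` are hypothetical, and the factors below are hand-computed for a 2x2 symmetric matrix rather than produced by any kernel or by LAPACK. The sketch measures how well U, S and V reproduce the input matrix and how far the singular values are from a reference set.

```python
import math


def reconstruction_error(a, u, s, v):
    """Relative Frobenius-norm error of A - U * diag(S) * V^T.

    a, u, v are square matrices given as lists of rows; s is a list of
    singular values. Assumes a is not the zero matrix.
    """
    n = len(a)
    err_sq = 0.0
    norm_sq = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of U * diag(S) * V^T.
            approx = sum(u[i][k] * s[k] * v[j][k] for k in range(n))
            err_sq += (a[i][j] - approx) ** 2
            norm_sq += a[i][j] ** 2
    return math.sqrt(err_sq / norm_sq)


def max_singular_value_diff(s_test, s_ref):
    """Largest absolute difference between two sets of singular values,
    compared after sorting both in descending order."""
    pairs = zip(sorted(s_test, reverse=True), sorted(s_ref, reverse=True))
    return max(abs(x - y) for x, y in pairs)


# Hand-computed SVD of the symmetric matrix [[2, 1], [1, 2]]:
# singular values 3 and 1, singular vectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
r = 1.0 / math.sqrt(2.0)
a = [[2.0, 1.0], [1.0, 2.0]]
u = [[r, r], [r, -r]]
s = [3.0, 1.0]
v = [[r, r], [r, -r]]

rel_err = reconstruction_error(a, u, s, v)
sv_diff = max_singular_value_diff(s, [3.0, 1.0])
```

In a real comparison, `u`, `s` and `v` would come from the kernel output and the reference singular values from a LAPACK routine. The acceptable tolerance depends on the data type (double or float) and is left to the user.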

Caution

The unroll factor is limited by two things: the matrix size and the number of URAM ports. The unroll factor should not exceed half of the matrix size, and \(2 \times \mathrm{Unroll}^{2}\) should be less than the number of URAM blocks available on the board. In addition, the unroll factor must be a power of 2.
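The constraints in this caution can be checked before building a kernel. The following is a minimal sketch in plain Python and is not part of the library. The names `unroll_violations`, `max_unroll` and `uram_available` are hypothetical, and the URAM counts used in the examples are placeholder values, not the resources of any particular board. The matrix-size bound is treated as inclusive, which is an assumption based on the 8x8 configurations with unroll factor 4 in Table 2 and Table 3.

```python
def is_power_of_two(x):
    """True for 1, 2, 4, 8, ..."""
    return x > 0 and (x & (x - 1)) == 0


def unroll_violations(unroll, matrix_size, uram_available):
    """Return a list of the constraints that the unroll factor violates.

    uram_available is supplied by the caller (hypothetical parameter);
    an empty list means the unroll factor passes all three checks.
    """
    problems = []
    if not is_power_of_two(unroll):
        problems.append("unroll factor is not a power of 2")
    # Assumption: the bound is inclusive (unroll == matrix_size / 2 is allowed).
    if unroll > matrix_size // 2:
        problems.append("unroll factor exceeds half of the matrix size")
    if 2 * unroll ** 2 >= uram_available:
        problems.append("2 * unroll^2 is not less than the available URAM")
    return problems


def max_unroll(matrix_size, uram_available):
    """Largest power-of-2 unroll factor that passes all checks, or None."""
    best = None
    candidate = 1
    while not unroll_violations(candidate, matrix_size, uram_available):
        best = candidate
        candidate *= 2
    return best


# Placeholder URAM budgets, chosen only to exercise the checks.
small_matrix_limit = max_unroll(8, 1000)    # limited by the matrix size
small_budget_limit = max_unroll(512, 100)   # limited by the URAM budget
```

With these placeholder budgets, the 8x8 case is limited by the matrix size and the 512x512 case by the URAM budget, which shows how either constraint can be the binding one.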