gemx.py

class gemx.GEMXManager(libFile)

This class loads the C++ shared library and declares the argument and return types of each library function so that it can be called from Python.

Note

Every PE argument defaults to 0, so it can be omitted when the xclbin is built with a single kernel.

createFCNHandle(xclbin, numHandles)

create FCN Handle

Parameters
  • xclbin – file path to FPGA bitstream

  • numHandles – number of kernels in the xclbin

createGEMMHandle(xclbin, numHandles)

create GEMM Handle

Parameters
  • xclbin – file path to FPGA bitstream

  • numHandles – number of kernels in the xclbin

createUSPMVHandle(xclbin, numHandles)

create USPMV Handle

Parameters
  • xclbin – file path to FPGA bitstream

  • numHandles – number of kernels in the xclbin

createSPMVHandle(xclbin, numHandles)

create SPMV Handle

Parameters
  • xclbin – file path to FPGA bitstream

  • numHandles – number of kernels in the xclbin

sendMat(A, PE, sync_send=False)

Send a dense matrix to the kernel. If sync_send is True, only the buffer for the matrix is created, and the data has to be sent when the kernel is executed.

Parameters
  • A (ndarray) – dense matrix in the host memory

  • PE (int) – index of kernel

  • sync_send (boolean) –

    Controls when the data is sent to the kernel.

    If False, the data is sent immediately; if True, it is sent together with the instructions when the kernel is executed. Default is False.

sendSpMat(row, col, data, m, k, nnz, xclbin_opts, PE)

send sparse matrix to kernel (for spmv engine).

Parameters
  • row (ndarray) – sparse matrix’s row indices

  • col (ndarray) – sparse matrix’s col indices

  • data (ndarray) – sparse matrix’s non-zero elements

  • m (int) – number of rows for this sparse matrix

  • k (int) – number of cols for this sparse matrix

  • nnz (int) – number of non-zero elements of this sparse matrix

  • xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin

  • PE (int) – index of kernel

Returns

pointer to the start of the host memory for the sparse matrix

Return type

c_void_p
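The parameters above describe a coordinate-style layout: three parallel arrays plus the matrix dimensions. As a minimal sketch of how such arguments could be derived from a dense matrix with NumPy, the helper name `dense_to_coo` and the `int32` index dtype below are illustrative assumptions, not part of gemx:

```python
import numpy as np

def dense_to_coo(dense):
    """Return (row, col, data, m, k, nnz) for a 2-D array.

    Hypothetical helper; the index dtype is an assumption, so check
    what your xclbin expects before passing the arrays on.
    """
    dense = np.asarray(dense)
    m, k = dense.shape
    row, col = np.nonzero(dense)      # indices come back in row-major order
    data = dense[row, col]            # the non-zero values, same order
    nnz = data.size
    return row.astype(np.int32), col.astype(np.int32), data, m, k, nnz

dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0]])
row, col, data, m, k, nnz = dense_to_coo(dense)
```

The resulting values line up one-to-one with the `row`, `col`, `data`, `m`, `k` and `nnz` arguments of sendSpMat.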

sendUSpMat(rows, cols, datas, ms, ks, nnzs, pRelus, xclbin_opts, PE)

send sparse matrices to kernel (for uspmv engine).

The uspmv engine supports multi-stage use, so if the prebuilt xclbin is multi-stage, use this function to send all the sparse matrices together.

For each matrix, the row index array, col index array and value array must be sorted to avoid overhead on the kernel side.

Parameters
  • rows (ndarray) –

    row indices of all the sparse matrices

    when the xclbin is multi-stage, rows should be a concatenated ndarray of all the row indices

  • cols (ndarray) –

    col indices of all the sparse matrices

    when the xclbin is multi-stage, cols should be a concatenated ndarray of all the col indices

  • datas (ndarray) –

    non-zero elements of all the sparse matrices

    when the xclbin is multi-stage, datas should be a concatenated ndarray of all the non-zero elements

  • ms (ndarray) – numbers of rows for all the sparse matrices

  • ks (ndarray) – numbers of cols for all the sparse matrices

  • nnzs (ndarray) – numbers of non-zero elements for all the sparse matrices

  • pRelus (ndarray) – factors by which output values are multiplied when they are < 0

  • xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin

  • PE (int) – index of kernel

Returns

pointer to the start of the host memory for the sparse matrices

Return type

c_void_p
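The sorting and concatenation requirements above can be sketched with NumPy. This is an illustration under assumptions: the text does not state the sort key, so row-then-col order is assumed here, and `sort_coo` and `pack_stages` are hypothetical helper names rather than gemx functions.

```python
import numpy as np

def sort_coo(row, col, data):
    """Sort one matrix's arrays by row index, then col index (assumed order)."""
    order = np.lexsort((col, row))    # the last key (row) is the primary key
    return row[order], col[order], data[order]

def pack_stages(stages):
    """Build the flat per-argument arrays for a multi-stage call.

    `stages` is a list of (row, col, data, m, k) tuples, one per stage.
    """
    sorted_stages = [sort_coo(r, c, d) for r, c, d, _, _ in stages]
    rows = np.concatenate([s[0] for s in sorted_stages])
    cols = np.concatenate([s[1] for s in sorted_stages])
    datas = np.concatenate([s[2] for s in sorted_stages])
    ms = np.array([m for _, _, _, m, _ in stages], dtype=np.int32)
    ks = np.array([k for _, _, _, _, k in stages], dtype=np.int32)
    nnzs = np.array([len(s[2]) for s in sorted_stages], dtype=np.int32)
    return rows, cols, datas, ms, ks, nnzs

# Two unsorted stages: a 2x3 matrix followed by a 1x2 matrix.
stage0 = (np.array([1, 0, 1]), np.array([2, 1, 0]), np.array([3.0, 2.0, 1.0]), 2, 3)
stage1 = (np.array([0, 0]), np.array([1, 0]), np.array([5.0, 4.0]), 1, 2)
rows, cols, datas, ms, ks, nnzs = pack_stages([stage0, stage1])
```

Each stage is sorted on its own before concatenation, so the per-stage boundaries can still be recovered from `nnzs`.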

addFCNOp(A, B, C, bias, postScale, postShift, PReLUScale, PReLUAlpha, PE)

create FCN instruction for C = relu ((A * B + bias) * postScale >> postShift)

Parameters
  • A (ndarray) – dense matrix in the host memory

  • B (ndarray) – dense matrix in the host memory

  • C (ndarray) – dense matrix in the host memory

  • bias (ndarray) – dense matrix in the host memory

  • postScale (int) – scalar by which the output values are multiplied

  • postShift (int) – number of bits by which the output values are right-shifted

  • PReLUScale (int) – scalar by which the output values are multiplied when they are < 0

  • PReLUAlpha (int) – number of bits by which the output values are shifted when they are < 0

  • PE (int) – index of kernel
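A host-side model of the FCN formula can be handy for checking kernel results. The sketch below is not gemx code: `fcn_reference` is a hypothetical name, 64-bit integer arithmetic is assumed, and the negative branch (multiply by PReLUScale, then right-shift by PReLUAlpha) is an assumed reading of the two parameter descriptions. With `prelu_scale=0` it reduces to a plain ReLU.

```python
import numpy as np

def fcn_reference(A, B, bias, post_scale, post_shift,
                  prelu_scale=0, prelu_alpha=0):
    """NumPy model of C = relu((A * B + bias) * postScale >> postShift)."""
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    out = (acc * post_scale) >> post_shift      # arithmetic right shift
    # Assumed handling of negative outputs: scale, then shift.
    neg = (out * prelu_scale) >> prelu_alpha
    return np.where(out < 0, neg, out)

A = np.array([[1, -2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
bias = np.zeros((2, 2), dtype=np.int64)

relu_out = fcn_reference(A, B, bias, post_scale=2, post_shift=1)
prelu_out = fcn_reference(A, B, bias, post_scale=2, post_shift=1,
                          prelu_scale=1, prelu_alpha=1)
```

Note that a right shift of a negative integer rounds toward negative infinity, so -9 >> 1 is -5, not -4.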

addGEMMOp(A, B, C, bias, postScale, postShift, PE)

create GEMM instruction for C = (A * B + bias) * postScale >> postShift

Parameters
  • A (ndarray) – dense matrix in the host memory

  • B (ndarray) – dense matrix in the host memory

  • C (ndarray) – dense matrix in the host memory

  • bias (ndarray) – dense matrix in the host memory

  • postScale (int) – scalar by which the output values are multiplied

  • postShift (int) – number of bits by which the output values are right-shifted

  • PE (int) – index of kernel
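To make the GEMM formula concrete, here is a small NumPy model of it. It is a sketch under assumptions: `gemm_reference` is a hypothetical name, the input dtypes are chosen for illustration only, and the accumulation is done in 64-bit integers. Multiplication binds more tightly than `>>`, so the scale is applied before the shift.

```python
import numpy as np

def gemm_reference(A, B, bias, post_scale, post_shift):
    """NumPy model of C = (A * B + bias) * postScale >> postShift."""
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    return (acc * post_scale) >> post_shift

# Illustrative dtypes; the ones your xclbin needs may differ.
A = np.array([[1, 2], [3, 4]], dtype=np.int16)
B = np.array([[5, 6], [7, 8]], dtype=np.int16)
bias = np.array([[1, 1], [1, 1]], dtype=np.int32)

C_expected = gemm_reference(A, B, bias, post_scale=3, post_shift=2)
```

A result computed this way can be compared element by element with the matrix read back from the kernel.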

addSPMVOp(A, B, C, nnz, xclbin_opts, relu, PE)

create SPMV instruction for C = relu (A (sparse matrix) * B (dense vector) )

Parameters
  • A (c_void_p) – pointer to the sparse matrix in the host memory

  • B (ndarray) – dense vector in the host memory

  • C (ndarray) – dense vector in the host memory

  • nnz (int) – number of non-zero elements of this sparse matrix

  • xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin

  • relu (boolean) – if True, output values < 0 are set to 0

  • PE (int) – index of kernel
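The SPMV operation can be modelled on the host directly from the row, col and data arrays. This is a minimal sketch, not gemx code: `spmv_reference` is a hypothetical name and floating-point data is assumed.

```python
import numpy as np

def spmv_reference(row, col, data, m, B, relu=False):
    """NumPy model of C = relu(A * B) for a sparse A given as index/value arrays."""
    C = np.zeros(m, dtype=np.result_type(data, B))
    # np.add.at accumulates correctly when a row index appears more than once.
    np.add.at(C, row, data * B[col])
    if relu:
        C = np.maximum(C, 0)          # negative outputs become 0
    return C

row = np.array([0, 1, 1])
col = np.array([1, 0, 2])
data = np.array([2.0, -1.0, -3.0])
B = np.array([1.0, 2.0, 3.0])

C_plain = spmv_reference(row, col, data, m=2, B=B)
C_relu = spmv_reference(row, col, data, m=2, B=B, relu=True)
```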

addUSPMVOp(A, B, C, numRuns, PE)

create USPMV instruction for C = A (sparse matrix) * B (dense matrix)

Parameters
  • A (ndarray of c_void_p) – pointers to all the sparse matrices in the host memory

  • B (ndarray) – dense matrix in the host memory

  • C (ndarray) – dense matrix in the host memory

  • numRuns (int) – number of columns of the first dense matrix B

  • PE (int) – index of kernel
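For a multi-stage xclbin, one plausible host-side model chains the stages. Everything in this sketch is an assumption made for illustration: that each stage's output feeds the next stage, that the matching pRelus entry scales negative outputs after every stage, and that B is laid out as (k, numRuns). Dense arrays stand in for the sparse matrices to keep the example short, and `uspmv_reference` is a hypothetical name.

```python
import numpy as np

def uspmv_reference(stages, p_relus, B):
    """Chain C = A * B over several stages, scaling negative outputs.

    `stages` is a list of dense (m_i, k_i) arrays standing in for the
    sparse matrices; `p_relus` holds one factor per stage.
    """
    X = np.asarray(B, dtype=np.float64)
    for A, p in zip(stages, p_relus):
        Y = A @ X
        X = np.where(Y < 0, Y * p, Y)   # assumed per-stage PReLU step
    return X

A0 = np.array([[0.0, 2.0, 0.0],
               [-1.0, 0.0, -3.0]])      # stage 0: 2 x 3
A1 = np.array([[4.0, 5.0]])             # stage 1: 1 x 2
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 1.0]])              # 3 x numRuns, with numRuns = 2

C_expected = uspmv_reference([A0, A1], [0.5, 1.0], B)
```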

execute(PE, sync_exec=True)

Send the previously created instructions to the kernel, then start it.

Parameters
  • PE (int) – index of kernel

  • sync_exec (boolean) –

    Default is True.

    If any matrices were sent earlier with sync_send = True, set sync_exec = False here; otherwise those matrices will not be sent to the kernel.

    Using the default values for sync_send and sync_exec is recommended.

wait(PE)

Wait until all events have completed. With the default values of sync_send, sync_exec and sync_get, there is no need to call this function.

Parameters

PE (int) – index of kernel

clearInstrBuf(PE)

Clear the instruction buffer in kernel.

The kernel can hold at most 16 instructions. Only call this function when more than 16 instructions have been sent to the kernel.

Parameters

PE (int) – index of kernel

getMat(A, PE, sync_get=True)

Copy a dense matrix from the kernel to host memory.

Parameters
  • A (ndarray) – dense matrix in the host memory

  • PE (int) – index of kernel

  • sync_get (boolean) –

    Default is True.

    If True, getMat waits for the transfer to finish.

    If False, wait must be called afterwards to be sure all the data has been received.
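The sync flags of sendMat, execute and getMat interact, so the call order matters. The sketch below shows both the default path and the deferred path using only the method signatures documented here. `RecordingManager` is a hypothetical stand-in that merely records calls so the example runs without an FPGA, and registering the output matrix C with sendMat is an assumption.

```python
class RecordingManager:
    """Hypothetical stand-in for GEMXManager; it only records call order."""
    def __init__(self):
        self.calls = []
    def sendMat(self, A, PE=0, sync_send=False):
        self.calls.append(("sendMat", sync_send))
    def addGEMMOp(self, A, B, C, bias, postScale, postShift, PE=0):
        self.calls.append(("addGEMMOp",))
    def execute(self, PE=0, sync_exec=True):
        self.calls.append(("execute", sync_exec))
    def getMat(self, A, PE=0, sync_get=True):
        self.calls.append(("getMat", sync_get))
    def wait(self, PE=0):
        self.calls.append(("wait",))

def run_gemm(mgr, A, B, C, bias, deferred=False, PE=0):
    """Issue one GEMM, with default flags or with deferred transfers."""
    for mat in (A, B, C, bias):
        # sync_send=True only creates the buffer; data goes out at execute().
        mgr.sendMat(mat, PE, sync_send=deferred)
    mgr.addGEMMOp(A, B, C, bias, 1, 0, PE)
    # Matrices sent with sync_send=True require sync_exec=False.
    mgr.execute(PE, sync_exec=not deferred)
    mgr.getMat(C, PE, sync_get=not deferred)
    if deferred:
        mgr.wait(PE)                  # needed because sync_get was False

default_mgr = RecordingManager()
run_gemm(default_mgr, "A", "B", "C", "bias")

deferred_mgr = RecordingManager()
run_gemm(deferred_mgr, "A", "B", "C", "bias", deferred=True)
```

With the default flags no wait call is issued; in the deferred path wait is the last call.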

printStats()

Print the time used by the functions on the C++ side.