gemx.py

class gemx.GEMXManager(libFile)

This class loads the C++ shared library given by libFile and declares the argument and return types of each function the library exports, so that those functions can be called from Python.
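The loading step described above is commonly done with Python's ctypes module. That GEMXManager uses ctypes is an assumption, suggested by the c_void_p return types listed below. Because the GEMX library's exported symbols are not documented here, this sketch applies the same two steps (load the library, then declare argtypes and restype) to the C library's strlen instead, assuming a POSIX system where ctypes.util.find_library("c") resolves.

```python
import ctypes
import ctypes.util

# Load the C standard library as a stand-in for the GEMX shared library.
# Assumption: a POSIX system where find_library("c") finds libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature size_t strlen(const char *), so ctypes converts the
# argument and the return value correctly instead of assuming a C int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# A function returning an opaque pointer would instead use
# restype = ctypes.c_void_p, the return type documented for sendSpMat.
length = libc.strlen(b"gemx")
```

Declaring argtypes lets ctypes reject wrongly typed arguments before they reach C, and restype prevents 64-bit return values from being truncated to an int.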

Note

In all of the functions below, PE (the kernel index) defaults to 0, so it can be omitted when the xclbin contains only one kernel.

createFCNHandle(xclbin, numHandles)

Create an FCN handle.

Parameters

xclbin – path to the FPGA bitstream (xclbin) file
numHandles – number of kernels in the xclbin

createGEMMHandle(xclbin, numHandles)

Create a GEMM handle.

Parameters

xclbin – path to the FPGA bitstream (xclbin) file
numHandles – number of kernels in the xclbin

createUSPMVHandle(xclbin, numHandles)

Create a USPMV handle.

Parameters

xclbin – path to the FPGA bitstream (xclbin) file
numHandles – number of kernels in the xclbin

createSPMVHandle(xclbin, numHandles)

Create an SPMV handle.

Parameters

xclbin – path to the FPGA bitstream (xclbin) file
numHandles – number of kernels in the xclbin

sendMat(A, PE, sync_send=False)

Send a dense matrix to the kernel. If sync_send is True, only the buffer for the matrix is created; the data itself is sent later, when the kernel is executed.

Parameters

A (ndarray) – dense matrix in the host memory
PE (int) – index of kernel
sync_send (boolean) –
controls when the data is sent to the kernel.

If False (the default), the matrix is sent immediately; if True, it is sent when the kernel is executed.

sendSpMat(row, col, data, m, k, nnz, xclbin_opts, PE)

Send a sparse matrix to the kernel (for the SPMV engine).

Parameters

row (ndarray) – sparse matrix’s row indices
col (ndarray) – sparse matrix’s col indices
data (ndarray) – sparse matrix’s non-zero elements
m (int) – number of rows for this sparse matrix
k (int) – number of cols for this sparse matrix
nnz (int) – number of non-zero elements of this sparse matrix
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
PE (int) – index of kernel
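The row, col and data parameters describe the matrix in coordinate (COO) form: entry i of data sits at position (row[i], col[i]). This sketch derives all six matrix arguments from a small dense NumPy matrix. The helper name dense_to_coo is hypothetical, and the int32 index and float32 value dtypes are assumptions; the dtypes the kernel actually expects depend on how the xclbin was built.

```python
import numpy as np

def dense_to_coo(dense):
    """Hypothetical helper: build sendSpMat's row, col, data, m, k, nnz from a dense matrix.

    Assumption: int32 indices and float32 values.
    """
    m, k = dense.shape
    row, col = np.nonzero(dense)        # non-zero positions in row-major order
    data = dense[row, col]              # values at those positions
    return (row.astype(np.int32), col.astype(np.int32),
            data.astype(np.float32), m, k, int(data.size))

A = np.array([[0, 2, 0],
              [1, 0, 3]], dtype=np.float32)
row, col, data, m, k, nnz = dense_to_coo(A)
```

The six returned values correspond one-to-one to sendSpMat's row, col, data, m, k and nnz parameters.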

Returns

pointer to the start of the host memory for the sparse matrix

Return type

c_void_p

sendUSpMat(rows, cols, datas, ms, ks, nnzs, pRelus, xclbin_opts, PE)

Send sparse matrices to the kernel (for the USPMV engine).

The USPMV engine supports multi-stage operation: if the prebuilt xclbin is built for multiple stages, use this function to send the sparse matrices of all stages together.

For each matrix, the row index, column index and value arrays need to be sorted (with all three arrays reordered together) to avoid overhead on the kernel side.

Parameters

rows (ndarray) –
row indices of all the sparse matrices

when the xclbin is multi-stage, rows should be a single ndarray concatenating the row indices of all the matrices
cols (ndarray) –
column indices of all the sparse matrices

when the xclbin is multi-stage, cols should be a single ndarray concatenating the column indices of all the matrices
datas (ndarray) –
non-zero elements of all the sparse matrices

when the xclbin is multi-stage, datas should be a single ndarray concatenating the non-zero elements of all the matrices
ms (ndarray) – numbers of rows for all the sparse matrices
ks (ndarray) – numbers of cols for all the sparse matrices
nnzs (ndarray) – numbers of non-zero elements for all the sparse matrices
pRelus (ndarray) – for each sparse matrix, the factor by which output values < 0 are multiplied
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
PE (int) – index of kernel
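The packing rules above (per-matrix sorting, then concatenating rows, cols and datas across stages, with ms, ks and nnzs holding one entry per matrix) can be sketched in NumPy. The helper pack_stages is hypothetical, and the sort order is an assumption: the text only says the arrays must be sorted, so this sketch sorts each matrix by row and then by column, moving the values with their indices.

```python
import numpy as np

def pack_stages(matrices):
    """Hypothetical helper: build sendUSpMat's rows, cols, datas, ms, ks, nnzs.

    matrices: list of (row, col, data, m, k) tuples, one per stage.
    Assumption: row-major sort order (by row, then by column).
    """
    rows, cols, datas, ms, ks, nnzs = [], [], [], [], [], []
    for row, col, data, m, k in matrices:
        order = np.lexsort((col, row))          # last key (row) is the primary key
        rows.append(np.asarray(row)[order])
        cols.append(np.asarray(col)[order])
        datas.append(np.asarray(data)[order])   # values follow their indices
        ms.append(m)
        ks.append(k)
        nnzs.append(len(order))
    return (np.concatenate(rows), np.concatenate(cols), np.concatenate(datas),
            np.array(ms), np.array(ks), np.array(nnzs))

stage0 = ([1, 0], [0, 1], [5.0, 7.0], 2, 2)     # 2 x 2 matrix, unsorted entries
stage1 = ([0], [1], [2.0], 1, 2)                # 1 x 2 matrix
rows, cols, datas, ms, ks, nnzs = pack_stages([stage0, stage1])
```

Because nnzs records how many entries each matrix contributes, the concatenated arrays can be split back into per-stage slices.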

Returns

pointer to the start of the host memory for the sparse matrices

Return type

c_void_p

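addFCNOp(A, B, C, bias, postScale, postShift, PReLUScale, PReLUAlpha, PE)

Create an FCN instruction that computes C = relu((A * B + bias) * postScale >> postShift), where A * B is a matrix product and relu is the parameterised ReLU configured by PReLUScale and PReLUAlpha.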

Parameters

A (ndarray) – dense matrix in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
bias (ndarray) – dense matrix in the host memory
postScale (int) – scalar by which the output values are multiplied
postShift (int) – number of bits by which the output values are right-shifted
PReLUScale (int) – scalar by which output values < 0 are multiplied
PReLUAlpha (int) – number of bits by which output values < 0 are shifted
PE (int) – index of kernel
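A host-side NumPy reference for the FCN formula can help check results read back with getMat. It is a sketch under stated assumptions, not the kernel's definition: integer inputs, an arithmetic right shift for ">>", and negative outputs mapped to (x * PReLUScale) >> PReLUAlpha (the shift direction for PReLUAlpha is assumed). Rounding and saturation in the kernel are not modelled, and fcn_reference is a hypothetical name.

```python
import numpy as np

def fcn_reference(A, B, bias, postScale, postShift, PReLUScale, PReLUAlpha):
    """Hypothetical reference for C = relu((A * B + bias) * postScale >> postShift)."""
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    out = (acc * postScale) >> postShift          # arithmetic right shift
    neg = out < 0
    # Assumption: negative outputs are scaled, then right-shifted.
    out[neg] = (out[neg] * PReLUScale) >> PReLUAlpha
    return out

A = np.array([[1, -2], [3, 4]], dtype=np.int16)
B = np.eye(2, dtype=np.int16)
bias = np.zeros((2, 2), dtype=np.int32)
C = fcn_reference(A, B, bias, 1, 0, PReLUScale=1, PReLUAlpha=1)
C_plain_relu = fcn_reference(A, B, bias, 1, 0, PReLUScale=0, PReLUAlpha=1)
```

Under these assumptions, PReLUScale = 0 reduces the activation to a plain ReLU, since every negative output becomes 0.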

addGEMMOp(A, B, C, bias, postScale, postShift, PE)

Create a GEMM instruction that computes C = (A * B + bias) * postScale >> postShift, where A * B is a matrix product.

Parameters

A (ndarray) – dense matrix in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
bias (ndarray) – dense matrix in the host memory
postScale (int) – scalar by which the output values are multiplied
postShift (int) – number of bits by which the output values are right-shifted
PE (int) – index of kernel
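The GEMM formula can be written as a short NumPy reference, which is handy for checking the matrix returned by getMat. This assumes integer matrices and treats ">>" as an arithmetic right shift; overflow and saturation on the kernel side are not modelled, and gemm_reference is a hypothetical name.

```python
import numpy as np

def gemm_reference(A, B, bias, postScale, postShift):
    """Hypothetical reference for C = (A * B + bias) * postScale >> postShift."""
    # Widen to int64 so the host-side product cannot overflow.
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    return (acc * postScale) >> postShift

A = np.array([[1, 2], [3, 4]], dtype=np.int16)
B = np.array([[5, 6], [7, 8]], dtype=np.int16)
bias = np.ones((2, 2), dtype=np.int32)
C = gemm_reference(A, B, bias, postScale=2, postShift=1)
```

With postScale = 2 and postShift = 1 the scaling cancels, so C equals A * B + bias here.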

addSPMVOp(A, B, C, nnz, xclbin_opts, relu, PE)

Create an SPMV instruction that computes C = relu(A * B), where A is a sparse matrix and B is a dense vector.

Parameters

A (c_void_p) – pointer to the sparse matrix in the host memory, as returned by sendSpMat
B (ndarray) – dense vector in the host memory
C (ndarray) – dense vector in the host memory
nnz (int) – number of non-zero elements of this sparse matrix
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
relu (boolean) – if True, output values < 0 are set to 0
PE (int) – index of kernel
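For checking SPMV results on the host, the product of a COO matrix (the row, col and data arrays passed to sendSpMat) with a dense vector can be computed with np.add.at, which accumulates correctly when several entries share a row. The function name spmv_reference is hypothetical, and float values are assumed.

```python
import numpy as np

def spmv_reference(row, col, data, m, B, relu):
    """Hypothetical reference for C = relu(A * B), with A given as COO arrays."""
    row, col = np.asarray(row), np.asarray(col)
    data = np.asarray(data, dtype=np.float64)
    B = np.asarray(B, dtype=np.float64)
    C = np.zeros(m)
    # C[row[i]] += data[i] * B[col[i]] for every non-zero entry i.
    np.add.at(C, row, data * B[col])
    # relu=True: output values < 0 are set to 0.
    return np.maximum(C, 0.0) if relu else C

row, col, data = [0, 1, 1], [1, 0, 2], [2.0, 1.0, 3.0]   # A = [[0, 2, 0], [1, 0, 3]]
B = np.array([1.0, 1.0, -1.0])
C_plain = spmv_reference(row, col, data, 2, B, relu=False)
C_relu = spmv_reference(row, col, data, 2, B, relu=True)
```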

addUSPMVOp(A, B, C, numRuns, PE)

Create a USPMV instruction that computes C = A * B, where A is a sparse matrix and B is a dense matrix.

Parameters

A (ndarray of c_void_p) – pointers to all the sparse matrices in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
numRuns (int) – number of columns of the first dense matrix, B
PE (int) – index of kernel
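A host-side reference clarifies the shapes involved: each sparse matrix A is m x k, B is k x numRuns, and C is m x numRuns. This sketch also applies the per-matrix pRelus factor described for sendUSpMat to negative outputs. Two points are assumptions, not stated in this reference: that pRelu is applied after each stage, and that in a multi-stage xclbin each stage's output is the next stage's input. The function names are hypothetical.

```python
import numpy as np

def uspmv_stage(row, col, data, m, X, pRelu):
    """One stage: Y = A * X for a COO matrix A, then outputs < 0 multiplied by pRelu."""
    row, col = np.asarray(row), np.asarray(col)
    data = np.asarray(data, dtype=np.float64)
    Y = np.zeros((m, X.shape[1]))              # X.shape[1] plays the role of numRuns
    # Y[row[i], :] += data[i] * X[col[i], :] for every non-zero entry i.
    np.add.at(Y, row, data[:, None] * X[col])
    return np.where(Y < 0, Y * pRelu, Y)

def uspmv_reference(stages, B):
    """Assumption: each stage's output feeds the next stage."""
    X = np.asarray(B, dtype=np.float64)
    for row, col, data, m, pRelu in stages:
        X = uspmv_stage(row, col, data, m, X, pRelu)
    return X

stage_a = ([0, 1], [0, 1], [1.0, -1.0], 2, 0.5)   # A = diag(1, -1), pRelu = 0.5
stage_b = ([0, 1], [0, 1], [1.0, 1.0], 2, 0.0)    # identity, pRelu = 0 (plain ReLU)
B = np.array([[2.0, 3.0], [4.0, -5.0]])           # numRuns = 2
C1 = uspmv_reference([stage_a], B)
C2 = uspmv_reference([stage_a, stage_b], B)
```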

execute(PE, sync_exec=True)

Send the previously created instructions to the kernel, then start the kernel.

Parameters

PE (int) – index of kernel
sync_exec (boolean) –
Default is True.

If any matrices were sent earlier with sync_send = True, set sync_exec = False here; otherwise those matrices will not be sent to the kernel.

Using the default values of sync_send and sync_exec is recommended.

wait(PE)

Wait until all events have completed. If the default values of sync_send, sync_exec and sync_get were used, there is no need to call this function.

Parameters: PE (int) – index of kernel

clearInstrBuf(PE)

Clear the instruction buffer in the kernel.

The kernel can store at most 16 instructions. Call this function only when more than 16 instructions have been sent to the kernel.

Parameters: PE (int) – index of kernel
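When a workload needs more than 16 instructions, it can be issued in batches that fit the buffer. This is a sketch under assumptions: each add*Op call stores one instruction, and execute with the default sync_exec = True completes before the buffer is cleared. MAX_INSTRUCTIONS, chunk_instructions and run_in_batches are hypothetical names; manager stands for a GEMXManager instance.

```python
MAX_INSTRUCTIONS = 16  # instruction buffer capacity stated above

def chunk_instructions(ops, limit=MAX_INSTRUCTIONS):
    """Split a list of pending operations into batches of at most `limit`."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return [ops[i:i + limit] for i in range(0, len(ops), limit)]

def run_in_batches(manager, ops, PE=0, limit=MAX_INSTRUCTIONS):
    """ops: callables that each issue one add*Op call on `manager`."""
    for i, batch in enumerate(chunk_instructions(ops, limit)):
        if i > 0:
            # More than `limit` instructions have been sent in total,
            # so the buffer is cleared before the next batch.
            manager.clearInstrBuf(PE)
        for add_op in batch:
            add_op(manager)
        manager.execute(PE)

batches = chunk_instructions(list(range(40)))
```

Clearing only between batches, rather than after every execute, follows the advice to call clearInstrBuf only once more than 16 instructions have been sent.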

getMat(A, PE, sync_get=True)

Copy a dense matrix from the kernel to host memory.

Parameters

A (ndarray) – dense matrix in the host memory
PE (int) – index of kernel
sync_get (boolean) –
Default is True.

If True, getMat waits for the transfer to finish.

If False, wait must be called before all of the data is guaranteed to have been received.

printStats()

Print the time spent in the functions on the C++ side.
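Putting the methods together, a single GEMM round trip can be sketched as below. Several parts are assumptions: that A, B, C and bias all have to be sent with sendMat before addGEMMOp; that the xclbin holds one kernel (numHandles = 1, PE = 0); and that "gemm.xclbin" is only a placeholder path. run_gemm and DryRunManager are hypothetical helpers: the dry-run class records method names so the call order can be inspected without an FPGA, and a real GEMXManager instance would be passed in its place. The deferred flag shows the pairing described above: sync_send = True requires sync_exec = False, and sync_get = False requires a later call to wait.

```python
import numpy as np

def run_gemm(manager, xclbin, A, B, C, bias, postScale=1, postShift=0,
             PE=0, deferred=False):
    """Hypothetical sketch of one GEMM round trip through a GEMXManager-like object."""
    manager.createGEMMHandle(xclbin, 1)           # assumption: one kernel in the xclbin
    for mat in (A, B, C, bias):                   # assumption: all four are sent
        manager.sendMat(mat, PE, sync_send=deferred)
    manager.addGEMMOp(A, B, C, bias, postScale, postShift, PE)
    manager.execute(PE, sync_exec=not deferred)   # deferred sends need sync_exec=False
    manager.getMat(C, PE, sync_get=not deferred)
    if deferred:
        manager.wait(PE)                          # not needed with the default flags
    return C

class DryRunManager:
    """Hypothetical stand-in that records method names instead of driving an FPGA."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
        return record

A = np.ones((4, 4), dtype=np.int16)
B = np.ones((4, 4), dtype=np.int16)
C = np.zeros((4, 4), dtype=np.int16)
bias = np.zeros((4, 4), dtype=np.int32)

mgr = DryRunManager()
run_gemm(mgr, "gemm.xclbin", A, B, C, bias)                  # recommended defaults
deferred_mgr = DryRunManager()
run_gemm(deferred_mgr, "gemm.xclbin", A, B, C, bias, deferred=True)
```

Calling printStats() on a real manager after such a run would report the time spent on the C++ side.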