gemx.py
-
class gemx.GEMXManager(libFile)
Loads the C++ shared library and declares the argument and return types of each of its functions so that they can be called from Python.
Note
The PE argument of every function defaults to 0, so it can be omitted when the xclbin is built with a single kernel.
-
createFCNHandle(xclbin, numHandles)
Create an FCN handle.
- Parameters
xclbin – file path to FPGA bitstream
numHandles – number of kernels in the xclbin
-
createGEMMHandle(xclbin, numHandles)
Create a GEMM handle.
- Parameters
xclbin – file path to FPGA bitstream
numHandles – number of kernels in the xclbin
-
createUSPMVHandle(xclbin, numHandles)
Create a USPMV handle.
- Parameters
xclbin – file path to FPGA bitstream
numHandles – number of kernels in the xclbin
-
createSPMVHandle(xclbin, numHandles)
Create an SPMV handle.
- Parameters
xclbin – file path to FPGA bitstream
numHandles – number of kernels in the xclbin
-
sendMat(A, PE, sync_send=False)
Send a dense matrix to the kernel. If sync_send is true, only the buffer for the matrix is created, and the matrix must then be sent when the kernel is executed.
- Parameters
A (ndarray) – dense matrix in the host memory
PE (int) – index of kernel
sync_send (boolean) –
Controls when the data is sent to the kernel.
If false, the data is sent immediately; otherwise it must be sent when the kernel is executed. Default value is false.
-
sendSpMat(row, col, data, m, k, nnz, xclbin_opts, PE)
Send a sparse matrix to the kernel (for the SPMV engine).
- Parameters
row (ndarray) – sparse matrix’s row indices
col (ndarray) – sparse matrix’s col indices
data (ndarray) – sparse matrix’s non-zero elements
m (int) – number of rows for this sparse matrix
k (int) – number of cols for this sparse matrix
nnz (int) – number of non-zero elements of this sparse matrix
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
PE (int) – index of kernel
- Returns
pointer to the start of the host memory for the sparse matrix
- Return type
c_void_p
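The row, col, data, m, k and nnz arguments describe the sparse matrix in coordinate form. A minimal NumPy sketch of deriving them from a dense array is shown here. The helper name `dense_to_coo` and the dtypes (int32 indices, float32 values) are assumptions made for illustration and are not taken from the library.

```python
import numpy as np

def dense_to_coo(dense):
    """Return (row, col, data, m, k, nnz) for a 2-D array (hypothetical helper)."""
    dense = np.asarray(dense)
    m, k = dense.shape
    # np.nonzero scans in row-major order, so entries are sorted by row, then col.
    row, col = np.nonzero(dense)
    data = dense[row, col]
    return (row.astype(np.int32), col.astype(np.int32),
            data.astype(np.float32), m, k, int(data.size))

dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0]])
row, col, data, m, k, nnz = dense_to_coo(dense)
```

The resulting values line up with the first six arguments of sendSpMat; xclbin_opts and PE still have to be supplied by the caller.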
-
sendUSpMat(rows, cols, datas, ms, ks, nnzs, pRelus, xclbin_opts, PE)
Send sparse matrices to the kernel (for the USPMV engine).
The USPMV engine supports multi-stage usage, so if the prebuilt xclbin is multi-stage, use this function to send multiple sparse matrices together.
For each matrix, the row index array, col index array and value array need to be sorted to avoid overhead on the kernel side.
- Parameters
rows (ndarray) –
row indices of all the sparse matrices
when the xclbin is multi-stage, rows should be a concatenated ndarray of all the row indices
cols (ndarray) –
col indices of all the sparse matrices
when the xclbin is multi-stage, cols should be a concatenated ndarray of all the col indices
datas (ndarray) –
non-zero elements of all the sparse matrices
when the xclbin is multi-stage, datas should be a concatenated ndarray of all the non-zero elements
ms (ndarray) – numbers of rows for all the sparse matrices
ks (ndarray) – numbers of cols for all the sparse matrices
nnzs (ndarray) – numbers of non-zero elements for all the sparse matrices
pRelus (ndarray) – numbers to multiply with the output values when output values < 0
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
PE (int) – index of kernel
- Returns
pointer to the start of the host memory for the sparse matrices
- Return type
c_void_p
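For a multi-stage xclbin, the per-matrix arrays are concatenated and the per-matrix sizes are collected into ms, ks and nnzs. The sketch below shows one way to assemble these arguments with NumPy. The helper `pack_stages` is hypothetical, and the dtypes are assumptions, not values confirmed by the library.

```python
import numpy as np

def pack_stages(stages, prelus):
    """Concatenate per-stage coordinate data (hypothetical helper).

    stages: list of 2-D dense arrays, one per stage.
    prelus: one multiplier per stage, applied to negative outputs.
    """
    rows, cols, datas, ms, ks, nnzs = [], [], [], [], [], []
    for dense in stages:
        dense = np.asarray(dense)
        r, c = np.nonzero(dense)  # row-major scan: sorted by row, then col
        rows.append(r)
        cols.append(c)
        datas.append(dense[r, c])
        ms.append(dense.shape[0])
        ks.append(dense.shape[1])
        nnzs.append(r.size)
    return (np.concatenate(rows).astype(np.int32),
            np.concatenate(cols).astype(np.int32),
            np.concatenate(datas).astype(np.float32),
            np.array(ms, dtype=np.int32),
            np.array(ks, dtype=np.int32),
            np.array(nnzs, dtype=np.int32),
            np.array(prelus, dtype=np.float32))

stage1 = [[1, 0], [0, 2]]
stage2 = [[0, 3], [4, 0]]
rows, cols, datas, ms, ks, nnzs, pRelus = pack_stages([stage1, stage2], [1.0, 0.5])
```

Because each stage is scanned in row-major order, every matrix's index and value arrays come out sorted, as the description above requires.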
-
addFCNOp(A, B, C, bias, postScale, postShift, PReLUScale, PReLUAlpha, PE)
Create an FCN instruction for C = relu((A * B + bias) * postScale >> postShift)
- Parameters
A (ndarray) – dense matrix in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
bias (ndarray) – dense matrix in the host memory
postScale (int) – scalar by which the output values are multiplied
postShift (int) – amount by which the output values are shifted
PReLUScale (int) – scalar by which the output values are multiplied when output values < 0
PReLUAlpha (int) – amount by which the output values are shifted when output values < 0
PE (int) – index of kernel
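A host-side reference can help when checking kernel results. The NumPy sketch below is one possible reading of the formula and parameter descriptions above, not the kernel's implementation. It assumes 64-bit accumulation, and it assumes that negative outputs are multiplied by PReLUScale and then right-shifted by PReLUAlpha. The function name `fcn_reference` is hypothetical.

```python
import numpy as np

def fcn_reference(A, B, bias, post_scale, post_shift, prelu_scale, prelu_alpha):
    # Accumulate in 64-bit integers so products do not overflow (assumed width).
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    out = (acc * post_scale) >> post_shift
    # Assumed PReLU step: scale, then shift, applied to negative outputs only.
    neg = (out * prelu_scale) >> prelu_alpha
    return np.where(out < 0, neg, out)

A = np.array([[1, -2], [3, 4]], dtype=np.int16)
B = np.array([[2, 0], [1, -1]], dtype=np.int16)
bias = np.zeros((2, 2), dtype=np.int32)

C_relu = fcn_reference(A, B, bias, 4, 1, 0, 0)   # PReLUScale = 0 acts as a plain relu
C_prelu = fcn_reference(A, B, bias, 4, 1, 1, 2)  # negative outputs are shifted right by 2
```

Under these assumptions, setting PReLUScale to 0 zeroes every negative output, which matches the relu in the formula.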
-
addGEMMOp(A, B, C, bias, postScale, postShift, PE)
Create a GEMM instruction for C = (A * B + bias) * postScale >> postShift
- Parameters
A (ndarray) – dense matrix in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
bias (ndarray) – dense matrix in the host memory
postScale (int) – scalar by which the output values are multiplied
postShift (int) – amount by which the output values are shifted
PE (int) – index of kernel
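The GEMM formula can be reproduced on the host with NumPy to sanity-check results. This is a sketch under assumptions: the accumulation width (64-bit) and the function name `gemm_reference` are illustrative and are not taken from the library.

```python
import numpy as np

def gemm_reference(A, B, bias, post_scale, post_shift):
    # Widen before multiplying so intermediate products do not overflow.
    acc = A.astype(np.int64) @ B.astype(np.int64) + bias.astype(np.int64)
    # Multiplication binds tighter than >>, so the scale is applied before the shift.
    return (acc * post_scale) >> post_shift

A = np.array([[1, 2], [3, 4]], dtype=np.int16)
B = np.array([[5, 6], [7, 8]], dtype=np.int16)
bias = np.ones((2, 2), dtype=np.int32)
C_ref = gemm_reference(A, B, bias, 3, 2)
```

Here postScale = 3 with postShift = 2 scales each accumulated value by 3/4, rounding down.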
-
addSPMVOp(A, B, C, nnz, xclbin_opts, relu, PE)
Create an SPMV instruction for C = relu(A (sparse matrix) * B (dense vector))
- Parameters
A (c_void_p) – pointer to the sparse matrix in the host memory
B (ndarray) – dense vector in the host memory
C (ndarray) – dense vector in the host memory
nnz (int) – number of non-zero elements of this sparse matrix
xclbin_opts (dictionary) – information read from config_info.dat used to build the xclbin
relu (boolean) – when true, output values < 0 are set to 0
PE (int) – index of kernel
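The SPMV result can be cross-checked on the host from the coordinate arrays. The following is a minimal NumPy sketch, assuming float32 data. The function name `spmv_reference` is hypothetical and the code does not call the library.

```python
import numpy as np

def spmv_reference(row, col, data, m, B, relu):
    C = np.zeros(m, dtype=np.float32)
    # np.add.at accumulates correctly when a row index appears more than once.
    np.add.at(C, row, data * B[col])
    return np.maximum(C, 0) if relu else C

row = np.array([0, 1, 1], dtype=np.int32)
col = np.array([1, 0, 2], dtype=np.int32)
data = np.array([2.0, 1.0, 3.0], dtype=np.float32)
B = np.array([1.0, -1.0, 2.0], dtype=np.float32)

C_plain = spmv_reference(row, col, data, 2, B, relu=False)
C_relu = spmv_reference(row, col, data, 2, B, relu=True)
```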
-
addUSPMVOp(A, B, C, numRuns, PE)
Create a USPMV instruction for C = A (sparse matrix) * B (dense matrix)
- Parameters
A (ndarray of c_void_p) – pointers to all the sparse matrices in the host memory
B (ndarray) – dense matrix in the host memory
C (ndarray) – dense matrix in the host memory
numRuns (int) – col size of the first dense matrix B
PE (int) – index of kernel
-
execute(PE, sync_exec=True)
Send the previously created instructions to the kernel, then start it.
- Parameters
PE (int) – index of kernel
sync_exec (boolean) –
Default is True.
If some matrices were sent earlier with sync_send = True, set sync_exec = False here; otherwise those matrices will not be sent to the kernel.
Using the default values for sync_send and sync_exec is recommended.
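With the default flags, a GEMM round trip reduces to send, add the instruction, execute, and read back. The sketch below uses only the method names and argument orders listed in this section. Sending C and bias alongside A and B is an assumption, and `RecordingManager` is a hypothetical stand-in that only records call names so the example runs without an FPGA.

```python
import numpy as np

def run_gemm_sync(mgr, A, B, C, bias, post_scale, post_shift, PE=0):
    """Blocking GEMM round trip using the default sync flags."""
    for mat in (A, B, C, bias):       # assumption: all four buffers are sent
        mgr.sendMat(mat, PE)          # sync_send defaults to False: sent immediately
    mgr.addGEMMOp(A, B, C, bias, post_scale, post_shift, PE)
    mgr.execute(PE)                   # sync_exec defaults to True
    mgr.getMat(C, PE)                 # sync_get defaults to True: waits for the transfer
    return C

class RecordingManager:
    """Hypothetical stub, not part of gemx: records the names of the calls made."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args, **kwargs):
            self.calls.append(name)
        return record

mgr = RecordingManager()
A = np.ones((2, 2), dtype=np.int16)
B = np.ones((2, 2), dtype=np.int16)
C = np.zeros((2, 2), dtype=np.int16)
bias = np.zeros((2, 2), dtype=np.int32)
run_gemm_sync(mgr, A, B, C, bias, 1, 0)
```

In real use, mgr would be a GEMXManager instance on which a GEMM handle has already been created.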
-
wait(PE)
Wait until all events have completed. If the default values of sync_send, sync_exec and sync_get were used, there is no need to call this function.
- Parameters
PE (int) – index of kernel
-
clearInstrBuf(PE)
Clear the instruction buffer in the kernel.
The kernel can hold at most 16 instructions. Only call this function when the number of instructions previously sent to the kernel exceeds 16.
- Parameters
PE (int) – index of kernel
-
getMat(A, PE, sync_get=True)
Copy the dense matrix from the kernel to the host memory.
- Parameters
A (ndarray) – dense matrix in the host memory
PE (int) – index of kernel
sync_get (boolean) –
Default is True.
If true, getMat waits for the transfer to finish.
If false, the wait function must be called before all the data is guaranteed to have been received.
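When the non-default flags are combined, the descriptions in this section suggest the following order: create buffers with sync_send = True, execute with sync_exec = False, request the result with sync_get = False, then call wait. The sketch below follows that reading. The call order is an interpretation of the text, and `RecordingManager` is a hypothetical stub, not part of gemx, used so the example runs without hardware.

```python
import numpy as np

def run_gemm_deferred(mgr, A, B, C, bias, post_scale, post_shift, PE=0):
    """Non-blocking GEMM round trip, finished by an explicit wait."""
    for mat in (A, B, C, bias):
        mgr.sendMat(mat, PE, True)    # sync_send=True: only create the buffers
    mgr.addGEMMOp(A, B, C, bias, post_scale, post_shift, PE)
    mgr.execute(PE, False)            # sync_exec=False: deferred matrices are sent here
    mgr.getMat(C, PE, False)          # sync_get=False: do not block on the transfer
    mgr.wait(PE)                      # block until all events have completed
    return C

class RecordingManager:
    """Hypothetical stub: records (method name, non-array arguments)."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(*args):
            scalars = tuple(a for a in args if not isinstance(a, np.ndarray))
            self.calls.append((name, scalars))
        return record

mgr = RecordingManager()
A = np.ones((2, 2), dtype=np.int16)
B = np.ones((2, 2), dtype=np.int16)
C = np.zeros((2, 2), dtype=np.int16)
bias = np.zeros((2, 2), dtype=np.int32)
run_gemm_deferred(mgr, A, B, C, bias, 1, 0)
```

The contents of C should only be read after wait returns.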
-
printStats()
Print the time used by functions on the C++ side.
-