CSCMV Kernel APIs

Note

The CSCMV implementation currently uses only one HBM channel on the U280 card. Future releases may use multiple (up to 32) HBM channels to achieve maximum performance.

cscRowPktKernel

#include "cscRowPktKernel.hpp"
void cscRowPktKernel (
    const ap_uint <SPARSE_hbmMemBits>* p_aNnzIdx,
    const unsigned int p_memBlocks,
    const unsigned int p_nnzBlocks,
    const unsigned int p_rowBlocks,
    hls::stream <SPARSE_parDataPktType>& in,
    hls::stream <SPARSE_parDataPktType>& out
    )

cscRowPkt Kernel

Parameters:

p_aNnzIdx the device memory pointer for reading the NNZ values and row indices
p_memBlocks the number of device memory accesses needed to read the NNZ values and row indices
p_nnzBlocks the number of parallel NNZ entries
p_rowBlocks the number of parallel row vector entries
in the input AXI stream of column vector entries selected for the NNZs
out the output AXI stream of result row vector entries
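As a reading aid, the arithmetic implied by these parameters can be sketched in plain C++. This is a software reference under stated assumptions, not the HLS kernel: the function name accumulateRows and its container arguments are hypothetical, and it assumes each NNZ value is multiplied by the column vector entry already selected for it and added to the row given by its row index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Software reference only (hypothetical helper, not part of the library).
// nnzVal[i] : value of the i-th non-zero entry
// rowIdx[i] : row index of the i-th non-zero entry
// colSel[i] : column vector entry already selected for the i-th non-zero entry
// Returns the row vector y with y[rowIdx[i]] += nnzVal[i] * colSel[i].
std::vector<double> accumulateRows(const std::vector<double>& nnzVal,
                                   const std::vector<unsigned int>& rowIdx,
                                   const std::vector<double>& colSel,
                                   std::size_t numRows) {
    assert(nnzVal.size() == rowIdx.size());
    assert(nnzVal.size() == colSel.size());
    std::vector<double> y(numRows, 0.0);
    for (std::size_t i = 0; i < nnzVal.size(); ++i) {
        assert(rowIdx[i] < numRows);  // row index must address a valid row
        y[rowIdx[i]] += nnzVal[i] * colSel[i];
    }
    return y;
}
```

For the 2x3 matrix with rows (1, 0, 2) and (0, 3, 0) and the column vector (10, 20, 30), the NNZ values are {1, 3, 2}, the row indices are {0, 1, 0}, and the selected column entries are {10, 20, 30}; the result is the row vector (70, 60).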

loadColPtrValKernel

#include "loadColPtrValKernel.hpp"
void loadColPtrValKernel (
    const ap_uint <SPARSE_ddrMemBits>* p_memColVal,
    const ap_uint <SPARSE_ddrMemBits>* p_memColPtr,
    const unsigned int p_memBlocks,
    const unsigned int p_numTrans,
    hls::stream <SPARSE_parDataPktType>& out1,
    hls::stream <SPARSE_parIndexPktType>& out2
    )

loadColPtrVal Kernel

Parameters:

p_memColVal device memory pointer for reading the column vector
p_memColPtr device memory pointer for reading the column pointers of NNZ entries
p_memBlocks number of blocks of vector entries in the memory read operation
p_numTrans number of times to trigger this kernel. Currently only 1 is supported
out1 the AXI stream of output column vector entries
out2 the AXI stream of output column pointer entries
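Block counts such as p_memBlocks are typically derived on the host from the vector length. The sketch below shows one way to do that with ceiling division. It is an assumption, not the library's host code: the helper names entriesPerBlock and blocksNeeded are hypothetical, and the widths used in the usage note are example values rather than the actual value of SPARSE_ddrMemBits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of vector entries that fit in one memory word,
// given the memory word width and the entry width, both in bits.
constexpr std::size_t entriesPerBlock(std::size_t memBits, std::size_t dataBits) {
    return memBits / dataBits;
}

// Hypothetical helper: number of blocks needed to hold numEntries entries,
// rounding up so that a partially filled last block is still counted.
std::size_t blocksNeeded(std::size_t numEntries, std::size_t perBlock) {
    assert(perBlock > 0);
    return (numEntries + perBlock - 1) / perBlock;
}
```

With an example 512-bit memory word and 32-bit entries, entriesPerBlock(512, 32) is 16, so a vector of 40 entries needs blocksNeeded(40, 16) = 3 blocks, and the host would pad the vector to 48 entries.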

storeDatPktKernel

#include "storeDatPktKernel.hpp"
void storeDatPktKernel (
    hls::stream <SPARSE_parDataPktType>& in,
    ap_uint <SPARSE_ddrMemBits>* p_memPtr,
    unsigned int p_memBlocks
    )

storeDatPkt Kernel

Parameters:

in the input AXI stream of row vector entries holding the CSCMV operation results
p_memPtr the device memory pointer for writing the row vector entries
p_memBlocks the number of vector entries in each memory write

xBarColKernel

#include "xBarColKernel.hpp"
void xBarColKernel (
    const unsigned int p_colPtrBlocks,
    const unsigned int p_nnzBlocks,
    hls::stream <SPARSE_parDataPktType>& in1,
    hls::stream <SPARSE_parIndexPktType>& in2,
    hls::stream <SPARSE_parDataPktType>& out
    )

xBarCol Kernel

Parameters:

p_colPtrBlocks number of parallel column pointer entries in the input AXI stream in2
p_nnzBlocks number of parallel NNZ entries in the input AXI stream in1 and the output AXI stream out
in1 input AXI stream of parallel column vector entries
in2 input AXI stream of parallel column pointer entries
out output AXI stream of parallel column vector entries selected for the NNZs
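The selection step described by these parameters can be illustrated with a small software model. This is a sketch under assumptions, not the kernel's implementation: the function name selectColEntries is hypothetical, and the column pointers are assumed to follow the conventional CSC layout, where column c owns the NNZ positions from colPtr[c] up to but excluding colPtr[c + 1]. The encoding used on the actual stream may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Software reference only (hypothetical helper, not part of the library).
// colVec : dense column vector, one entry per matrix column
// colPtr : assumed conventional CSC column pointers, colVec.size() + 1 entries
// Returns one column vector entry per NNZ, in NNZ storage order.
std::vector<double> selectColEntries(const std::vector<double>& colVec,
                                     const std::vector<unsigned int>& colPtr) {
    assert(colPtr.size() == colVec.size() + 1);
    std::vector<double> sel;
    sel.reserve(colPtr.back());
    for (std::size_t c = 0; c < colVec.size(); ++c) {
        assert(colPtr[c] <= colPtr[c + 1]);  // pointers must be non-decreasing
        // Repeat colVec[c] once for every NNZ stored in column c.
        for (unsigned int k = colPtr[c]; k < colPtr[c + 1]; ++k) {
            sel.push_back(colVec[c]);
        }
    }
    return sel;
}
```

For a column vector (10, 20, 30) and column pointers {0, 1, 1, 3}, column 0 holds one NNZ, column 1 holds none, and column 2 holds two, so the selected entries are (10, 30, 30).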