GQE Kernel APIs

Note

GQE has been tested on the Alveo U280 card and uses both its HBM and DDR memory. Other cards, such as the U250 and U200, are not supported out of the box, but porting GQE to them and gaining acceleration should be possible with some tailoring and tuning.

gqeAggr

#include "gqe_aggr.hpp"
void gqeAggr(
    ap_uint<8 * sizeof(int32_t) * 16> buf_in[],
    ap_uint<8 * sizeof(int32_t) * 16> buf_out[],
    ap_uint<8 * sizeof(int32_t)> buf_cfg[],
    ap_uint<8 * sizeof(int32_t)> buf_result_info[],
    ap_uint<8 * sizeof(int32_t) * 16> ping_buf0[],
    ap_uint<8 * sizeof(int32_t) * 16> ping_buf1[],
    ap_uint<8 * sizeof(int32_t) * 16> ping_buf2[],
    ap_uint<8 * sizeof(int32_t) * 16> ping_buf3[],
    ap_uint<8 * sizeof(int32_t) * 16> pong_buf0[],
    ap_uint<8 * sizeof(int32_t) * 16> pong_buf1[],
    ap_uint<8 * sizeof(int32_t) * 16> pong_buf2[],
    ap_uint<8 * sizeof(int32_t) * 16> pong_buf3[]
);
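The width argument 8*sizeof(int32_t)*16 evaluates to 512 bits, so each element of buf_in, buf_out and the ping/pong buffers carries sixteen 32-bit values, while each element of buf_cfg and buf_result_info is 32 bits wide. The host-side sketch below uses standard C++ only (not the HLS ap_uint type): it models a 512-bit word as std::array<uint32_t, 16> and packs a column of int32 values into such words. The lane order (value i of a word stored in lane i) and the zero-padding of the last word are assumptions for illustration; see GQE Kernel Design for the actual buffer layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Width of one buf_in/buf_out element, as written in the gqeAggr signature.
constexpr std::size_t kWordBits = 8 * sizeof(int32_t) * 16;
static_assert(kWordBits == 512, "one word is 512 bits");

// Number of 32-bit lanes in one 512-bit word (16).
constexpr std::size_t kLanes = kWordBits / (8 * sizeof(int32_t));

// Host-side stand-in for ap_uint<512>: 16 lanes of 32 bits each.
using Word512 = std::array<uint32_t, kLanes>;

// Pack int32 values into 512-bit words. Assumed layout: value i of a word
// goes to lane i. Unused lanes of the last word stay zero.
std::vector<Word512> pack_column(const std::vector<int32_t>& col) {
    std::vector<Word512> words((col.size() + kLanes - 1) / kLanes, Word512{});
    for (std::size_t i = 0; i < col.size(); ++i) {
        words[i / kLanes][i % kLanes] = static_cast<uint32_t>(col[i]);
    }
    return words;
}
```

With 20 values, for example, pack_column returns two words, and the last 12 lanes of the second word are padding.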

GQE Aggr Kernel.

For detailed documentation, see GQE Kernel Design.

Parameters:

buf_in input table buffer.
buf_out output table buffer.
buf_cfg input configuration buffer.
buf_result_info output information buffer.
ping_buf0 gqeAggr’s temporary buffer for storing overflow data.
ping_buf1 gqeAggr’s temporary buffer for storing overflow data.
ping_buf2 gqeAggr’s temporary buffer for storing overflow data.
ping_buf3 gqeAggr’s temporary buffer for storing overflow data.
pong_buf0 gqeAggr’s temporary buffer for storing overflow data.
pong_buf1 gqeAggr’s temporary buffer for storing overflow data.
pong_buf2 gqeAggr’s temporary buffer for storing overflow data.
pong_buf3 gqeAggr’s temporary buffer for storing overflow data.
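This page does not say how the ping and pong buffers are used. A common pattern for spilled data in multi-pass hash aggregation, assumed here and not confirmed by this documentation, is to alternate buffer roles between passes: rows that do not fit in the bounded on-chip hash table during pass r are written to one buffer set and read back as the input of pass r+1, while that pass's own overflow goes to the other set. The software sketch below models the pattern for a per-key SUM; the table capacity and the (key, value) row type are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Row = std::pair<int32_t, int64_t>;  // (key, value)

// Multi-pass SUM aggregation with a bounded table of `capacity` keys.
// Rows whose key cannot enter the table are spilled to the write-side
// buffer; the read and write buffers swap roles (ping/pong) after each pass.
std::map<int32_t, int64_t> aggregate_sum(const std::vector<Row>& input,
                                         std::size_t capacity) {
    if (capacity == 0) capacity = 1;  // guarantee progress on every pass
    std::map<int32_t, int64_t> result;
    std::vector<Row> ping = input;  // read side for the current pass
    std::vector<Row> pong;          // write side for overflow rows
    while (!ping.empty()) {
        std::map<int32_t, int64_t> table;  // models the bounded table
        for (const Row& r : ping) {
            auto it = table.find(r.first);
            if (it != table.end()) {
                it->second += r.second;          // key already resident
            } else if (table.size() < capacity) {
                table.emplace(r.first, r.second);  // new key fits
            } else {
                pong.push_back(r);               // overflow: next pass
            }
        }
        // A key that entered the table saw all of its rows in this pass,
        // so its sum is final.
        result.insert(table.begin(), table.end());
        ping.swap(pong);  // this pass's overflow is the next pass's input
        pong.clear();
    }
    return result;
}
```

Swapping two buffers avoids copying spilled rows between passes, which is the usual motivation for a ping/pong arrangement.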

gqeJoin

#include "gqe_join.hpp"
void gqeJoin(
    ap_uint<8 * sizeof(int32_t) * 16> buf_A[],
    ap_uint<8 * sizeof(int32_t) * 16> buf_B[],
    ap_uint<8 * sizeof(int32_t) * 16> buf_C[],
    ap_uint<8 * sizeof(int32_t) * 16> buf_D[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf0[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf1[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf2[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf3[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf4[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf5[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf6[],
    ap_uint<8 * sizeof(int32_t) * 2> htb_buf7[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf0[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf1[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf2[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf3[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf4[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf5[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf6[],
    ap_uint<8 * sizeof(int32_t) * 2> stb_buf7[]
);

GQE Join Kernel.

For detailed documentation, see GQE Kernel Design.

Parameters:

buf_A input table A buffer.
buf_B input table B buffer.
buf_C output table C buffer.
buf_D configuration buffer.
htb_buf0 gqeJoin’s temporary buffer for storing the small table.
htb_buf1 gqeJoin’s temporary buffer for storing the small table.
htb_buf2 gqeJoin’s temporary buffer for storing the small table.
htb_buf3 gqeJoin’s temporary buffer for storing the small table.
htb_buf4 gqeJoin’s temporary buffer for storing the small table.
htb_buf5 gqeJoin’s temporary buffer for storing the small table.
htb_buf6 gqeJoin’s temporary buffer for storing the small table.
htb_buf7 gqeJoin’s temporary buffer for storing the small table.
stb_buf0 gqeJoin’s temporary buffer for storing the small table.
stb_buf1 gqeJoin’s temporary buffer for storing the small table.
stb_buf2 gqeJoin’s temporary buffer for storing the small table.
stb_buf3 gqeJoin’s temporary buffer for storing the small table.
stb_buf4 gqeJoin’s temporary buffer for storing the small table.
stb_buf5 gqeJoin’s temporary buffer for storing the small table.
stb_buf6 gqeJoin’s temporary buffer for storing the small table.
stb_buf7 gqeJoin’s temporary buffer for storing the small table.
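When validating gqeJoin output on the host, a plain software reference model of the join is often useful. The sketch below is a hypothetical inner equi-join on a single 32-bit key: it builds a hash table on the small table (assumed here to be table A) and probes it with table B. The row structs and the choice of build side are assumptions; the actual join type, key and payload columns, and build side are set through buf_D as described in GQE Kernel Design. The model reproduces only the join result, not the kernel's use of the htb/stb buffers.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Row { int32_t key; int32_t payload; };
struct JoinedRow { int32_t key; int32_t payload_a; int32_t payload_b; };

// Reference inner equi-join: build on `small` (A), probe with `big` (B).
std::vector<JoinedRow> hash_join(const std::vector<Row>& small,
                                 const std::vector<Row>& big) {
    // Build phase: map each key to every payload of the small table.
    std::unordered_multimap<int32_t, int32_t> table;
    for (const Row& a : small) table.emplace(a.key, a.payload);

    // Probe phase: emit one output row per matching (A, B) pair.
    std::vector<JoinedRow> out;
    for (const Row& b : big) {
        auto range = table.equal_range(b.key);
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back({b.key, it->second, b.payload});
        }
    }
    return out;
}
```

Because the order of duplicate keys in an unordered_multimap is unspecified, compare kernel and model results as multisets (for example after sorting) rather than row by row.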

gqePart

#include "gqe_part.hpp"
void gqePart (
    const int k_depth,
    const int col_index,
    const int bit_num,
    ap_uint <8*4*16> buf_A [],
    ap_uint <8*4*16> buf_B [],
    ap_uint <8*4*16> buf_D []
    )

GQE partition kernel.

Parameters:

k_depth depth of each hash bucket in URAM.
col_index index of the input column.
bit_num base-2 logarithm of the number of partitions, that is, the kernel uses 2^bit_num partitions (for example, bit_num = 3 gives 8 partitions).
buf_A input table buffer.
buf_B output table buffer.
buf_D configuration buffer.
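Since bit_num is the base-2 logarithm of the partition count, a partition index fits in bit_num bits. The sketch below shows one way rows could be routed to 1 << bit_num partitions: hash the value in the column selected by col_index and keep the low bit_num bits. The hash (the MurmurHash3 32-bit finalizer, fmix32) and the row representation are assumptions for illustration; this page does not specify which hash the kernel uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash: MurmurHash3 32-bit finalizer, which mixes all input
// bits into the low bits so masking them gives a well-spread index.
uint32_t fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

using Table = std::vector<std::vector<int32_t>>;  // row-major rows

// Split rows into (1 << bit_num) partitions keyed on column col_index.
// Requires bit_num < 32.
std::vector<Table> partition_rows(const Table& rows, std::size_t col_index,
                                  unsigned bit_num) {
    const uint32_t num_parts = 1u << bit_num;
    const uint32_t mask = num_parts - 1;  // keeps the low bit_num bits
    std::vector<Table> parts(num_parts);
    for (const auto& row : rows) {
        uint32_t h = fmix32(static_cast<uint32_t>(row[col_index]));
        parts[h & mask].push_back(row);
    }
    return parts;
}
```

Rows with equal key values always land in the same partition, which is what lets a later join or aggregation process each partition independently.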