GQE Kernel APIs
Note
GQE has been tested on the Alveo U280 card and uses both HBM and DDR memory. Other cards, such as the U250 and U200, are not supported out of the box, but porting GQE to them and gaining acceleration is possible with some tailoring and tuning.
gqeAggr
#include "gqe_aggr.hpp"
void gqeAggr ( ap_uint <8*sizeof (int32_t)*16> buf_in [], ap_uint <8*sizeof (int32_t)*16> buf_out [], ap_uint <8*sizeof (int32_t)> buf_cfg [], ap_uint <8*sizeof (int32_t)> buf_result_info [], ap_uint <8*sizeof (int32_t)*16> ping_buf0 [], ap_uint <8*sizeof (int32_t)*16> ping_buf1 [], ap_uint <8*sizeof (int32_t)*16> ping_buf2 [], ap_uint <8*sizeof (int32_t)*16> ping_buf3 [], ap_uint <8*sizeof (int32_t)*16> pong_buf0 [], ap_uint <8*sizeof (int32_t)*16> pong_buf1 [], ap_uint <8*sizeof (int32_t)*16> pong_buf2 [], ap_uint <8*sizeof (int32_t)*16> pong_buf3 [] )
GQE aggregation kernel.
For detailed documentation, see GQE Kernel Design.
Parameters:
buf_in | input table buffer. |
buf_out | output table buffer. |
buf_cfg | input configuration buffer. |
buf_result_info | output information buffer. |
ping_buf0 | gqeAggr’s temporary buffer for storing overflow. |
ping_buf1 | gqeAggr’s temporary buffer for storing overflow. |
ping_buf2 | gqeAggr’s temporary buffer for storing overflow. |
ping_buf3 | gqeAggr’s temporary buffer for storing overflow. |
pong_buf0 | gqeAggr’s temporary buffer for storing overflow. |
pong_buf1 | gqeAggr’s temporary buffer for storing overflow. |
pong_buf2 | gqeAggr’s temporary buffer for storing overflow. |
pong_buf3 | gqeAggr’s temporary buffer for storing overflow. |
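The port types in the signature above spell out their width as an expression: `8*sizeof(int32_t)*16` evaluates to 512 bits (sixteen 32-bit values per word), while `buf_cfg` and `buf_result_info` are 32 bits wide. The following is a minimal host-side sketch of that arithmetic only. The constant names and the helper `wordsForElements` are hypothetical, and the dense packing of 16 values per 512-bit word is an assumption, not something taken from the GQE headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width in bits of the wide ports (buf_in, buf_out, ping_buf*, pong_buf*),
// copied from the expression used in the gqeAggr signature.
constexpr std::size_t kWidePortBits = 8 * sizeof(int32_t) * 16;

// Width in bits of the narrow ports (buf_cfg, buf_result_info).
constexpr std::size_t kNarrowPortBits = 8 * sizeof(int32_t);

// Number of 32-bit values that fit in one wide word.
constexpr std::size_t kElemsPerWideWord = kWidePortBits / kNarrowPortBits;

static_assert(kWidePortBits == 512, "wide ports are 512 bits");
static_assert(kNarrowPortBits == 32, "narrow ports are 32 bits");
static_assert(kElemsPerWideWord == 16, "16 int32_t values per wide word");

// Hypothetical helper: number of 512-bit words needed to hold n_elems
// 32-bit values, ASSUMING values are packed densely, 16 per word.
inline std::size_t wordsForElements(std::size_t n_elems) {
    // Guard against overflow in the rounding-up addition below.
    assert(n_elems <= SIZE_MAX - (kElemsPerWideWord - 1));
    return (n_elems + kElemsPerWideWord - 1) / kElemsPerWideWord;
}
```

Under that packing assumption, a column of 17 values would occupy two wide words, with the second word mostly unused.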
gqeJoin
#include "gqe_join.hpp"
void gqeJoin ( ap_uint <8*sizeof (int32_t)*16> buf_A [], ap_uint <8*sizeof (int32_t)*16> buf_B [], ap_uint <8*sizeof (int32_t)*16> buf_C [], ap_uint <8*sizeof (int32_t)*16> buf_D [], ap_uint <8*sizeof (int32_t)*2> htb_buf0 [], ap_uint <8*sizeof (int32_t)*2> htb_buf1 [], ap_uint <8*sizeof (int32_t)*2> htb_buf2 [], ap_uint <8*sizeof (int32_t)*2> htb_buf3 [], ap_uint <8*sizeof (int32_t)*2> htb_buf4 [], ap_uint <8*sizeof (int32_t)*2> htb_buf5 [], ap_uint <8*sizeof (int32_t)*2> htb_buf6 [], ap_uint <8*sizeof (int32_t)*2> htb_buf7 [], ap_uint <8*sizeof (int32_t)*2> stb_buf0 [], ap_uint <8*sizeof (int32_t)*2> stb_buf1 [], ap_uint <8*sizeof (int32_t)*2> stb_buf2 [], ap_uint <8*sizeof (int32_t)*2> stb_buf3 [], ap_uint <8*sizeof (int32_t)*2> stb_buf4 [], ap_uint <8*sizeof (int32_t)*2> stb_buf5 [], ap_uint <8*sizeof (int32_t)*2> stb_buf6 [], ap_uint <8*sizeof (int32_t)*2> stb_buf7 [] )
GQE Join Kernel.
For detailed documentation, see GQE Kernel Design.
Parameters:
buf_A | input table A buffer. |
buf_B | input table B buffer. |
buf_C | output table C buffer. |
buf_D | configuration buffer. |
htb_buf0 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf1 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf2 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf3 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf4 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf5 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf6 | gqeJoin’s temporary buffer for storing the small table. |
htb_buf7 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf0 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf1 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf2 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf3 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf4 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf5 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf6 | gqeJoin’s temporary buffer for storing the small table. |
stb_buf7 | gqeJoin’s temporary buffer for storing the small table. |
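With 20 arguments, it is easy to mis-number a buffer when binding kernel arguments from the host. The sketch below records the argument positions in the order they appear in the gqeJoin signature. It assumes the host runtime binds arguments by position in the signature (as OpenCL-style `setArg` calls do). The enum and the helpers `htbArgIndex` and `stbArgIndex` are hypothetical names introduced for illustration, not part of the GQE API.

```cpp
#include <cassert>

// Positional indices of gqeJoin's arguments, in signature order.
// These names are illustrative; only the ordering comes from the signature.
enum GqeJoinArg : int {
    kBufA = 0,      // input table A
    kBufB = 1,      // input table B
    kBufC = 2,      // output table C
    kBufD = 3,      // configuration
    kHtbBase = 4,   // htb_buf0 .. htb_buf7 occupy positions 4..11
    kStbBase = 12,  // stb_buf0 .. stb_buf7 occupy positions 12..19
    kNumArgs = 20
};

constexpr int kNumHtb = 8;  // htb_buf0 .. htb_buf7
constexpr int kNumStb = 8;  // stb_buf0 .. stb_buf7

// Position of htb_buf<i> in the argument list.
inline int htbArgIndex(int i) {
    assert(i >= 0 && i < kNumHtb);
    return kHtbBase + i;
}

// Position of stb_buf<i> in the argument list.
inline int stbArgIndex(int i) {
    assert(i >= 0 && i < kNumStb);
    return kStbBase + i;
}
```

A host program could then loop over `i` from 0 to 7 and bind each temporary buffer at `htbArgIndex(i)` and `stbArgIndex(i)` instead of hard-coding the numbers.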
gqePart
#include "gqe_part.hpp"
void gqePart ( const int k_depth, const int col_index, const int bit_num, ap_uint <8*4*16> buf_A [], ap_uint <8*4*16> buf_B [], ap_uint <8*4*16> buf_D [] )
GQE partition kernel.
Parameters:
k_depth | depth of each hash bucket in URAM |
col_index | index of input column |
bit_num | log2 of the number of partitions, i.e. the kernel produces 2^bit_num partitions |
buf_A | input table buffer |
buf_B | output table buffer |
buf_D | configuration buffer |
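Because `bit_num` is the base-2 logarithm of the partition count, the number of partitions is always a power of two. The sketch below shows that relationship and one common way to map a hash value to a partition. Both function names are hypothetical, and selecting the partition from the low `bit_num` bits of a hash is an assumption made for illustration; the hash function and bit selection used inside the kernel are not described here.

```cpp
#include <cassert>
#include <cstdint>

// Number of partitions implied by bit_num: 2^bit_num.
inline uint32_t numPartitions(int bit_num) {
    assert(bit_num >= 0 && bit_num < 32);
    return uint32_t{1} << bit_num;
}

// Hypothetical mapping: take the low bit_num bits of a hash value as the
// partition index. This is an ASSUMPTION, not the kernel's documented scheme.
inline uint32_t partitionOf(uint32_t hash, int bit_num) {
    // numPartitions is a power of two, so (n - 1) is a mask of bit_num ones.
    return hash & (numPartitions(bit_num) - 1);
}
```

For example, `bit_num = 3` gives eight partitions, and under the low-bits assumption a hash value of `0x2B` (binary `101011`) would fall in partition 3.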