Primitive APIs in xf::database

aggregate

aggregate overload (1)

#include "xf_database/aggregate.hpp"
template <
    AggregateOp op,
    typename T
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Overload for most common aggregations.

As shown below in the parameters, this function can calculate one of a range of statistics, including minimum, maximum, average (mean), variance, L1 norm and L2 norm. It can also calculate the sum and count.

The limitation of this function is that the output data type must match the input data type. In some cases the sum or count may overflow the output type; the other aggregate overloads, which use a separate output type, cover those cases safely.

Note that the minimum, maximum, sum, count, number of non-zero, L1 norm and L2 norm aggregations all return zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op the aggregate operator: AOP_SUM, AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2
T the data type of input and output streams
in_strm input data stream
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
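
The data stream and end-flag stream pairing can be illustrated with a small software model. This is a minimal sketch, not the library implementation: `Fifo`, `aggregateMaxModel` and `runMaxModel` are hypothetical names, `std::queue` stands in for `hls::stream`, and the protocol (one `false` flag per data item followed by a single `true`) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Plain FIFO used as a software stand-in for hls::stream (illustration only).
template <typename T>
using Fifo = std::queue<T>;

// Hypothetical model of a MAX aggregation.
// Assumed protocol: one `false` end flag per data item, then one final `true`.
template <typename T>
void aggregateMaxModel(Fifo<T>& in_strm, Fifo<bool>& in_e_strm,
                       Fifo<T>& out_strm, Fifo<bool>& out_e_strm) {
    T result = 0;       // stays zero when the input is empty
    bool first = true;
    bool e = in_e_strm.front();
    in_e_strm.pop();
    while (!e) {
        const T v = in_strm.front();
        in_strm.pop();
        if (first || v > result) result = v;
        first = false;
        e = in_e_strm.front();
        in_e_strm.pop();
    }
    out_strm.push(result);   // the single aggregated value
    out_e_strm.push(false);  // flag paired with that value
    out_e_strm.push(true);   // end of output
}

// Helper: drive the model from a vector and return the aggregated value.
inline int32_t runMaxModel(const std::vector<int32_t>& rows) {
    Fifo<int32_t> in, out;
    Fifo<bool> in_e, out_e;
    for (int32_t v : rows) {
        in.push(v);
        in_e.push(false);
    }
    in_e.push(true);
    aggregateMaxModel(in, in_e, out, out_e);
    return out.front();
}
```

Calling `runMaxModel` with an empty vector exercises the empty-input case, where the model returns zero as described above.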

aggregate overload (2)

#include "xf_database/aggregate.hpp"
template <
    AggregateOp op,
    typename T,
    typename T2
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T2>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Aggregate function overload for SUM operation.

The output type may differ from the input type. This allows the sum to carry more precision bits than the input, and so avoid overflow.

Note that the sum aggregation returns zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op the aggregate operator: AOP_SUM
T the data type of input stream, inferred from argument
T2 the data type of output stream, inferred from argument
in_strm input data stream
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
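
The benefit of a wider output type can be seen in a small host-side sketch. The function `sumModel` below is a hypothetical stand-in that works on a `std::vector` instead of streams; it only illustrates why letting `T2` differ from `T` avoids overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical software model of a SUM aggregation with separate input type T
// and accumulator/output type T2. Not the library implementation.
template <typename T, typename T2>
T2 sumModel(const std::vector<T>& rows) {
    T2 acc = 0;  // empty input yields zero
    for (const T& v : rows) {
        // The cast makes the wrap-around explicit when T2 is too narrow.
        acc = static_cast<T2>(acc + v);
    }
    return acc;
}
```

With the 8-bit inputs 200, 100 and 50, an 8-bit accumulator wraps around to 94, while a 32-bit accumulator holds the exact sum 350.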

aggregate overload (3)

#include "xf_database/aggregate.hpp"
template <
    AggregateOp op,
    typename T
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Aggregate function overload for counting.

This function counts the number of input rows, or the number of non-zero input rows, and returns the count as a uint64_t value.

Note that the count aggregation returns zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS
T the data type of input stream, inferred from argument
in_strm input data stream
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
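
The difference between the two counting operators can be sketched in plain C++. The enum `CountOpModel` and the function `countModel` are hypothetical stand-ins for `AOP_COUNT` / `AOP_COUNTNONZEROS` and the streaming primitive; the sketch operates on a vector rather than on streams.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for AOP_COUNT and AOP_COUNTNONZEROS.
enum class CountOpModel { Count, CountNonZeros };

// Hypothetical software model: counts all rows, or only the non-zero rows,
// and always returns a 64-bit count regardless of the input type.
template <CountOpModel op, typename T>
uint64_t countModel(const std::vector<T>& rows) {
    uint64_t n = 0;  // empty input yields zero
    for (const T& v : rows) {
        if (op == CountOpModel::Count || v != 0) ++n;
    }
    return n;
}
```

For the rows 5, 0, -3, 0, 7 the plain count is 5 and the non-zero count is 3.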

bitonicSort

#include "xf_database/bitonic_sort.hpp"
template <
    typename Key_Type,
    int BitonicSortNumber
    >
void bitonicSort (
    hls::stream <Key_Type>& kin_strm,
    hls::stream <bool>& kin_strm_end,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& kout_strm_end,
    bool order
    )

Bitonic sort is a parallel algorithm for sorting.

This algorithm can sort a large vector of data in parallel, and by cascading the sorters into a network it can offer good theoretical throughput.

Although this algorithm is suitable for FPGA acceleration, it does not work well with the row-by-row streaming interface of the database library. Please treat this primitive as a demo, and use it only as a starting point for derived code. Alternative sorting algorithms in this library are insertSort and mergeSort.

Parameters:

Key_Type the input and output key type
BitonicSortNumber the parallel number
kin_strm input key stream
kin_strm_end end flag stream for input key
kout_strm output key stream
kout_strm_end end flag stream for output key
order 1 for ascending or 0 for descending sort
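
For readers unfamiliar with the algorithm, the classic bitonic sorting network can be sketched in a few lines of sequential C++. This is a textbook formulation under the assumption that the input length is a power of two; `bitonicSortModel` is a hypothetical name and the code says nothing about how the primitive is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook bitonic sorting network (software sketch, not the library code).
// Assumes a.size() is a power of two.
inline void bitonicSortModel(std::vector<int>& a, bool ascending) {
    const std::size_t n = a.size();
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the bitonic blocks
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance in this stage
            // All compare-exchange steps of one stage are independent,
            // which is what makes the network attractive for hardware.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t l = i ^ j;
                if (l > i) {
                    const bool up = ((i & k) == 0) == ascending;
                    if ((up && a[i] > a[l]) || (!up && a[i] < a[l])) {
                        std::swap(a[i], a[l]);
                    }
                }
            }
        }
    }
}
```

The innermost loop corresponds to one column of comparators; in a hardware network those comparisons would run in parallel.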

bfGen

#include "xf_database/bloom_filter.hpp"
template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfGen (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr0,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr1,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr2
    )

Generate the bloom filter in on-chip RAM blocks.

This primitive calculates the hash of each input value, and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.

The bloom-filter bit vectors are passed as three pointers. Behind the scenes, one hash value is calculated and manipulated into three distinct marker locations in these vectors.

To check for the existence of a value with the generated vectors, use the bfCheck primitive.

Parameters:

IS_BRAM choose which type of memory to use. True for BRAM, false for URAM.
STR_IN_W width of the streamed input message in bits, e.g., 512.
BV_W width of the hash value. ptr0, ptr1 and ptr2 should point at MEM_SPACE=2^BV_W (bit).
msg_strm input message stream.
in_e_strm the flag that indicates the end of input message stream.
bit_vector_ptr0 the pointer of bit_vector0.
bit_vector_ptr1 the pointer of bit_vector1.
bit_vector_ptr2 the pointer of bit_vector2.

bfGenStream

#include "xf_database/bloom_filter.hpp"
template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfGenStream (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <IS_BRAM?16:64>>& bit_vet_strm,
    hls::stream <bool>& out_e_strm
    )

Generate the bloom filter in on-chip RAM blocks, and emit the vectors upon finish.

This primitive calculates the hash of each input value, and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.

The bloom-filter bit vectors are built into internally allocated buffers, and streamed out after the filter has been fully built.

Parameters:

IS_BRAM choose which type of memory to use. True for BRAM, false for URAM.
STR_IN_W width of the streamed input message in bits, e.g., 512.
BV_W width of the hash value. bit_vet_strm should send out MEM_SPACE=2^BV_W (bit) data in total.
msg_strm input message stream.
in_e_strm the flag that indicates the end of input message stream.
bit_vet_strm the output stream of bit_vector.
out_e_strm the flag that indicates the end of output stream.

bfCheck

#include "xf_database/bloom_filter.hpp"
template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfCheck (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr0,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr1,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr2,
    hls::stream <bool>& out_v_strm,
    hls::stream <bool>& out_e_strm
    )

Check the existence of a value using bloom-filter vectors.

This primitive is designed to work with the bloom-filter vectors generated by the bfGen primitive. It detects the existence of a value by hashing it and checking the corresponding vector bits. On a hit, the value is likely to be in the set of generating values; otherwise, it cannot be an element of the set. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM; the setting must match bfGen.

Parameters:

IS_BRAM choose which type of memory to use. True for BRAM, false for URAM.
STR_IN_W width of the streamed input message in bits, e.g., 512.
BV_W width of the hash value. ptr0, ptr1 and ptr2 should point at MEM_SPACE=2^BV_W (bit).
msg_strm input message stream.
in_e_strm the flag that indicates the end of input message stream.
bit_vector_ptr0 the pointer of bit_vector0.
bit_vector_ptr1 the pointer of bit_vector1.
bit_vector_ptr2 the pointer of bit_vector2.
out_v_strm the output stream that indicates whether the value may exist (1 for true, 0 for false).
out_e_strm the output end flag stream.
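
The generate-then-check flow can be modelled in software as follows. This is a sketch under explicit assumptions: `BloomModel`, `mix64`, `slotsOf`, `markModel` and `mayContainModel` are hypothetical names, the mixer is a generic 64-bit finalizer rather than the hash used by the library, and taking three adjacent bit slices of one hash value is just one possible way to derive three marker locations.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kBvW = 10;  // hypothetical BV_W: each vector holds 2^kBvW bits

// Three bit vectors, standing in for bit_vector_ptr0..2.
struct BloomModel {
    std::vector<bool> v0, v1, v2;
    BloomModel() : v0(1u << kBvW), v1(1u << kBvW), v2(1u << kBvW) {}
};

// Generic 64-bit mixer (assumption: NOT the hash used by the library).
inline uint64_t mix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

struct SlotsModel { uint32_t i0, i1, i2; };

// One hash value is split into three marker locations, one per vector.
inline SlotsModel slotsOf(uint64_t key) {
    const uint64_t h = mix64(key);
    const uint64_t mask = (1ULL << kBvW) - 1;
    SlotsModel s;
    s.i0 = static_cast<uint32_t>(h & mask);
    s.i1 = static_cast<uint32_t>((h >> kBvW) & mask);
    s.i2 = static_cast<uint32_t>((h >> (2 * kBvW)) & mask);
    return s;
}

// Generation step: mark one bit in each vector.
inline void markModel(BloomModel& bf, uint64_t key) {
    const SlotsModel s = slotsOf(key);
    bf.v0[s.i0] = true;
    bf.v1[s.i1] = true;
    bf.v2[s.i2] = true;
}

// Check step: a hit means "possibly present", a miss means "definitely absent".
inline bool mayContainModel(const BloomModel& bf, uint64_t key) {
    const SlotsModel s = slotsOf(key);
    return bf.v0[s.i0] && bf.v1[s.i1] && bf.v2[s.i2];
}
```

The model preserves the key property stated above: every inserted value is reported as a hit, while a miss guarantees that the value was never inserted. False positives remain possible for values that were not inserted.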

combineCol

combineCol overload (1)

#include "xf_database/combine_split_col.hpp"
template <
    int _WCol1,
    int _WCol2,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines two columns into one.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses the data of the same row from different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1 the width of 1st input stream.
_WCol2 the width of 2nd input stream.
_WColOut the width of output stream.
din1_strm 1st input data stream.
din2_strm 2nd input data stream.
in_e_strm end flag stream for input data.
dout_strm output data stream.
out_e_strm end flag stream for output data.
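
Conceptually, combining columns is bit concatenation of the values belonging to one row. The sketch below models this for two columns using `uint64_t` in place of `ap_uint`. The name `combine2Model` is hypothetical, and the bit placement (first column in the low bits) is an assumption made for illustration; the actual layout should be checked against the library source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of combining two columns of one row.
// Assumption: column 1 occupies the low W1 bits, column 2 the next W2 bits.
template <int W1, int W2>
uint64_t combine2Model(uint32_t c1, uint32_t c2) {
    static_assert(W1 > 0 && W2 > 0 && W1 <= 32 && W2 <= 32,
                  "this sketch supports widths of 1..32 bits per column");
    const uint64_t m1 = (1ULL << W1) - 1;  // mask for column 1
    const uint64_t m2 = (1ULL << W2) - 1;  // mask for column 2
    return ((static_cast<uint64_t>(c2) & m2) << W1) |
           (static_cast<uint64_t>(c1) & m1);
}
```

In this model the output needs at least `W1 + W2` bits, which mirrors the role of the `_WColOut` parameter.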

combineCol overload (2)

#include "xf_database/combine_split_col.hpp"
template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines three columns into one.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses the data of the same row from different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1 the width of 1st input stream.
_WCol2 the width of 2nd input stream.
_WCol3 the width of 3rd input stream.
_WColOut the width of output stream.
din1_strm 1st input data stream.
din2_strm 2nd input data stream.
din3_strm 3rd input data stream.
in_e_strm end flag stream for input data.
dout_strm output data stream.
out_e_strm end flag stream for output data.

combineCol overload (3)

#include "xf_database/combine_split_col.hpp"
template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <ap_uint <_WCol4>>& din4_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines four columns into one.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses the data of the same row from different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1 the width of 1st input stream.
_WCol2 the width of 2nd input stream.
_WCol3 the width of 3rd input stream.
_WCol4 the width of 4th input stream.
_WColOut the width of output stream.
din1_strm 1st input data stream.
din2_strm 2nd input data stream.
din3_strm 3rd input data stream.
din4_strm 4th input data stream.
in_e_strm end flag stream for input data.
dout_strm output data stream.
out_e_strm end flag stream for output data.

combineCol overload (4)

#include "xf_database/combine_split_col.hpp"
template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WCol5,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <ap_uint <_WCol4>>& din4_strm,
    hls::stream <ap_uint <_WCol5>>& din5_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines five columns into one.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses the data of the same row from different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1 the width of 1st input stream.
_WCol2 the width of 2nd input stream.
_WCol3 the width of 3rd input stream.
_WCol4 the width of 4th input stream.
_WCol5 the width of 5th input stream.
_WColOut the width of output stream.
din1_strm 1st input data stream.
din2_strm 2nd input data stream.
din3_strm 3rd input data stream.
din4_strm 4th input data stream.
din5_strm 5th input data stream.
in_e_strm end flag stream for input data.
dout_strm output data stream.
out_e_strm end flag stream for output data.

splitCol

splitCol overload (1)

#include "xf_database/combine_split_col.hpp"
template <
    int _WColIn,
    int _WCol1,
    int _WCol2
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into two.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn the width of input stream.
_WCol1 the width of 1st output stream.
_WCol2 the width of 2nd output stream.
din_strm input data stream.
in_e_strm end flag stream for input data.
dout1_strm 1st output data stream.
dout2_strm 2nd output data stream.
out_e_strm end flag stream for output data.
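
Splitting is the inverse bit-slicing operation. The sketch below models it for two columns, with `uint64_t` in place of `ap_uint`. The name `split2Model` is hypothetical, and the layout (first column in the low bits) is an assumption for illustration that would have to match whatever layout produced the wide value.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical software model of splitting one wide value into two columns.
// Assumption: column 1 sits in the low W1 bits, column 2 in the next W2 bits.
template <int W1, int W2>
std::pair<uint32_t, uint32_t> split2Model(uint64_t wide) {
    static_assert(W1 > 0 && W2 > 0 && W1 <= 32 && W2 <= 32,
                  "this sketch supports widths of 1..32 bits per column");
    const uint64_t m1 = (1ULL << W1) - 1;  // mask for column 1
    const uint64_t m2 = (1ULL << W2) - 1;  // mask for column 2
    const uint32_t c1 = static_cast<uint32_t>(wide & m1);
    const uint32_t c2 = static_cast<uint32_t>((wide >> W1) & m2);
    return std::make_pair(c1, c2);
}
```

Bits above `W1 + W2` are ignored by this model.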

splitCol overload (2)

#include "xf_database/combine_split_col.hpp"
template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into three.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn the width of input stream.
_WCol1 the width of 1st output stream.
_WCol2 the width of 2nd output stream.
_WCol3 the width of 3rd output stream.
din_strm input data stream
in_e_strm end flag stream for input data
dout1_strm 1st output data stream
dout2_strm 2nd output data stream
dout3_strm 3rd output data stream
out_e_strm end flag stream for output data

splitCol overload (3)

#include "xf_database/combine_split_col.hpp"
template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <ap_uint <_WCol4>>& dout4_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into four.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn the width of input stream.
_WCol1 the width of 1st output stream.
_WCol2 the width of 2nd output stream.
_WCol3 the width of 3rd output stream.
_WCol4 the width of 4th output stream.
din_strm input data stream
in_e_strm end flag stream for input data
dout1_strm 1st output data stream
dout2_strm 2nd output data stream
dout3_strm 3rd output data stream
dout4_strm 4th output data stream
out_e_strm end flag stream for output data

splitCol overload (4)

#include "xf_database/combine_split_col.hpp"
template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WCol5
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <ap_uint <_WCol4>>& dout4_strm,
    hls::stream <ap_uint <_WCol5>>& dout5_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into five.

Columns are passed through streams of a certain width in hardware. Normally each column uses one stream, but some primitives group the columns semantically and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn the width of input stream.
_WCol1 the width of 1st output stream.
_WCol2 the width of 2nd output stream.
_WCol3 the width of 3rd output stream.
_WCol4 the width of 4th output stream.
_WCol5 the width of 5th output stream.
din_strm input data stream
in_e_strm end flag stream for input data
dout1_strm 1st output data stream
dout2_strm 2nd output data stream
dout3_strm 3rd output data stream
dout4_strm 4th output data stream
dout5_strm 5th output data stream
out_e_strm end flag stream for output data

compoundSort

#include "xf_database/compound_sort.hpp"
template <
    typename KEY_TYPE,
    int SORT_LEN,
    int INSERT_LEN
    >
void compoundSort (
    bool order,
    hls::stream <KEY_TYPE>& inKeyStrm,
    hls::stream <bool>& inEndStrm,
    hls::stream <KEY_TYPE>& outKeyStrm,
    hls::stream <bool>& outEndStrm
    )

compoundSort sorts the keys using a combination of insert sort and merge sort.

Parameters:

KEY_TYPE key type
SORT_LEN maximum supported sort length, between 16K and 2M; it must be an integer power of 2.
INSERT_LEN insert sort length; a maximum of 1024 is recommended
order 1 for ascending sort, 0 for descending sort
inKeyStrm input key stream
inEndStrm end flag stream for input key
outKeyStrm output key-sorted stream
outEndStrm end flag stream for output key
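
One common way to combine the two algorithms is to sort short runs with insert sort and then merge the runs pairwise. The sketch below follows that scheme on a `std::vector`. The names `insertionSortRun` and `compoundSortModel` are hypothetical, and the way the primitive actually composes its insert-sort and merge-sort stages is not stated here, so this is an assumption for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Insertion sort of the half-open range [lo, hi).
template <typename Key>
void insertionSortRun(std::vector<Key>& a, std::size_t lo, std::size_t hi, bool asc) {
    for (std::size_t i = lo + 1; i < hi; ++i) {
        const Key x = a[i];
        std::size_t j = i;
        while (j > lo && (asc ? a[j - 1] > x : a[j - 1] < x)) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = x;
    }
}

// Hypothetical model: sort runs of insertLen keys, then merge runs pairwise.
// Assumes insertLen > 0.
template <typename Key>
std::vector<Key> compoundSortModel(std::vector<Key> keys, std::size_t insertLen, bool asc) {
    const std::size_t n = keys.size();
    for (std::size_t lo = 0; lo < n; lo += insertLen) {
        insertionSortRun(keys, lo, std::min(lo + insertLen, n), asc);
    }
    std::vector<Key> tmp(n);
    for (std::size_t run = insertLen; run < n; run *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * run) {
            const std::size_t mid = std::min(lo + run, n);
            const std::size_t hi = std::min(lo + 2 * run, n);
            if (asc) {
                std::merge(keys.begin() + lo, keys.begin() + mid,
                           keys.begin() + mid, keys.begin() + hi, tmp.begin() + lo);
            } else {
                std::merge(keys.begin() + lo, keys.begin() + mid,
                           keys.begin() + mid, keys.begin() + hi, tmp.begin() + lo,
                           std::greater<Key>());
            }
        }
        keys.swap(tmp);  // merged runs become the input of the next pass
    }
    return keys;
}
```

In this model the run length plays the role of `INSERT_LEN`, and the number of merge passes grows with the logarithm of the total length divided by the run length.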

directGroupAggregate

directGroupAggregate overload (1)

#include "xf_database/direct_group_aggregate.hpp"
template <
    int op,
    int DATINW,
    int DATOUTW,
    int DIRECTW
    >
void directGroupAggregate (
    hls::stream <ap_uint <DATINW>>& vin_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <DATOUTW>>& vout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <ap_uint <DIRECTW>>& kin_strm,
    hls::stream <ap_uint <DIRECTW>>& kout_strm
    )

Group-by aggregation with limited key width.

This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).

The following aggregate operators are supported:

  • AOP_MAX
  • AOP_MIN
  • AOP_SUM
  • AOP_COUNT
  • AOP_MEAN
  • AOP_VARIANCE
  • AOP_NORML1
  • AOP_NORML2

The return value is typed the same as the input payload value.

Caution

Watch for overflow in sum or count.

Parameters:

op the aggregate operator, as defined in AggregateOp enum.
DATINW the width of input payload
DATOUTW the width of output aggr-payload
DIRECTW the width of input and output key
vin_strm value input
in_e_strm end flag stream for input data
vout_strm value output
out_e_strm end flag stream for output data
kin_strm group-by key input
kout_strm group-by key output
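
The idea of a directly addressed aggregation array can be sketched in software. The function `directGroupSumModel` below is hypothetical and models only the sum operator: it allocates `2^DIRECTW` accumulator slots, uses the key as the array index, and then emits the keys that were seen in ascending order. The emission order and the choice to skip unseen keys are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::pair<uint32_t, int64_t> KeyValueModel;  // (group key, payload)

// Hypothetical software model of direct-addressed group-by SUM.
template <int DIRECTW>
std::vector<KeyValueModel> directGroupSumModel(const std::vector<KeyValueModel>& rows) {
    static_assert(DIRECTW > 0 && DIRECTW <= 16, "keep the table small in this sketch");
    const std::size_t slots = static_cast<std::size_t>(1) << DIRECTW;  // 2^key width
    std::vector<int64_t> acc(slots, 0);
    std::vector<bool> seen(slots, false);
    for (const KeyValueModel& r : rows) {
        const std::size_t k = r.first & (slots - 1);  // key is the array address
        acc[k] += r.second;
        seen[k] = true;
    }
    std::vector<KeyValueModel> out;
    for (std::size_t k = 0; k < slots; ++k) {
        if (seen[k]) out.push_back(KeyValueModel(static_cast<uint32_t>(k), acc[k]));
    }
    return out;
}
```

Because the table size doubles with every additional key bit, this approach only pays off when the key width is small, which is the limitation described above.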

directGroupAggregate overload (2)

#include "xf_database/direct_group_aggregate.hpp"
template <
    int DATINW,
    int DATOUTW,
    int DIRECTW
    >
void directGroupAggregate (
    ap_uint <32> op,
    hls::stream <ap_uint <DATINW>>& vin_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <DATOUTW>>& vout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <ap_uint <DIRECTW>>& kin_strm,
    hls::stream <ap_uint <DIRECTW>>& kout_strm
    )

Group-by aggregation with limited key width, runtime programmable.

This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).

The following aggregate operators are supported:

  • AOP_MAX
  • AOP_MIN
  • AOP_SUM
  • AOP_COUNT
  • AOP_MEAN
  • AOP_NORML1

The return value is typed the same as the input payload value.

Caution

Watch for overflow in sum or count.

Parameters:

DATINW the width of input payload
DATOUTW the width of output aggr-payload
DIRECTW the width of input and output key
op the aggregate operator, as defined in AggregateOp enum.
vin_strm value input
in_e_strm end flag stream for input data
vout_strm value output
out_e_strm end flag stream for output data
kin_strm group-by key input
kout_strm group-by key output

duplicateCol

#include "xf_database/duplicate_col.hpp"
template <int W>
void duplicateCol (
    hls::stream <ap_uint <W>>& d_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <W>>& d0_out_strm,
    hls::stream <ap_uint <W>>& d1_out_strm,
    hls::stream <bool>& e_out_strm
    )

Duplicate one column into two columns.

Parameters:

W column data width in bits.
d_in_strm input data stream.
e_in_strm end flag for input data.
d0_out_strm output data stream 0.
d1_out_strm output data stream 1.
e_out_strm end flag for output data.

dynamicEval

#include "xf_database/dynamic_eval.hpp"
template <
    typename TStrm1,
    typename TStrm2,
    typename TStrm3,
    typename TStrm4,
    typename TConst1,
    typename TConst2,
    typename TConst3,
    typename TConst4,
    typename TOut
    >
void dynamicEval (
    ap_uint <289> config,
    hls::stream <TStrm1>& strm_in1,
    hls::stream <TStrm2>& strm_in2,
    hls::stream <TStrm3>& strm_in3,
    hls::stream <TStrm4>& strm_in4,
    hls::stream <bool>& strm_in_end,
    hls::stream <TOut>& strm_out,
    hls::stream <bool>& strm_out_end
    )

Dynamic expression evaluation.

This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run time. The same configuration is used for all rows until the end of input.

The constants are assumed to be no wider than 32 bits.

For the definition of the config word, please refer to the “Design Internal” Section of the document and the corresponding test in L1/tests.

Parameters:

TStrm1 Type of input Stream1
TStrm2 Type of input Stream2
TStrm3 Type of input Stream3
TStrm4 Type of input Stream4
TConst1 Type of input Constant1
TConst2 Type of input Constant2
TConst3 Type of input Constant3
TConst4 Type of input Constant4
TOut Type of Compute Result
config configuration bits of ops and constants.
strm_in1 input Stream1
strm_in2 input Stream2
strm_in3 input Stream3
strm_in4 input Stream4
strm_in_end end flag of input stream
strm_out output Stream
strm_out_end end flag of output stream

dynamicEvalV2

#include "xf_database/dynamic_eval_v2.hpp"
template <typename T>
void dynamicEvalV2 (
    hls::stream <ap_uint <32>>& cfgs,
    hls::stream <T>& col0_istrm,
    hls::stream <T>& col1_istrm,
    hls::stream <T>& col2_istrm,
    hls::stream <T>& col3_istrm,
    hls::stream <bool>& e_istrm,
    hls::stream <T>& ret_ostrm,
    hls::stream <bool>& e_ostrm
    )

Dynamic expression evaluation version 2.

This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run time. The same configuration is used for all rows until the end of input.

The constants are assumed to be no wider than 32 bits.

Parameters:

T Type of input streams
cfgs configuration bits of ops and constants.
col0_istrm input Stream1
col1_istrm input Stream2
col2_istrm input Stream3
col3_istrm input Stream4
e_istrm end flag of input stream
ret_ostrm output Stream
e_ostrm end flag of output stream

dynamicFilter

dynamicFilter overload (1)

#include "xf_database/dynamic_filter.hpp"
template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <W>>& v2_strm,
    hls::stream <ap_uint <W>>& v3_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This primitive, with its 3 overloads, supports filtering rows using up to four columns as conditions. The payload columns should be grouped together for this primitive using the combineCol primitive; their total width is not explicitly limited (but is naturally bounded by resources).

The filter conditions consist of whether each condition column is within a given range, and of relations between any two condition columns. The configuration is set once before processing the rows, and reused until the last row. For configuration generation, please refer to the “Design Internals” Section of the document and the corresponding test case of this primitive.

Parameters:

W width of all condition column streams, in bits.
WP width of payload column, in bits.
filter_cfg_strm stream of raw config bits for this primitive.
v0_strm condition column stream 0.
v1_strm condition column stream 1.
v2_strm condition column stream 2.
v3_strm condition column stream 3.
pay_in_strm payload input stream.
e_in_strm end flag stream for input table.
pay_out_strm payload output stream.
e_pay_out_strm end flag stream for payload output.
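
The filtering semantics (per-column range checks plus relations between pairs of columns, with the payload passed through for matching rows) can be sketched in software. Everything below is hypothetical: `RowModel`, `FilterCfgModel`, `passAllCfg` and `dynamicFilterModel` are illustration names, the bounds are taken as inclusive, and only a "less than" relation is modelled. The real primitive takes its configuration as raw bits on `filter_cfg_strm`, whose encoding is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// One input row: four condition values plus an opaque payload.
struct RowModel {
    std::array<int32_t, 4> v;
    uint64_t payload;
};

// Hypothetical, human-readable configuration (NOT the raw config bit layout).
struct FilterCfgModel {
    std::array<int32_t, 4> lo;                  // inclusive lower bound per column
    std::array<int32_t, 4> hi;                  // inclusive upper bound per column
    std::vector<std::pair<int, int> > lessThan; // require v[first] < v[second]
};

// A configuration that accepts every row ("don't care" on all conditions).
inline FilterCfgModel passAllCfg() {
    FilterCfgModel cfg;
    cfg.lo.fill(std::numeric_limits<int32_t>::min());
    cfg.hi.fill(std::numeric_limits<int32_t>::max());
    return cfg;
}

// The same configuration is applied to every row; matching payloads are emitted.
inline std::vector<uint64_t> dynamicFilterModel(const FilterCfgModel& cfg,
                                                const std::vector<RowModel>& rows) {
    std::vector<uint64_t> out;
    for (const RowModel& r : rows) {
        bool keep = true;
        for (std::size_t c = 0; c < 4; ++c) {
            keep = keep && r.v[c] >= cfg.lo[c] && r.v[c] <= cfg.hi[c];
        }
        for (const std::pair<int, int>& p : cfg.lessThan) {
            keep = keep && r.v[p.first] < r.v[p.second];
        }
        if (keep) out.push_back(r.payload);
    }
    return out;
}
```

Leaving a column at the pass-all bounds in this model corresponds to treating that condition as "don't care".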

dynamicFilter overload (2)

#include "xf_database/dynamic_filter.hpp"
template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <W>>& v2_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamicFilter; it simply duplicates the columns to feed all its inputs, so it shares the same configuration bit pattern. All ops related to the 4th column should be set to FOP_DC.

Parameters:

W width of all condition column streams, in bits.
WP width of payload column, in bits.
filter_cfg_strm stream of raw config bits for this primitive.
v0_strm condition column stream 0.
v1_strm condition column stream 1.
v2_strm condition column stream 2.
pay_in_strm payload input stream.
e_in_strm end flag stream for input table.
pay_out_strm payload output stream.
e_pay_out_strm end flag stream for payload output.

dynamicFilter overload (3)

#include "xf_database/dynamic_filter.hpp"
template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamicFilter; it simply duplicates the columns to feed all its inputs, so it shares the same configuration bit pattern. All ops related to the 3rd and 4th columns should be set to FOP_DC.

Parameters:

W width of all condition column streams, in bits.
WP width of payload column, in bits.
filter_cfg_strm stream of raw config bits for this primitive.
v0_strm condition column stream 0.
v1_strm condition column stream 1.
pay_in_strm payload input stream.
e_in_strm end flag stream for input table.
pay_out_strm payload output stream.
e_pay_out_strm end flag stream for payload output.

dynamicFilter overload (4)

#include "xf_database/dynamic_filter.hpp"
template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamicFilter; it simply duplicates the columns to feed all its inputs, so it shares the same configuration bit pattern. All ops related to the 2nd to 4th columns should be set to FOP_DC.

Parameters:

W width of all condition column streams, in bits.
WP width of payload column, in bits.
filter_cfg_strm stream of raw config bits for this primitive.
v0_strm condition column stream 0.
pay_in_strm payload input stream.
e_in_strm end flag stream for input table.
pay_out_strm payload output stream.
e_pay_out_strm end flag stream for payload output.

groupAggregate

groupAggregate overload (1)

#include "xf_database/group_aggregate.hpp"
template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T>& dout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that returns the same type as the input.

Parameters:

op the aggregate operator: AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2
T the data type of input and output streams
KEY_T the input and output indexing key type
din_strm input data stream
in_e_strm end flag stream for input data
dout_strm output data stream
out_e_strm end flag stream for output data
kin_strm input indexing key stream
kout_strm output indexing key stream
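
A keyed aggregation of this shape can be sketched in software under one important assumption: rows with equal keys arrive consecutively (for example, because the input is already sorted by key). That assumption is not stated in the text above, and `groupMaxModel` is a hypothetical name; the sketch models only the maximum operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

typedef std::pair<uint32_t, int32_t> KeyedRowModel;  // (key, value)

// Hypothetical software model of a per-key MAX over runs of equal keys.
// Assumption: rows that share a key are adjacent in the input.
inline std::vector<KeyedRowModel> groupMaxModel(const std::vector<KeyedRowModel>& rows) {
    std::vector<KeyedRowModel> out;
    for (const KeyedRowModel& r : rows) {
        if (!out.empty() && out.back().first == r.first) {
            // Same key as the current group: fold the value into the running maximum.
            out.back().second = std::max(out.back().second, r.second);
        } else {
            // Key changed: start a new group.
            out.push_back(r);
        }
    }
    return out;
}
```

Each output row pairs a key with its aggregated value, matching the paired `kout_strm` and `dout_strm` outputs in the signature.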

groupAggregate overload (2)

#include "xf_database/group_aggregate.hpp"
template <
    AggregateOp op,
    typename T,
    typename T2,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T2>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function whose output type can differ from the input type.

Parameters:

op the aggregate operator: AOP_SUM
T the input stream type, inferred from argument
T2 the output stream type, inferred from argument
KEY_T the input and output indexing key type, inferred from argument
in_strm input data stream
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
kin_strm input indexing key stream
kout_strm output indexing key stream
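
To show why a separate output type is useful for AOP_SUM, here is a small host-side sketch (not the HLS code). groupSumModel is a hypothetical helper that sums adjacent rows with the same key, accumulating in T2; it assumes the input is already grouped by key.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Host-side model: sums consecutive rows that share a key, accumulating in T2.
// Assumption: the input is already grouped by key (equal keys are adjacent).
template <typename T, typename T2, typename KEY_T>
std::vector<std::pair<KEY_T, T2>> groupSumModel(const std::vector<KEY_T>& keys,
                                                const std::vector<T>& vals) {
    std::vector<std::pair<KEY_T, T2>> out;
    for (std::size_t i = 0; i < keys.size() && i < vals.size(); ++i) {
        if (out.empty() || out.back().first != keys[i]) {
            out.emplace_back(keys[i], T2{0});  // start a new group
        }
        // The running sum is stored in T2, so a wider T2 keeps more bits.
        out.back().second = static_cast<T2>(out.back().second + vals[i]);
    }
    return out;
}
```

With 8-bit inputs 200 and 100 under one key, a 32-bit T2 holds the exact sum 300, whereas an 8-bit T2 wraps around to 44.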

groupAggregate overload (3)

#include "xf_database/group_aggregate.hpp"
template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that counts and returns uint64_t.

Parameters:

op the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS
T the input stream type, inferred from argument
KEY_T the input and output indexing key type, inferred from argument
in_strm input data stream
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
kin_strm input indexing key stream
kout_strm output indexing key stream

groupAggregate overload (4)

#include "xf_database/group_aggregate.hpp"
template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& isnull_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that counts and returns uint64_t, taking a per-row null flag.

Parameters:

op the aggregate operator: AOP_COUNT
T the input stream type, inferred from argument
KEY_T the input and output indexing key type, inferred from argument
in_strm input data stream
isnull_strm flag to indicate the input data is null or not
in_e_strm end flag stream for input data
out_strm output data stream
out_e_strm end flag stream for output data
kin_strm input indexing key stream
kout_strm output indexing key stream
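
A possible reading of the null flag is sketched below on the host side. It assumes that rows flagged as null are skipped by AOP_COUNT, as in SQL COUNT(column), and that equal keys are adjacent; neither point is stated above, and Row and groupCountModel are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One input row: group key, data value and null flag (hypothetical layout).
struct Row {
    std::uint32_t key;
    int value;
    bool is_null;
};

// Host-side model of a per-group count.
// Assumptions: equal keys are adjacent, and null rows are not counted.
std::vector<std::pair<std::uint32_t, std::uint64_t>> groupCountModel(
    const std::vector<Row>& rows) {
    std::vector<std::pair<std::uint32_t, std::uint64_t>> out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (out.empty() || out.back().first != rows[i].key) {
            out.emplace_back(rows[i].key, std::uint64_t{0});  // new group
        }
        if (!rows[i].is_null) {
            ++out.back().second;  // only non-null rows contribute
        }
    }
    return out;
}
```

Under these assumptions a group made up only of null rows is still emitted, with a count of zero.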

hashAntiJoin

#include "xf_database/hash_anti_join.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashAntiJoin (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Anti-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3, but performs anti-join instead of inner-join. Both the inner and outer tables should be sent to this primitive once, starting with the inner table.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, larger than 24 is suggested.
CH_NM number of input channels, 1,2,4.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms the 1st element is the depth of each hash, the 2nd element is joined number
pu_end_status_strms  
j_strm output of joined result
j_e_strm end flag of joined result
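
The anti-join result can be sketched in plain host-side C++ as follows. This is a functional model only, with no PUs or HBM buffers, and it assumes that the output consists of the outer-table rows whose key has no match in the inner table. Row and antiJoinModel are hypothetical names.

```cpp
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using Row = std::pair<std::uint32_t, std::uint32_t>;  // (key, payload)

// Functional model of an anti-join.
// Assumption: the result is every outer row whose key is absent from the
// inner table; the inner table is consumed first, then the outer table.
std::vector<Row> antiJoinModel(const std::vector<Row>& inner,
                               const std::vector<Row>& outer) {
    std::unordered_set<std::uint32_t> inner_keys;
    for (const Row& r : inner) {
        inner_keys.insert(r.first);  // build phase: remember inner keys
    }
    std::vector<Row> out;
    for (const Row& r : outer) {
        if (inner_keys.count(r.first) == 0) {
            out.push_back(r);  // probe phase: keep unmatched outer rows
        }
    }
    return out;
}
```

For inner keys {1, 3} and outer keys {1, 2, 3, 4}, the model keeps the outer rows with keys 2 and 4.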

hashGroupAggregate

#include "xf_database/hash_group_aggregate.hpp"
template <
    int _WKey,
    int _KeyNM,
    int _WPay,
    int _PayNM,
    int _HashMode,
    int _WHashHigh,
    int _WHashLow,
    int _CHNM,
    int _Wcnt,
    int _WBuffer,
    int _BurstLenW = 32,
    int _BurstLenR = 32
    >
void hashGroupAggregate (
    hls::stream <ap_uint <_WKey>> strm_key_in [_CHNM][_KeyNM],
    hls::stream <ap_uint <_WPay>> strm_pld_in [_CHNM][_PayNM],
    hls::stream <bool> strm_e_in [_CHNM],
    hls::stream <ap_uint <32>>& config,
    hls::stream <ap_uint <32>>& result_info,
    ap_uint <_WBuffer>* ping_buf0,
    ap_uint <_WBuffer>* ping_buf1,
    ap_uint <_WBuffer>* ping_buf2,
    ap_uint <_WBuffer>* ping_buf3,
    ap_uint <_WBuffer>* pong_buf0,
    ap_uint <_WBuffer>* pong_buf1,
    ap_uint <_WBuffer>* pong_buf2,
    ap_uint <_WBuffer>* pong_buf3,
    hls::stream <ap_uint <_WKey>> aggr_key_out [_KeyNM],
    hls::stream <ap_uint <_WPay>> aggr_pld_out [3][_PayNM],
    hls::stream <bool>& strm_e_out
    )

Generic hash group aggregate primitive.

With this primitive, the max number of lines of the aggregate table is bounded by the AXI buffer size.

The group aggregation values are updated inside the chip, and when a hash-bucket overflows, the overflowed rows are spilled into external buffers. The overflow buffer will be automatically re-scanned, and within each round, a number of distinct groups will be aggregated and emitted. This algorithm ends when the overflow buffer is empty and all groups are aggregated.

Attention

  1. This module can accept multiple input rows of key and payload pairs per cycle.
  2. The max distinct groups aggregated in one pass is 2 ^ (1 + _WHash).
  3. When the width of the input stream is not fully used, data should be aligned to the little-end.
  4. It is highly recommended to assign the ping buffer and pong buffer to different HBM banks, and input and output to different DDR banks, for better performance.
  5. The max number of lines of the aggregate table cannot be bigger than the max DDR/HBM size used in this design.
  6. When the bit-width of group key is known to be small, say 10-bit, please consider the directAggregate primitive, which offers smaller utilization, and requires no external buffer access.

Parameters:

_WKey width of key, in bits.
_KeyNM maximum number of key columns, at most 8.
_WPay width of max payload, in bits.
_PayNM maximum number of payload columns, at most 8.
_HashMode controls the hash algorithm, 0: radix 1: lookup3.
_WHashHigh number of hash bits used for dispatching to PUs.
_WHashLow number of hash bits used for hash-table.
_CHNM number of input channels.
_WBuffer width of HBM/DDR buffer (ping_buf and pong_buf).
_BurstLenW burst length for writing unhandled data.
_BurstLenR burst length for reloading unhandled data.
strm_key_in input of key streams.
strm_pld_in input of payload streams.
strm_e_in input of end signal.
config information for initializing primitive, contains op for maximum of 8 columns, key column number(less than 8), pld column number(less than 8) and initial aggregate cnt.
result_info result information at kernel end, contains op, key_column, pld_column and aggregate result cnt
ping_buf0 DDR/HBM ping buffer for unhandled data.
ping_buf1 DDR/HBM ping buffer for unhandled data.
ping_buf2 DDR/HBM ping buffer for unhandled data.
ping_buf3 DDR/HBM ping buffer for unhandled data.
pong_buf0 DDR/HBM pong buffer for unhandled data.
pong_buf1 DDR/HBM pong buffer for unhandled data.
pong_buf2 DDR/HBM pong buffer for unhandled data.
pong_buf3 DDR/HBM pong buffer for unhandled data.
aggr_key_out output of key columns.
aggr_pld_out output of pld columns. [0][*] is the result of min/max/cnt for pld columns, [1][*] is the low-bit value of sum/average, [2][*] is the high-bit value of sum/average.
strm_e_out is the end signal of output.
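
The spill-and-rescan behavior described for this primitive can be modeled on the host as a loop over passes. The sketch below is a simplification: it uses a single table with a fixed group capacity instead of per-PU hash buckets, a std::map in place of on-chip storage, and a vector in place of the ping/pong buffers. spillingGroupSum and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Row = std::pair<std::uint32_t, std::uint64_t>;  // (group key, payload)

// Model of multi-pass group aggregation (sum) with a bounded "on-chip" table.
// Rows whose group does not fit in the current pass are spilled to an
// overflow buffer and re-scanned in the next pass.
std::vector<Row> spillingGroupSum(std::vector<Row> input, std::size_t capacity,
                                  int& rounds) {
    assert(capacity > 0);  // with zero capacity nothing could ever be aggregated
    std::vector<Row> result;
    rounds = 0;
    while (!input.empty()) {
        std::map<std::uint32_t, std::uint64_t> table;  // groups held this pass
        std::vector<Row> overflow;                     // spilled rows
        for (const Row& r : input) {
            auto it = table.find(r.first);
            if (it != table.end()) {
                it->second += r.second;          // group already resident
            } else if (table.size() < capacity) {
                table.emplace(r.first, r.second);  // room for a new group
            } else {
                overflow.push_back(r);           // no room: spill the row
            }
        }
        for (const auto& kv : table) {
            result.emplace_back(kv.first, kv.second);  // emit finished groups
        }
        input.swap(overflow);  // the spilled rows become the next pass's input
        ++rounds;
    }
    return result;
}
```

With a capacity of two groups and three distinct keys in the input, the model needs two passes: the first emits two groups and spills the rows of the third, the second aggregates the spilled rows.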

hashJoinMPU

hashJoinMPU overload (1)

#include "xf_database/hash_join_v2.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int BFW,
    int CH_NM,
    int BF_W,
    int EN_BF
    >
static void hashJoinMPU (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <BFW>* stb0_buf,
    ap_uint <BFW>* stb1_buf,
    ap_uint <BFW>* stb2_buf,
    ap_uint <BFW>* stb3_buf,
    ap_uint <BFW>* stb4_buf,
    ap_uint <BFW>* stb5_buf,
    ap_uint <BFW>* stb6_buf,
    ap_uint <BFW>* stb7_buf,
    hls::stream <ap_uint <S_PW+B_PW>>& j1_strm,
    hls::stream <bool>& e5_strm
    )

Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.

The max number of lines of small table is 2M in this design. It is assumed that the hash-conflict is within 512 per bin.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. The small table should be fed TWICE, followed by the big table once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, log2(small table max num of rows).
BFW width of buffer.
CH_NM number of input channels, 1,2,4.
BF_W bloom-filter hash width.
EN_BF bloom-filter switch, 0 for off, 1 for on.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
j1_strm output of joined rows.
e5_strm end signal of joined rows.
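
The roles of HASHWH and HASHWL can be illustrated with a small helper that splits a hash value into a PU index and a bucket index. The bit placement used here (PU index directly above the HASHWL bucket bits) is an assumption for illustration, and HashSlot and splitHash are hypothetical names, not part of the library.

```cpp
#include <cstdint>

// Result of splitting a hash value: which PU, and which bucket inside it.
struct HashSlot {
    std::uint32_t pu;
    std::uint32_t bucket;
};

// Assumption: the low HASHWL bits index the hash table inside a PU and the
// next HASHWH bits select the PU/buffer. The real bit placement may differ.
template <int HASHWH, int HASHWL>
constexpr HashSlot splitHash(std::uint32_t hash) {
    static_assert(HASHWH >= 1 && HASHWH <= 3, "HASHWH is documented as 1~3");
    static_assert(HASHWL >= 1 && HASHWH + HASHWL < 32, "hash bits must fit");
    return HashSlot{(hash >> HASHWL) & ((1u << HASHWH) - 1u),
                    hash & ((1u << HASHWL) - 1u)};
}
```

For example, with HASHWH = 3 and HASHWL = 4 the hash value 0x5A maps to PU 5 and bucket 10 under this layout.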

hashJoinMPU overload (2)

#include "xf_database/hash_join_v2.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int BFW,
    int CH_NM
    >
void hashJoinMPU (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <BFW>* stb0_buf,
    ap_uint <BFW>* stb1_buf,
    ap_uint <BFW>* stb2_buf,
    ap_uint <BFW>* stb3_buf,
    ap_uint <BFW>* stb4_buf,
    ap_uint <BFW>* stb5_buf,
    ap_uint <BFW>* stb6_buf,
    ap_uint <BFW>* stb7_buf,
    hls::stream <ap_uint <S_PW+B_PW>>& j1_strm,
    hls::stream <bool>& e5_strm
    )

Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.

The max number of lines of small table is 8M in this design. It is assumed that the hash-conflict is within 512 per bin.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. The small table should be fed TWICE, followed by the big table once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, log2(small table max num of rows).
BFW width of buffer.
CH_NM number of input channels, 1,2,4.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
j1_strm output of joined rows.
e5_strm end signal of joined rows.

hashJoinV3

#include "xf_database/hash_join_v3.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashJoinV3 (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Join v3 primitive. It takes more resources than hashJoinMPU and promises better performance on large tables.

The maximum size of the small table is 256MBx8 (HBM number) = 2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. Unlike hashJoinMPU, the small table and big table should each be fed only once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, larger than 24 is suggested.
CH_NM number of input channels, 1,2,4.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains hash depth, row number of join result
pu_end_status_strms contains hash depth, row number of join result
j_strm output of joined result
j_e_strm end flag of joined result
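
The sizing limits quoted for this primitive can be checked at compile time. The sketch below simply restates the arithmetic (1<<(HASHWH + HASHWL) entries, a 1M entry limit, and 8 buffers of 256 MB); all constant and function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kMaxHashEntries = 1ull << 20;    // 1M entry limit
constexpr std::uint64_t kBytesPerBuffer = 256ull << 20;  // 256 MB per buffer
constexpr int kBufferCount = 8;                          // stb0_buf..stb7_buf

// Total number of hash entries for a given template configuration.
template <int HASHWH, int HASHWL>
constexpr std::uint64_t totalHashEntries() {
    return 1ull << (HASHWH + HASHWL);
}

// True when the configuration stays within the 1M entry limit.
template <int HASHWH, int HASHWL>
constexpr bool fitsEntryLimit() {
    return totalHashEntries<HASHWH, HASHWL>() <= kMaxHashEntries;
}

// Upper bound on the small table size across all buffers.
constexpr std::uint64_t maxSmallTableBytes() {
    return kBytesPerBuffer * kBufferCount;
}

static_assert(maxSmallTableBytes() == (2ull << 30), "256 MB x 8 = 2 GB");
```

With HASHWH = 3, the largest HASHWL that still satisfies the 1M limit is 17, since 1<<(3 + 17) is exactly 1M.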

hashBuildProbeV3

#include "xf_database/hash_join_v3.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int BF_W,
    int EN_BF
    >
static void hashBuildProbeV3 (
    bool& build_probe_flag,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Build-Probe v3 primitive. It can perform hash build and hash probe separately, and needs two kernel calls to do so; a control flag selects build or probe. This primitive supports multiple builds and multiple probes. For example, you can schedule a workflow as: build0->build1->probe0->probe1->build2->build3->probe3…

The maximum size of the small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. The small table and big table should be fed only ONCE.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, log2(small table max num of rows).
CH_NM number of input channels, 1,2,4.
BF_W bloom-filter hash width.
EN_BF bloom-filter switch, 0 for off, 1 for on.
build_probe_flag 0:build 1:probe
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains build id, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU
pu_end_status_strms returns next build id, fixed hash depth, joined number of current probe and end addr of used stb_buf for each PU
j_strm output of joined result
j_e_strm end flag of joined result
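
The build/probe call pattern can be modeled on the host with a hash table that persists between calls. This is only a functional sketch: an std::unordered_multimap stands in for the htb/stb buffers, the status streams are omitted, and Joined, HashTable and buildProbeModel are hypothetical names.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<std::uint32_t, std::uint32_t>;  // (key, payload)

// One joined output row: key, small-table payload, big-table payload.
struct Joined {
    std::uint32_t key;
    std::uint32_t small_pld;
    std::uint32_t big_pld;
};

// State kept between calls, standing in for the external hash-table buffers.
using HashTable = std::unordered_multimap<std::uint32_t, std::uint32_t>;

// One "kernel call": flag false (0) builds, flag true (1) probes.
std::vector<Joined> buildProbeModel(bool build_probe_flag,
                                    const std::vector<Row>& rows,
                                    HashTable& table) {
    std::vector<Joined> out;
    for (const Row& r : rows) {
        if (!build_probe_flag) {
            table.emplace(r.first, r.second);  // build: store small-table row
        } else {
            auto range = table.equal_range(r.first);  // probe: find matches
            for (auto it = range.first; it != range.second; ++it) {
                out.push_back(Joined{r.first, it->second, r.second});
            }
        }
    }
    return out;
}
```

Calling the model as build, build, probe, probe mirrors the build0->build1->probe0->probe1 workflow: both probes see the rows stored by both builds.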

hashJoinV4

#include "xf_database/hash_join_v4.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int BF_HASH_NM,
    int BFW,
    bool EN_BF
    >
static void hashJoinV4 (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <64>* htb0_buf,
    ap_uint <64>* htb1_buf,
    ap_uint <64>* htb2_buf,
    ap_uint <64>* htb3_buf,
    ap_uint <64>* htb4_buf,
    ap_uint <64>* htb5_buf,
    ap_uint <64>* htb6_buf,
    ap_uint <64>* htb7_buf,
    ap_uint <64>* stb0_buf,
    ap_uint <64>* stb1_buf,
    ap_uint <64>* stb2_buf,
    ap_uint <64>* stb3_buf,
    ap_uint <64>* stb4_buf,
    ap_uint <64>* stb5_buf,
    ap_uint <64>* stb6_buf,
    ap_uint <64>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Join v4 primitive, using bloom filter to enhance performance of hash join.

The build and probe procedure is similar to that in hashJoinV3, and this primitive adds a bloom filter to reduce redundant access to HBM.

The maximum size of the small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. Similar to hashJoinV3, the small table and big table should each be fed only once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, log2(small table max num of rows).
CH_NM number of input channels, 1,2,4.
BF_HASH_NM number of hash functions in bloom filter, 1,2,3.
BFW bloom-filter hash width.
EN_BF bloom-filter switch, 0 for off, 1 for on.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains build id, fixed hash depth
pu_end_status_strms returns next build id, fixed hash depth, joined number
j_strm output of joined result
j_e_strm end flag of joined result
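
How a bloom filter can cut down hash-table lookups during probing is sketched below in host-side C++. The hash family (multiplicative hashing with three arbitrary odd constants) and the names BloomModel and probeWithBloom are assumptions for illustration and do not reflect the hash functions used by the primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Simple bloom filter with 2^width_bits bits and hash_count hash functions.
class BloomModel {
public:
    BloomModel(int width_bits, int hash_count)
        : bits_(std::size_t{1} << width_bits, false),
          mask_((std::uint32_t{1} << width_bits) - 1u),
          hash_count_(hash_count) {
        assert(width_bits >= 1 && width_bits < 32 && hash_count >= 1);
    }
    void insert(std::uint32_t key) {
        for (int i = 0; i < hash_count_; ++i) bits_[slot(key, i)] = true;
    }
    // False means "definitely absent"; true means "possibly present".
    bool mayContain(std::uint32_t key) const {
        for (int i = 0; i < hash_count_; ++i) {
            if (!bits_[slot(key, i)]) return false;
        }
        return true;
    }

private:
    // Hypothetical hash family: one odd multiplier per hash index.
    std::uint32_t slot(std::uint32_t key, int i) const {
        static const std::uint32_t kMul[3] = {0x9E3779B1u, 0x85EBCA6Bu,
                                              0xC2B2AE35u};
        std::uint32_t h = key * kMul[i % 3];
        h ^= h >> 15;
        return h & mask_;
    }
    std::vector<bool> bits_;
    std::uint32_t mask_;
    int hash_count_;
};

// Probes big-table keys against small-table keys. The table is only looked
// up when the filter says the key may be present. Returns the match count.
std::size_t probeWithBloom(const std::vector<std::uint32_t>& small_keys,
                           const std::vector<std::uint32_t>& big_keys,
                           std::size_t& lookups) {
    BloomModel filter(10, 3);
    std::unordered_set<std::uint32_t> table;
    for (std::uint32_t k : small_keys) {
        filter.insert(k);
        table.insert(k);
    }
    std::size_t matches = 0;
    lookups = 0;
    for (std::uint32_t k : big_keys) {
        if (!filter.mayContain(k)) continue;  // skipped: no table access
        ++lookups;
        if (table.count(k) != 0) ++matches;
    }
    return matches;
}
```

The filter never produces false negatives, so the match count is unaffected; only the number of table lookups can shrink, by an amount that depends on the false-positive rate.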

hashBuildProbeV4

#include "xf_database/hash_join_v4.hpp"
template <
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int HASHO,
    int ARW,
    int CH_NM,
    int BF_HASH_NM,
    int BFW,
    int EN_BF
    >
static void hashBuildProbeV4 (
    bool& build_probe_flag,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <64>* htb0_buf,
    ap_uint <64>* htb1_buf,
    ap_uint <64>* htb2_buf,
    ap_uint <64>* htb3_buf,
    ap_uint <64>* htb4_buf,
    ap_uint <64>* htb5_buf,
    ap_uint <64>* htb6_buf,
    ap_uint <64>* htb7_buf,
    ap_uint <64>* stb0_buf,
    ap_uint <64>* stb1_buf,
    ap_uint <64>* stb2_buf,
    ap_uint <64>* stb3_buf,
    ap_uint <64>* stb4_buf,
    ap_uint <64>* stb5_buf,
    ap_uint <64>* stb6_buf,
    ap_uint <64>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Build-Probe v4 primitive. Compared with hashBuildProbeV3, it enables a bloom filter to reduce redundant access to HBM, which can further reduce the run-time of hash join. Build and probe are performed separately and controlled by a boolean flag. Multiple builds and probes are also supported; the caller should make sure all rows in the build phase can be stored temporarily in HBM to maintain correctness.

The maximum size of the small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

Parameters:

KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
HASHO number of hash bits used for overflow hash counter, 8-12.
ARW width of address, log2(small table max num of rows).
CH_NM number of input channels, 1,2,4.
BF_HASH_NM number of hash functions in bloom filter, 1,2,3.
BFW bloom-filter hash width.
EN_BF bloom-filter switch, 0 for off, 1 for on.
build_probe_flag 0:build 1:probe
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains build ID, probe ID, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU
pu_end_status_strms returns next build ID, next probe ID, fixed hash depth, joined number of current probe and end addr of stb_buf for each PU
j_strm output of joined rows.
j_e_strm is the end flag of joined result.

hashLookup3

hashLookup3 overload (1)

#include "xf_database/hash_lookup3.hpp"
template <int W>
void hashLookup3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <64>>& hash_strm
    )

lookup3 algorithm, 64-bit hash. II=1 when W<=96, otherwise II=(W/96).

Parameters:

W the bit width of ap_uint type for input message stream.
key_strm the message being hashed.
hash_strm the result.

hashLookup3 overload (2)

#include "xf_database/hash_lookup3.hpp"
template <int W>
void hashLookup3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <32>>& hash_strm
    )

lookup3 algorithm, 32-bit hash. II=1 when W<=96, otherwise II=(W/96).

Parameters:

W the bit width of ap_uint type for input message stream.
key_strm the message being hashed.
hash_strm the result.

hashLookup3 overload (3)

#include "xf_database/hash_lookup3.hpp"
template <
    int WK,
    int WH
    >
void hashLookup3 (
    hls::stream <ap_uint <WK>>& key_strm,
    hls::stream <bool>& e_key_strm,
    hls::stream <ap_uint <WH>>& hash_strm,
    hls::stream <bool>& e_hash_strm
    )

lookup3 algorithm, 64-bit or 32-bit hash.

Parameters:

WK the bit width of input message stream.
WH the bit width of output hash stream, must be 64 or 32.
key_strm the message being hashed.
e_key_strm end of key flag stream.
hash_strm the result.
e_hash_strm end of hash flag stream.

hashMultiJoin

#include "xf_database/hash_multi_join.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashMultiJoin (
    hls::stream <ap_uint <3>>& join_flag_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3. The inner table should be fed once, followed by the outer table once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, larger than 24 is suggested.
CH_NM number of input channels, 1,2,4.
join_flag_strm specifies the join type, this flag is only read once.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains depth of hash, row number of join result
pu_end_status_strms contains depth of hash, row number of join result
j_strm output of joined result
j_e_strm end flag of joined result

hashMultiJoinBuildProbe

#include "xf_database/hash_multi_join_build_probe.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashMultiJoinBuildProbe (
    bool build_probe_flag,
    hls::stream <ap_uint <3>>& join_flag_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3. The inner table should be fed once, followed by the outer table once.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bits.
PW width of max payload, in bits.
S_PW width of payload of small table.
B_PW width of payload of big table.
HASHWH number of hash bits used for PU/buffer selection, 1~3.
HASHWL number of hash bits used for hash-table in PU.
ARW width of address, larger than 24 is suggested.
CH_NM number of input channels, 1,2,4.
join_flag_strm specifies the join type, this flag is only read once.
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
htb0_buf HBM/DDR buffer of hash_table0
htb1_buf HBM/DDR buffer of hash_table1
htb2_buf HBM/DDR buffer of hash_table2
htb3_buf HBM/DDR buffer of hash_table3
htb4_buf HBM/DDR buffer of hash_table4
htb5_buf HBM/DDR buffer of hash_table5
htb6_buf HBM/DDR buffer of hash_table6
htb7_buf HBM/DDR buffer of hash_table7
stb0_buf HBM/DDR buffer of PU0
stb1_buf HBM/DDR buffer of PU1
stb2_buf HBM/DDR buffer of PU2
stb3_buf HBM/DDR buffer of PU3
stb4_buf HBM/DDR buffer of PU4
stb5_buf HBM/DDR buffer of PU5
stb6_buf HBM/DDR buffer of PU6
stb7_buf HBM/DDR buffer of PU7
pu_begin_status_strms contains depth of hash, row number of join result
pu_end_status_strms contains depth of hash, row number of join result
j_strm output of joined result
j_e_strm end flag of joined result

hashMurmur3

#include "xf_database/hash_murmur3.hpp"
template <
    int W,
    int H
    >
void hashMurmur3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <H>>& hash_strm
    )

murmur3 algorithm.

Parameters:

W the bit width of ap_uint type for input message stream.
H the bit width of ap_uint type for output hash stream.
key_strm the message being hashed.
hash_strm the result.

hashPartition

#include "xf_database/hash_partition.hpp"
template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int EW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int COL_NM
    >
void hashPartition (
    bool mk_on,
    int depth,
    hls::stream <int>& bit_num_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    hls::stream <ap_uint <16>>& o_bkpu_num_strm,
    hls::stream <ap_uint <10>>& o_nm_strm,
    hls::stream <ap_uint <EW>> o_kpld_strm [COL_NM]
    )

Hash-Partition primitive.

Parameters:

HASH_MODE 0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW width of key, in bit.
PW width of max payload, in bit.
EW element data width of input table, in bit.
HASHWH number of hash bits used for PU selection.
HASHWL number of hash bits used for partition selection.
ARW width of address for URAM
CH_NM number of input channels, 1,2,4.
COL_NM number of input columns, 1~8.
mk_on input of double key flag, 0 for off, 1 for on.
depth input of depth of each hash bucket in URAM.
bit_num_strm input of partition number, log2(number of partition).
k0_strm_arry input of key columns of both tables.
p0_strm_arry input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
o_bkpu_num_strm output of index for bucket and PU
o_nm_strm output of row number each time
o_kpld_strm output of key+payload

hashSemiJoin

#include "xf_database/hash_semi_join.hpp"
template <
    int HashMode,
    int WKey,
    int WPayload,
    int WHashHigh,
    int WhashLow,
    int WTmpBufferAddress,
    int WTmpBuffer,
    int NChannels,
    int WBloomFilter,
    int EnBloomFilter
    >
static void hashSemiJoin (
    hls::stream <ap_uint <WKey>> key_istrms [NChannels],
    hls::stream <ap_uint <WPayload>> payload_istrms [NChannels],
    hls::stream <bool> e0_strm_arry [NChannels],
    ap_uint <WTmpBuffer>* pu0_tmp_rwtpr,
    ap_uint <WTmpBuffer>* pu1_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu2_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu3_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu4_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu5_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu6_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu7_tmp_rwptr,
    hls::stream <ap_uint <WPayload>>& join_ostrm,
    hls::stream <bool>& end_ostrm
    )

Multi-PU Hash-Semi-Join primitive, using multiple DDR/HBM buffers.

The inner table may hold at most 2M rows in this design, and hash conflicts are assumed to stay within 256K per bin.

This module can accept more than one input row per cycle, via multiple input channels. The outer table and the inner table share the same input ports, so the payload width should be the maximum of the two, and the data should be aligned to the little end. The inner table should be fed TWICE, followed by the outer table ONCE.

Parameters:

HashMode 0 for radix and 1 for Jenkins' Lookup3 hash.
WKey width of key, in bit.
WPayload width of payload of outer table.
WHashHigh number of hash bits used for PU/buffer selection, 1~3.
WhashLow number of hash bits used for hash-table in PU.
WTmpBufferAddress width of address, log2(inner table max num of rows).
WTmpBuffer width of buffer.
NChannels number of input channels, 1,2,4.
WBloomFilter bloom-filter hash width.
EnBloomFilter bloom-filter switch, 0 for off, 1 for on.
key_istrms input of key columns of both tables.
payload_istrms input of payload columns of both tables.
e0_strm_arry input of end signal of both tables.
pu0_tmp_rwtpr HBM/DDR buffer of PU0
pu1_tmp_rwptr HBM/DDR buffer of PU1
pu2_tmp_rwptr HBM/DDR buffer of PU2
pu3_tmp_rwptr HBM/DDR buffer of PU3
pu4_tmp_rwptr HBM/DDR buffer of PU4
pu5_tmp_rwptr HBM/DDR buffer of PU5
pu6_tmp_rwptr HBM/DDR buffer of PU6
pu7_tmp_rwptr HBM/DDR buffer of PU7
join_ostrm output of joined rows.
end_ostrm end signal of joined rows.
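
The semantics of a semi-join can be sketched in plain software. The following is a minimal model, not the library implementation: it ignores PUs, channels, the bloom filter and the DDR/HBM buffers. Row and semiJoinModel are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical row type for this sketch: first = key, second = payload.
using Row = std::pair<uint32_t, uint32_t>;

// Software model of semi-join semantics: keep the payload of every outer row
// whose key occurs in the inner table. Each outer row is emitted at most once,
// no matter how many inner rows share its key.
inline std::vector<uint32_t> semiJoinModel(const std::vector<Row>& inner,
                                           const std::vector<Row>& outer) {
    // "Build" phase: collect the distinct keys of the inner table.
    std::unordered_set<uint32_t> innerKeys;
    for (const Row& r : inner) innerKeys.insert(r.first);

    // "Probe" phase: test each outer key against the collected keys.
    std::vector<uint32_t> out;
    for (const Row& r : outer) {
        if (innerKeys.count(r.first) != 0) out.push_back(r.second);
    }
    return out;
}
```

Only the payload of matching outer rows is returned, mirroring the fact that join_ostrm carries WPayload bits.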

insertSort

insertSort overload (1)

#include "xf_database/insert_sort.hpp"
template <
    typename KEY_TYPE,
    int MAX_SORT_NUMBER
    >
void insertSort (
    hls::stream <KEY_TYPE>& kinStrm,
    hls::stream <bool>& endInStrm,
    hls::stream <KEY_TYPE>& koutStrm,
    hls::stream <bool>& endOutStrm,
    bool order
    )

Insert sort top function.

Parameters:

KEY_TYPE the input and output key type
MAX_SORT_NUMBER the maximum number of elements that can be sorted
kinStrm input key stream
endInStrm end flag stream for input
koutStrm output key stream
endOutStrm end flag stream for output
order 1:sort ascending 0:sort descending
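
The behaviour of the order flag and of the end flag streams can be illustrated with a small software model. This is a sketch under stated assumptions, not the library code: streams are modelled with std::queue, one false end flag is assumed to accompany each key followed by a single true, and the input is assumed to hold no more than MAX_SORT_NUMBER keys. StreamPair and insertSortModel are hypothetical names.

```cpp
#include <cassert>
#include <queue>

// Hypothetical software stand-in for a data stream plus its end flag stream.
template <typename T>
struct StreamPair {
    std::queue<T> data;
    std::queue<bool> end;
};

// Insertion sort over a stream: keys are inserted into a sorted buffer as
// they arrive, then the buffer is drained to the output.
template <typename KEY_TYPE, int MAX_SORT_NUMBER>
void insertSortModel(StreamPair<KEY_TYPE>& in, StreamPair<KEY_TYPE>& out, bool order) {
    KEY_TYPE buf[MAX_SORT_NUMBER];
    int n = 0;
    bool e = in.end.front();
    in.end.pop();
    while (!e) {
        KEY_TYPE k = in.data.front();
        in.data.pop();
        assert(n < MAX_SORT_NUMBER); // assumption of this sketch
        int i = n;
        // Shift every element that must come after k one slot to the right.
        // order == true sorts ascending, order == false sorts descending.
        while (i > 0 && (order ? buf[i - 1] > k : buf[i - 1] < k)) {
            buf[i] = buf[i - 1];
            --i;
        }
        buf[i] = k;
        ++n;
        e = in.end.front();
        in.end.pop();
    }
    for (int i = 0; i < n; ++i) {
        out.data.push(buf[i]);
        out.end.push(false); // one false per emitted key
    }
    out.end.push(true); // final true closes the output
}
```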

insertSort overload (2)

#include "xf_database/insert_sort.hpp"
template <
    typename KEY_TYPE,
    typename DATA_TYPE,
    int MAX_SORT_NUMBER
    >
void insertSort (
    hls::stream <DATA_TYPE>& dinStrm,
    hls::stream <KEY_TYPE>& kinStrm,
    hls::stream <bool>& endInStrm,
    hls::stream <DATA_TYPE>& doutStrm,
    hls::stream <KEY_TYPE>& koutStrm,
    hls::stream <bool>& endOutStrm,
    bool order
    )

Insert sort top function.

Parameters:

KEY_TYPE the input and output key type
DATA_TYPE the input and output data type
MAX_SORT_NUMBER the maximum number of elements that can be sorted
dinStrm input data stream
kinStrm input key stream
endInStrm end flag stream for input
doutStrm output data stream
koutStrm output key stream
endOutStrm end flag stream for output
order 1:sort ascending 0:sort descending

mergeJoin

#include "xf_database/merge_join.hpp"
template <
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void mergeJoin (
    bool isascend,
    hls::stream <KEY_T>& left_strm_in_key,
    hls::stream <LEFT_FIELD_T>& left_strm_in_field,
    hls::stream <bool>& left_e_strm,
    hls::stream <KEY_T>& right_strm_in_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_in_field,
    hls::stream <bool>& right_e_strm,
    hls::stream <KEY_T>& left_strm_out_key,
    hls::stream <LEFT_FIELD_T>& left_strm_out_field,
    hls::stream <KEY_T>& right_strm_out_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_out_field,
    hls::stream <bool>& out_e_strm
    )

Merge join function for sorted tables. The left table must not contain duplicate keys.

Parameters:

KEY_T the type of the key of left table
LEFT_FIELD_T the type of the field of left table
RIGHT_FIELD_T the type of the field of right table
isascend flag indicating whether the input tables are sorted in ascending or descending order
left_strm_in_key the key stream of the left input table
left_strm_in_field the field stream of the left input table
left_e_strm the end flag stream to mark the end of left input table
right_strm_in_key the key stream of the right input table
right_strm_in_field the field stream of the right input table
right_e_strm the end flag stream to mark the end of right input table
left_strm_out_key the output key stream of left table
left_strm_out_field the output field stream of left table
right_strm_out_key the output key stream of right table
right_strm_out_field the output field stream of right table
out_e_strm the end flag stream to mark the end of output table
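
The two-pointer scheme behind a merge join on sorted inputs can be sketched as follows. This is a plain software model under the stated precondition (unique keys in the left table), not the HLS implementation. Row, Joined and mergeJoinModel are hypothetical names, and int is used for keys and fields only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Row { int key; int field; };
struct Joined { int left_key; int left_field; int right_key; int right_field; };

// Inner join of two tables sorted on key. The left table must not contain
// duplicate keys; the right table may.
inline std::vector<Joined> mergeJoinModel(bool isascend,
                                          const std::vector<Row>& left,
                                          const std::vector<Row>& right) {
    std::vector<Joined> out;
    std::size_t l = 0, r = 0;
    while (l < left.size() && r < right.size()) {
        const int lk = left[l].key;
        const int rk = right[r].key;
        if (lk == rk) {
            out.push_back({lk, left[l].field, rk, right[r].field});
            ++r; // stay on the left row: more right rows may share this key
        } else if (isascend ? lk < rk : lk > rk) {
            ++l; // left row is behind in sort order, advance it
        } else {
            ++r; // right row has no partner, skip it
        }
    }
    return out;
}
```

Because the left keys are unique, only the right cursor advances on a match, so no backtracking is needed.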

mergeLeftJoin

#include "xf_database/merge_left_join.hpp"
template <
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void mergeLeftJoin (
    bool isascend,
    hls::stream <KEY_T>& left_strm_in_key,
    hls::stream <LEFT_FIELD_T>& left_strm_in_field,
    hls::stream <bool>& left_e_strm,
    hls::stream <KEY_T>& right_strm_in_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_in_field,
    hls::stream <bool>& right_e_strm,
    hls::stream <KEY_T>& left_strm_out_key,
    hls::stream <LEFT_FIELD_T>& left_strm_out_field,
    hls::stream <KEY_T>& right_strm_out_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_out_field,
    hls::stream <bool>& out_e_strm,
    hls::stream <bool>& isnull_strm
    )

Merge left join function for sorted tables. The left table must not contain duplicate keys.

Parameters:

KEY_T the type of the key
LEFT_FIELD_T the type of the field of left table
RIGHT_FIELD_T the type of the field of right table
isascend flag indicating whether the input tables are sorted in ascending or descending order
left_strm_in_key the key stream of the left input table
left_strm_in_field the field stream of the left input table
left_e_strm the end flag stream to mark the end of left input table
right_strm_in_key the key stream of the right input table
right_strm_in_field the field stream of the right input table
right_e_strm the end flag stream to mark the end of right input table
left_strm_out_key the output key stream of left table
left_strm_out_field the output field stream of left table
right_strm_out_key the output key stream of right table
right_strm_out_field the output field stream of right table
out_e_strm the end flag stream to mark the end of output table
isnull_strm the isnull stream to show whether the right-table part of the result row is null
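
The role of the isnull flag can be seen in a small software model of a left join on sorted inputs. This is a sketch, not the library implementation. Row, LeftJoined and mergeLeftJoinModel are hypothetical names, and the zero placeholders written into the right-hand columns of unmatched rows are an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Row { int key; int field; };
struct LeftJoined {
    int left_key; int left_field;
    int right_key; int right_field;
    bool isnull; // true when no right row matched this left row
};

// Left join of two tables sorted on key; left keys must be unique.
inline std::vector<LeftJoined> mergeLeftJoinModel(bool isascend,
                                                  const std::vector<Row>& left,
                                                  const std::vector<Row>& right) {
    std::vector<LeftJoined> out;
    std::size_t l = 0, r = 0;
    bool matched = false; // has the current left row matched anything yet?
    while (l < left.size()) {
        const bool haveRight = r < right.size();
        if (haveRight && left[l].key == right[r].key) {
            out.push_back({left[l].key, left[l].field, right[r].key, right[r].field, false});
            matched = true;
            ++r;
        } else if (haveRight && (isascend ? right[r].key < left[l].key
                                          : right[r].key > left[l].key)) {
            ++r; // right row is behind and has no partner, skip it
        } else {
            // Right side is exhausted or already ahead: close the left row.
            if (!matched) out.push_back({left[l].key, left[l].field, 0, 0, true});
            matched = false;
            ++l;
        }
    }
    return out;
}
```

Every left row appears at least once in the result, which is the difference from the inner merge join.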

mergeSort

mergeSort overload (1)

#include "xf_database/merge_sort.hpp"
template <typename Key_Type>
void mergeSort (
    hls::stream <Key_Type>& left_kin_strm,
    hls::stream <bool>& left_strm_in_end,
    hls::stream <Key_Type>& right_kin_strm,
    hls::stream <bool>& right_strm_in_end,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& strm_out_end,
    bool order
    )

Merge sort function.

Parameters:

Key_Type the input and output key type
left_kin_strm left input key stream
left_strm_in_end end flag stream for left input
right_kin_strm right input key stream
right_strm_in_end end flag stream for right input
kout_strm output key stream
strm_out_end end flag stream for output data
order 1:ascending 0:descending
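
The merge step can be sketched in software as follows. This is a minimal model that assumes both inputs are already sorted in the direction given by order; it uses std::vector in place of streams and is not the library code. mergeSortModel is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two sequences that are already sorted in the requested direction.
// order == true: ascending, order == false: descending.
template <typename Key_Type>
std::vector<Key_Type> mergeSortModel(const std::vector<Key_Type>& left,
                                     const std::vector<Key_Type>& right,
                                     bool order) {
    std::vector<Key_Type> out;
    out.reserve(left.size() + right.size());
    std::size_t l = 0, r = 0;
    while (l < left.size() && r < right.size()) {
        // Ascending: take left when left <= right.
        // Descending: take left when left >= right.
        const bool takeLeft = order ? !(right[r] < left[l]) : !(left[l] < right[r]);
        out.push_back(takeLeft ? left[l++] : right[r++]);
    }
    // One side is exhausted: copy the remainder of the other side.
    while (l < left.size()) out.push_back(left[l++]);
    while (r < right.size()) out.push_back(right[r++]);
    return out;
}
```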

mergeSort overload (2)

#include "xf_database/merge_sort.hpp"
template <
    typename Data_Type,
    typename Key_Type
    >
void mergeSort (
    hls::stream <Data_Type>& left_din_strm,
    hls::stream <Key_Type>& left_kin_strm,
    hls::stream <bool>& left_strm_in_end,
    hls::stream <Data_Type>& right_din_strm,
    hls::stream <Key_Type>& right_kin_strm,
    hls::stream <bool>& right_strm_in_end,
    hls::stream <Data_Type>& dout_strm,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& strm_out_end,
    bool order
    )

Merge sort function.

Parameters:

Data_Type the input and output data type
Key_Type the input and output key type
left_din_strm input left data stream
left_kin_strm left input key stream
left_strm_in_end end flag stream for left input
right_din_strm input right data stream
right_kin_strm right input key stream
right_strm_in_end end flag stream for right input
dout_strm output data stream
kout_strm output key stream
strm_out_end end flag stream for output data
order 1:ascending 0:descending

nestedLoopJoin

#include "xf_database/nested_loop_join.hpp"
template <
    int CMP_NUM,
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void nestedLoopJoin (
    hls::stream <KEY_T>& strm_in_left_key,
    hls::stream <LEFT_FIELD_T>& strm_in_left_field,
    hls::stream <bool>& strm_in_left_e,
    hls::stream <KEY_T>& strm_in_right_key,
    hls::stream <RIGHT_FIELD_T>& strm_in_right_field,
    hls::stream <bool>& strm_in_right_e,
    hls::stream <KEY_T> strm_out_left_key [CMP_NUM],
    hls::stream <LEFT_FIELD_T> strm_out_left_field [CMP_NUM],
    hls::stream <KEY_T> strm_out_right_key [CMP_NUM],
    hls::stream <RIGHT_FIELD_T> strm_out_right_field [CMP_NUM],
    hls::stream <bool> strm_out_e [CMP_NUM]
    )

Nested loop join function.

Parameters:

CMP_NUM the number of output channels, i.e. the length of each output stream array
KEY_T the type of the key of left table
LEFT_FIELD_T the type of the field of left table
RIGHT_FIELD_T the type of the field of right table
strm_in_left_key the key stream of the left input table
strm_in_left_field the field stream of the left input table
strm_in_left_e the end flag stream to mark the end of left input table
strm_in_right_key the key stream of the right input table
strm_in_right_field the field stream of the right input table
strm_in_right_e the end flag stream to mark the end of right input table
strm_out_left_key the output key stream of left table
strm_out_left_field the output field stream of left table
strm_out_right_key the output key stream of right table
strm_out_right_field the output field stream of right table
strm_out_e the end flag stream to mark the end of output table

scanCmpStrCol

#include "xf_database/scan_cmp_str_col.hpp"
void scanCmpStrCol (
    ap_uint <512>* ddr_ptr,
    hls::stream <int>& size,
    hls::stream <int>& num_str,
    hls::stream <ap_uint <512>>& cnst_stream,
    hls::stream <bool>& out_stream,
    hls::stream <bool>& e_str_o
    )

Scan multiple columns of strings in global memory and compare each of them with a constant string.

Parameters:

ddr_ptr input string array stored in global memory.
size the number of times reading global memory
num_str the number of actual strings
cnst_stream input constant string stream, 512 bits in heading-length and padding-zero format, read only once as configuration.
out_stream output whether each string is equal to the constant string, true indicates they are equal.
e_str_o end flag stream for output stream.

scanCol

scanCol overload (1)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 1 column from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
c0vec_ptr buffer pointer to column 0.
nrow number of rows to scan.
c0_strm column 0 stream.
e_row_strm output end flag stream.
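
The relationship between vec_len, size0 and nrow can be illustrated with a software model of the unpacking step. This is a sketch, not the library code. It assumes that item i of a vector occupies bytes [i*size0, (i+1)*size0) in little-endian order, that ceil(nrow / vec_len) vectors are fetched, and that unused lanes of the last vector are dropped. scanColModel is a hypothetical name, and size0 is limited to 8 bytes here so that an item fits in uint64_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack nrow items of size0 bytes each from a buffer organised as vectors of
// vec_len items.
template <int vec_len, int size0>
std::vector<uint64_t> scanColModel(const std::vector<uint8_t>& buf, int nrow) {
    static_assert(size0 >= 1 && size0 <= 8, "this sketch stores items in uint64_t");
    const int nvec = (nrow + vec_len - 1) / vec_len; // vectors that must be read
    assert(buf.size() >= static_cast<std::size_t>(nvec) * vec_len * size0);

    std::vector<uint64_t> out;
    for (int v = 0; v < nvec; ++v) {
        for (int i = 0; i < vec_len; ++i) {
            const int row = v * vec_len + i;
            if (row >= nrow) break; // padding lanes of the last vector
            uint64_t item = 0;
            for (int b = 0; b < size0; ++b) {
                const std::size_t pos = static_cast<std::size_t>(row) * size0 + b;
                item |= static_cast<uint64_t>(buf[pos]) << (8 * b);
            }
            out.push_back(item);
        }
    }
    return out;
}
```

With vec_len = 4 and size0 = 4, each vector is 128 bits wide, matching the ap_uint<8*size0*vec_len> pointer type in the signature.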

scanCol overload (2)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0,
    int size1
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 2 columns from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
nrow number of rows to scan.
c0_strm column 0 stream.
c1_strm column 1 stream.
e_row_strm output end flag stream.

scanCol overload (3)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 3 columns from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
nrow number of rows to scan.
c0_strm column 0 stream.
c1_strm column 1 stream.
c2_strm column 2 stream.
e_row_strm output end flag stream.

scanCol overload (4)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 4 columns from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
size3 size of column 3, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c3vec_ptr buffer pointer to column 3.
nrow number of rows to scan.
c0_strm column 0 stream.
c1_strm column 1 stream.
c2_strm column 2 stream.
c3_strm column 3 stream.
e_row_strm output end flag stream.

scanCol overload (5)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <ap_uint <8*size4>>& c4_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 5 columns from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
size3 size of column 3, in byte.
size4 size of column 4, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c3vec_ptr buffer pointer to column 3.
c4vec_ptr buffer pointer to column 4.
nrow number of rows to scan.
c0_strm column 0 stream.
c1_strm column 1 stream.
c2_strm column 2 stream.
c3_strm column 3 stream.
c4_strm column 4 stream.
e_row_strm output end flag stream.

scanCol overload (6)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4,
    int size5
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    ap_uint <8*size5*vec_len>* c5vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <ap_uint <8*size4>>& c4_strm,
    hls::stream <ap_uint <8*size5>>& c5_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 6 columns from DDR/HBM buffers.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
size3 size of column 3, in byte.
size4 size of column 4, in byte.
size5 size of column 5, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c3vec_ptr buffer pointer to column 3.
c4vec_ptr buffer pointer to column 4.
c5vec_ptr buffer pointer to column 5.
nrow number of rows to scan.
c0_strm column 0 stream.
c1_strm column 1 stream.
c2_strm column 2 stream.
c3_strm column 3 stream.
c4_strm column 4 stream.
c5_strm column 5 stream.
e_row_strm output end flag stream.

scanCol overload (7)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan one column from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
ch_num number of concurrent output channels per column.
size0 size of column 0, in byte.
c0vec_ptr buffer pointer to column 0.
nrow number of rows to scan.
c0_strm array of column 0 stream.
e_row_strm array of output end flag stream.

scanCol overload (8)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0,
    int size1
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan two columns from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
ch_num number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
nrow number of rows to scan.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
e_row_strm array of output end flag stream.

scanCol overload (9)

#include "xf_database/scan_col.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0,
    int size1,
    int size2
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_num],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan three columns from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len number of items to be scanned as a vector from AXI port.
ch_num number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
nrow number of rows to scan.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
c2_strm array of column 2 stream.
e_row_strm array of output end flag stream.

scanCol overload (10)

#include "xf_database/scan_col_2.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 2 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, the same number of zeros is emitted; otherwise, the same number of rows is read from that buffer.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len scan this number of items as a vector from AXI port.
ch_nm number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
e_row_strm array of output end flag stream.

scanCol overload (11)

#include "xf_database/scan_col_2.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 3 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, the same number of zeros is emitted; otherwise, the same number of rows is read from that buffer.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len scan this number of items as a vector from AXI port.
ch_nm number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
c2_strm array of column 2 stream.
e_row_strm array of output end flag stream.

scanCol overload (12)

#include "xf_database/scan_col_2.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2,
    int size3
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <ap_uint <8*size3>> c3_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 4 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, the same number of zeros is emitted; otherwise, the same number of rows is read from that buffer.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len scan this number of items as a vector from AXI port.
ch_nm number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
size3 size of column 3, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c3vec_ptr buffer pointer to column 3.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
c2_strm array of column 2 stream.
c3_strm array of column 3 stream.
e_row_strm array of output end flag stream.

scanCol overload (13)

#include "xf_database/scan_col_2.hpp"
template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <ap_uint <8*size3>> c3_strm [ch_nm],
    hls::stream <ap_uint <8*size4>> c4_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 5 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, the same number of zeros is emitted; otherwise, the same number of rows is read from that buffer.

Parameters:

burst_len burst read length, must be supported by MC.
vec_len scan this number of items as a vector from AXI port.
ch_nm number of concurrent output channels per column.
size0 size of column 0, in byte.
size1 size of column 1, in byte.
size2 size of column 2, in byte.
size3 size of column 3, in byte.
size4 size of column 4, in byte.
c0vec_ptr buffer pointer to column 0.
c1vec_ptr buffer pointer to column 1.
c2vec_ptr buffer pointer to column 2.
c3vec_ptr buffer pointer to column 3.
c4vec_ptr buffer pointer to column 4.
c0_strm array of column 0 stream.
c1_strm array of column 1 stream.
c2_strm array of column 2 stream.
c3_strm array of column 3 stream.
c4_strm array of column 4 stream.
e_row_strm array of output end flag stream.

staticEval

staticEval overload (1)

#include "xf_database/static_eval.hpp"
template <
    typename T,
    typename T_O,
    T_O (*opf)(T)
    >
void staticEval (
    hls::stream <T>& in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

One stream input static evaluation.

The staticEval function computes the result of a user-defined expression. This result can then be passed to the aggregate module as its input. When calling this API, T and T_O are the input and output data types of the user function. E.g.

// decl
long user_func(int a);
// use
xf::database::staticEval<int, long, user_func>(
    in_strm, e_in_strm, out_strm, e_out_strm);

In the above call, int is the input data type of user_func, and long is the return type of user_func.

Parameters:

T the input stream type, inferred from argument
T_O the output stream type, inferred from argument
opf the user-defined expression function
in_strm input data stream
e_in_strm end flag stream for input data
out_strm output data stream
e_out_strm end flag stream for output data
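
How a user function is bound as a non-type template argument and applied element by element can be shown with a software model. This is a sketch, not the library implementation: streams are modelled with std::queue, and one false end flag per element followed by a single true is assumed. staticEvalModel and the body of user_func are hypothetical.

```cpp
#include <cassert>
#include <queue>

// Apply opf to every input element. opf is a function pointer fixed at
// compile time, so each instantiation is specialised for one expression.
template <typename T, typename T_O, T_O (*opf)(T)>
void staticEvalModel(std::queue<T>& in_strm,
                     std::queue<bool>& e_in_strm,
                     std::queue<T_O>& out_strm,
                     std::queue<bool>& e_out_strm) {
    bool e = e_in_strm.front();
    e_in_strm.pop();
    while (!e) {
        T v = in_strm.front();
        in_strm.pop();
        out_strm.push(opf(v));  // evaluate the user expression
        e_out_strm.push(false); // one false per output element
        e = e_in_strm.front();
        e_in_strm.pop();
    }
    e_out_strm.push(true); // final true closes the output
}

// Hypothetical user expression: a * a + 1, widened to long.
inline long user_func(int a) {
    return static_cast<long>(a) * a + 1;
}
```

A call then names the function in the template argument list, for example staticEvalModel<int, long, user_func>(in, e_in, out, e_out).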

staticEval overload (2)

#include "xf_database/static_eval.hpp"
template <
    typename T1,
    typename T2,
    typename T_O,
    T_O (*opf)(T1, T2)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Two stream input static evaluation.

The staticEval function computes the result of a user-defined expression. This result can then be passed to the aggregate module as its input. When calling this API, T1, T2 and T_O are the input and output data types of the user function. E.g.

// decl
long user_func(int a, int b);
// use
xf::database::staticEval<int, int, long, user_func>(
    in1_strm, in2_strm, e_in_strm, out_strm, e_out_strm);

In the above call, the two int are the input data types of user_func, and long is the return type of user_func.

Parameters:

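T1 the data type of the first input stream, which is also the type of the first parameter of opf
T2 the data type of the second input stream, which is also the type of the second parameter of opf
T_O the data type of the output stream, which is also the return type of opf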
opf the user-defined expression function
in1_strm first input data stream
in2_strm second input data stream
e_in_strm end flag stream for input data
out_strm output data stream
e_out_strm end flag stream for output data

staticEval overload (3)

#include "xf_database/static_eval.hpp"
template <
    typename T1,
    typename T2,
    typename T3,
    typename T_O,
    T_O (*opf)(T1, T2, T3)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <T3>& in3_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Three stream input static evaluation.

staticEval calculates the result of a user-defined expression. The result can then be passed to the aggregate module as its input. When calling this API, T1, T2 and T3 are the data types of the three parameters of the user function, and T_O is its return type. For example:

// decl
long user_func(int a, int b, int c);
// use
xf::database::staticEval<int, int, int, long, user_func>(
    in1_strm, in2_strm, in3_strm, e_in_strm,
    out_strm, e_out_strm);

In the above call, the three int are the input data types of user_func, and long is its return type.

Parameters:

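T1 the data type of the first input stream, which is also the type of the first parameter of opf
T2 the data type of the second input stream, which is also the type of the second parameter of opf
T3 the data type of the third input stream, which is also the type of the third parameter of opf
T_O the data type of the output stream, which is also the return type of opf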
opf the user-defined expression function
in1_strm first input data stream
in2_strm second input data stream
in3_strm third input data stream
e_in_strm end flag stream for input data
out_strm output data stream
e_out_strm end flag stream for output data

staticEval overload (4)

#include "xf_database/static_eval.hpp"
template <
    typename T1,
    typename T2,
    typename T3,
    typename T4,
    typename T_O,
    T_O (*opf)(T1, T2, T3, T4)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <T3>& in3_strm,
    hls::stream <T4>& in4_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Four stream input static evaluation.

staticEval calculates the result of a user-defined expression. The result can then be passed to the aggregate module as its input. When calling this API, T1, T2, T3 and T4 are the data types of the four parameters of the user function, and T_O is its return type. For example:

// decl
long user_func(int a, int b, int c, int d);
// use
xf::database::staticEval<int, int, int, int, long, user_func>(
    in1_strm, in2_strm, in3_strm, in4_strm, e_in_strm,
    out_strm, e_out_strm);

In the above call, the four int are the input data types of user_func, and long is its return type.
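Since the evaluated stream is meant to feed an aggregation stage, the sketch below chains a model of the four-stream evaluation into a simple running sum. Everything here is an assumption made for illustration: Stream stands in for hls::stream, Row, staticEvalModel, sumModel and run_demo are hypothetical names, the end-flag convention is one false flag per item followed by a single true flag, and sumModel is only a placeholder for a SUM aggregation, not the aggregate primitive itself.

```cpp
#include <cassert>
#include <queue>
#include <vector>

namespace eval_then_sum_sketch {

// Hypothetical stand-in for hls::stream, so the sketch builds without HLS headers.
template <typename T>
struct Stream {
    std::queue<T> q;
    void write(const T& v) { q.push(v); }
    T read() { T v = q.front(); q.pop(); return v; }
    bool empty() const { return q.empty(); }
};

// One input row; its four fields travel on four separate streams.
struct Row { int a, b, c, d; };

// User-defined expression with four inputs and a wider result type.
long user_func(int a, int b, int c, int d) {
    return static_cast<long>(a) * b + static_cast<long>(c) * d;
}

// Software model of the four-stream overload (assumed behaviour).
template <typename T1, typename T2, typename T3, typename T4, typename T_O,
          T_O (*opf)(T1, T2, T3, T4)>
void staticEvalModel(Stream<T1>& in1_strm, Stream<T2>& in2_strm,
                     Stream<T3>& in3_strm, Stream<T4>& in4_strm,
                     Stream<bool>& e_in_strm,
                     Stream<T_O>& out_strm, Stream<bool>& e_out_strm) {
    bool e = e_in_strm.read();
    while (!e) {
        T1 a = in1_strm.read();
        T2 b = in2_strm.read();
        T3 c = in3_strm.read();
        T4 d = in4_strm.read();
        out_strm.write(opf(a, b, c, d));
        e_out_strm.write(false);
        e = e_in_strm.read();
    }
    e_out_strm.write(true);
}

// Placeholder SUM stage: consumes a flagged stream and returns the total
// (zero for an empty input).
template <typename T>
T sumModel(Stream<T>& in_strm, Stream<bool>& e_in_strm) {
    T sum = 0;
    while (!e_in_strm.read()) sum += in_strm.read();
    return sum;
}

// Evaluates user_func on every row, then sums the evaluated values.
long run_demo(const std::vector<Row>& rows) {
    Stream<int> in1_strm, in2_strm, in3_strm, in4_strm;
    Stream<bool> e_in_strm, e_eval_strm;
    Stream<long> eval_strm;
    for (const Row& r : rows) {
        in1_strm.write(r.a);
        in2_strm.write(r.b);
        in3_strm.write(r.c);
        in4_strm.write(r.d);
        e_in_strm.write(false);
    }
    e_in_strm.write(true);

    staticEvalModel<int, int, int, int, long, user_func>(
        in1_strm, in2_strm, in3_strm, in4_strm, e_in_strm, eval_strm, e_eval_strm);
    long total = sumModel(eval_strm, e_eval_strm);
    assert(eval_strm.empty() && e_eval_strm.empty()); // the sum stage drained everything
    return total;
}

} // namespace eval_then_sum_sketch
```

Evaluating into a wider type such as long before summing is what lets the downstream stage accumulate without overflowing the narrower input type.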

Parameters:

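T1 the data type of the first input stream, which is also the type of the first parameter of opf
T2 the data type of the second input stream, which is also the type of the second parameter of opf
T3 the data type of the third input stream, which is also the type of the third parameter of opf
T4 the data type of the fourth input stream, which is also the type of the fourth parameter of opf
T_O the data type of the output stream, which is also the return type of opf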
opf the user-defined expression function
in1_strm first input data stream
in2_strm second input data stream
in3_strm third input data stream
in4_strm fourth input data stream
e_in_strm end flag stream for input data
out_strm output data stream
e_out_strm end flag stream for output data