Primitive APIs in `xf::database`¶

aggregate¶

aggregate overload (1)¶

#include "xf_database/aggregate.hpp"

template <
    AggregateOp op,
    typename T
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Overload for most common aggregations.

As the parameters below show, this function can calculate one of a range of statistics: minimum, maximum, average (mean), variance, L1 norm or L2 norm. It can also calculate the sum and count.

The limitation of this function is that the output data type must match the input data type. In some cases the sum or count may overflow the output type; the other aggregate overloads cover those cases safely.

Note that the minimum, maximum, sum, count, number of non-zero, L1 norm and L2 norm aggregate functions all return zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op	the aggregate operator: AOP_SUM, AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2
T	the data type of input and output streams
in_strm	input data stream
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
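
The data-plus-end-flag interface above can be illustrated with a small software model. This is only a sketch: `Stream` (a `std::queue` alias), `ModelOp`, `aggregateModel` and `runAggregate` are hypothetical names that are not part of `xf::database`, and the flag protocol (one `false` per row, then a closing `true`) is an assumption, since this page only calls them end flag streams.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Software stand-in for hls::stream<T>; this is NOT the HLS type.
template <typename T>
using Stream = std::queue<T>;

// Hypothetical operator tag used only by this sketch.
enum class ModelOp { Max, Min, Sum };

// Reads rows until the end flag is true, then emits a single result row.
// Assumed protocol: one `false` flag per data row, then one final `true`.
template <ModelOp op, typename T>
void aggregateModel(Stream<T>& in, Stream<bool>& in_e,
                    Stream<T>& out, Stream<bool>& out_e) {
    T acc = 0;  // an empty input therefore yields zero
    bool first = true;
    for (;;) {
        assert(!in_e.empty());
        const bool end = in_e.front();
        in_e.pop();
        if (end) break;
        const T v = in.front();
        in.pop();
        if (op == ModelOp::Sum) {
            acc = static_cast<T>(acc + v);  // same type as input: may overflow
        } else if (first) {
            acc = v;
        } else if (op == ModelOp::Max) {
            acc = std::max(acc, v);
        } else {
            acc = std::min(acc, v);
        }
        first = false;
    }
    out.push(acc);
    out_e.push(false);
    out_e.push(true);
}

// Convenience wrapper: feeds a vector through the model and returns the result.
template <ModelOp op, typename T>
T runAggregate(const std::vector<T>& rows) {
    Stream<T> in, out;
    Stream<bool> in_e, out_e;
    for (const T& v : rows) {
        in.push(v);
        in_e.push(false);
    }
    in_e.push(true);
    aggregateModel<op, T>(in, in_e, out, out_e);
    return out.front();
}
```

The accumulator starts at zero, which mirrors the documented behaviour of returning zero for empty input.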

aggregate overload (2)¶

#include "xf_database/aggregate.hpp"

template <
    AggregateOp op,
    typename T,
    typename T2
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T2>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Aggregate function overload for the SUM operation.

The output type is inferred from the argument and can differ from the input type. This allows the sum to have more precision bits than the input, and so avoid overflow.

Note that the sum aggregate function returns zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op	the aggregate operator: AOP_SUM
T	the data type of input stream, inferred from argument
T2	the data type of output stream, inferred from argument
in_strm	input data stream
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
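
The reason for a separate output type can be seen in plain C++. The helper `sumModel` below is a hypothetical host-side sketch, not the primitive itself: it sums the same rows once into an accumulator of the input type and once into a wider one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sums rows of type T into an accumulator of type T2 (hypothetical helper).
// When T2 is no wider than T, the running sum can wrap around.
template <typename T, typename T2>
T2 sumModel(const std::vector<T>& rows) {
    T2 acc = 0;  // empty input yields zero
    for (T v : rows) {
        acc = static_cast<T2>(acc + v);
    }
    return acc;
}
```

With the 8-bit rows 200, 100 and 50, an 8-bit accumulator wraps modulo 256, while a 32-bit accumulator holds the true sum of 350.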

aggregate overload (3)¶

#include "xf_database/aggregate.hpp"

template <
    AggregateOp op,
    typename T
    >
void aggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm
    )

Aggregate function overload for counting.

This function counts the number of input rows, or the number of non-zero input rows, and returns the count as a uint64_t value.

Note that the count aggregate function returns zero when the input is empty.

For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.

Parameters:

op	the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS
T	the data type of input stream, inferred from argument
in_strm	input data stream
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
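
The difference between the two counting operators can be sketched as follows. `countModel` and its `nonZeroOnly` flag are hypothetical names for illustration; the primitive itself selects the behaviour through the `op` template parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts all rows, or only the non-zero rows, and returns a 64-bit count.
template <typename T>
uint64_t countModel(const std::vector<T>& rows, bool nonZeroOnly) {
    uint64_t n = 0;  // empty input yields zero
    for (const T& v : rows) {
        if (!nonZeroOnly || v != 0) {
            ++n;
        }
    }
    return n;
}
```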

bitonicSort¶

#include "xf_database/bitonic_sort.hpp"

template <
    typename Key_Type,
    int BitonicSortNumber
    >
void bitonicSort (
    hls::stream <Key_Type>& kin_strm,
    hls::stream <bool>& kin_strm_end,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& kout_strm_end,
    bool order
    )

Bitonic sort is a parallel algorithm for sorting.

This algorithm can sort a large vector of data in parallel, and by cascading the sorters into a network it can offer good theoretical throughput.

Although this algorithm is suitable for FPGA acceleration, it does not work well with the row-by-row streaming interface of the database library. Please treat this primitive as a demo, and only use it as a starting point for derived code. Alternative sorting algorithms in this library are insertSort and mergeSort.

Parameters:

Key_Type	the input and output key type
BitonicSortNumber	the parallel number
kin_strm	input key stream
kin_strm_end	end flag stream for input key
kout_strm	output key stream
kout_strm_end	end flag stream for output key
order	1 for ascending or 0 for descending sort
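
For readers unfamiliar with the algorithm, here is a minimal sequential sketch of a bitonic sorting network in standard C++. It is not the HLS implementation: `bitonicSortModel` is a hypothetical name, and the sketch assumes the batch length is a power of two. In hardware, the compare-exchange steps of each inner pass are independent and can run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts `keys` in place with a bitonic network; size must be a power of two.
template <typename Key>
void bitonicSortModel(std::vector<Key>& keys, bool ascending) {
    const std::size_t n = keys.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of bitonic blocks
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // handle each pair once
                // Direction of this block; flipped as a whole for descending.
                bool up = ((i & k) == 0);
                if (!ascending) up = !up;
                const bool outOfOrder = up ? (keys[partner] < keys[i])
                                           : (keys[i] < keys[partner]);
                if (outOfOrder) std::swap(keys[i], keys[partner]);
            }
        }
    }
}
```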

bfGen¶

#include "xf_database/bloom_filter.hpp"

template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfGen (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr0,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr1,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr2
    )

Generate the bloom filter in on-chip RAM blocks.

This primitive calculates the hash of input values, and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.

The bloom-filter bit vectors are passed as three pointers. Behind the scenes, one hash value is calculated and manipulated into three distinct marker locations in these vectors.

To check for the existence of a value with the generated vectors, use the bfCheck primitive.

Parameters:

IS_BRAM	choose which type of memory to use. True for BRAM. False for URAM.
STR_IN_W	width W of the streamed input message, e.g., W=512.
BV_W	width of the hash value. bit_vector_ptr0, bit_vector_ptr1 and bit_vector_ptr2 should point at MEM_SPACE=2^BV_W (bit).
msg_strm	input message stream.
in_e_strm	the flag that indicates the end of the input message stream.
bit_vector_ptr0	the pointer of bit_vector0.
bit_vector_ptr1	the pointer of bit_vector1.
bit_vector_ptr2	the pointer of bit_vector2.

bfGenStream¶

#include "xf_database/bloom_filter.hpp"

template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfGenStream (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <IS_BRAM?16:64>>& bit_vet_strm,
    hls::stream <bool>& out_e_strm
    )

Generate the bloom filter in on-chip RAM blocks, and emit the vectors upon completion.

This primitive calculates hash values of the input, and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.

The bloom-filter bit vectors are built into internally allocated buffers, and streamed out after the filter has been fully built.

Parameters:

IS_BRAM	choose which type of memory to use. True for BRAM. False for URAM.
STR_IN_W	width W of the streamed input message, e.g., W=512.
BV_W	width of the hash value. bit_vet_strm should send out MEM_SPACE=2^BV_W (bit) data in total.
msg_strm	input message stream.
in_e_strm	the flag that indicates the end of the input message stream.
bit_vet_strm	the output stream of bit_vector.
out_e_strm	the flag that indicates the end of the output stream.
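
Since bit_vet_strm carries 2^BV_W bits in total, the number of words to expect on the stream follows from the word width in the signature (16 or 64 bits). The helper below is a hypothetical sketch of that arithmetic and assumes every streamed word is fully used.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of words on bit_vet_strm for a given BV_W (sketch only).
// The word width is taken from the signature: 16 bits for BRAM, else 64.
constexpr uint64_t bitVectorWords(bool isBram, int bvW) {
    return (uint64_t{1} << bvW) / (isBram ? 16u : 64u);
}
```

For example, BV_W = 14 gives 16384 bits, which is 1024 words of 16 bits or 256 words of 64 bits.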

bfCheck¶

#include "xf_database/bloom_filter.hpp"

template <
    bool IS_BRAM,
    int STR_IN_W,
    int BV_W
    >
void bfCheck (
    hls::stream <ap_uint <STR_IN_W>>& msg_strm,
    hls::stream <bool>& in_e_strm,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr0,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr1,
    ap_uint <IS_BRAM?16:72>* bit_vector_ptr2,
    hls::stream <bool>& out_v_strm,
    hls::stream <bool>& out_e_strm
    )

Check the existence of a value using bloom-filter vectors.

This primitive is designed to work with the bloom-filter vectors generated by the bfGen primitive. It detects the existence of a value by hashing it and checking the corresponding vector bits. On a hit, the value is likely to be in the set of generating values; otherwise, it cannot be an element of the set. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM, and the setting must match bfGen.

Parameters:

IS_BRAM	choose which type of memory to use. True for BRAM. False for URAM.
STR_IN_W	width W of the streamed input message, e.g., W=512.
BV_W	width of the hash value. bit_vector_ptr0, bit_vector_ptr1 and bit_vector_ptr2 should point at MEM_SPACE=2^BV_W (bit).
msg_strm	input message stream.
in_e_strm	the flag that indicates the end of the input message stream.
bit_vector_ptr0	the pointer of bit_vector0.
bit_vector_ptr1	the pointer of bit_vector1.
bit_vector_ptr2	the pointer of bit_vector2.
out_v_strm	the output stream that indicates whether the value may exist (1 for true, 0 for false).
out_e_strm	the output end flag stream.
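
The generate-then-check flow can be sketched with a small software model. Everything here is an assumption made for illustration: the class `BloomModel`, the mixing function `mix64`, and the way one hash is cut into three slot indices are hypothetical and do not reproduce the hash used by the primitives. The sketch only demonstrates the property stated above: a miss is definite, a hit is probable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit mixer standing in for the primitive's hash.
inline uint64_t mix64(uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

class BloomModel {
public:
    // Each of the three bit vectors holds 2^bvW bits in this sketch.
    explicit BloomModel(int bvW)
        : bvW_(bvW),
          v0_(std::size_t{1} << bvW, false),
          v1_(std::size_t{1} << bvW, false),
          v2_(std::size_t{1} << bvW, false) {
        assert(bvW > 0 && 3 * bvW <= 64);
    }

    // Marks three locations derived from a single hash value.
    void gen(uint64_t msg) {
        const Slots s = locate(msg);
        v0_[s.a] = true;
        v1_[s.b] = true;
        v2_[s.c] = true;
    }

    // True means "may exist"; false means "definitely not inserted".
    bool check(uint64_t msg) const {
        const Slots s = locate(msg);
        return v0_[s.a] && v1_[s.b] && v2_[s.c];
    }

private:
    struct Slots { std::size_t a, b, c; };

    // Assumed derivation: three consecutive bvW-bit fields of one hash.
    Slots locate(uint64_t msg) const {
        const uint64_t h = mix64(msg);
        const uint64_t mask = (uint64_t{1} << bvW_) - 1;
        Slots s;
        s.a = static_cast<std::size_t>(h & mask);
        s.b = static_cast<std::size_t>((h >> bvW_) & mask);
        s.c = static_cast<std::size_t>((h >> (2 * bvW_)) & mask);
        return s;
    }

    int bvW_;
    std::vector<bool> v0_, v1_, v2_;
};
```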

combineCol¶

combineCol overload (1)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WCol1,
    int _WCol2,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines two columns into one.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1	the width of 1st input stream.
_WCol2	the width of 2nd input stream.
_WColOut	the width of output stream.
din1_strm	1st input data stream.
din2_strm	2nd input data stream.
in_e_strm	end flag stream for input data.
dout_strm	output data stream.
out_e_strm	end flag stream for output data.
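
The fusing step for one row can be sketched with plain integer arithmetic. This is an assumption-laden illustration: `combine2` is a hypothetical helper, it works on `uint64_t` rather than `ap_uint`, and placing the first column in the low bits is an assumed bit order that this page does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Packs two column values of w1 and w2 bits into one wide value.
// Assumed layout: col1 occupies bits [0, w1), col2 the next w2 bits.
inline uint64_t combine2(uint64_t col1, int w1, uint64_t col2, int w2) {
    assert(w1 > 0 && w2 > 0 && w1 + w2 <= 64);
    const uint64_t m1 = (uint64_t{1} << w1) - 1;
    const uint64_t m2 = (uint64_t{1} << w2) - 1;
    return (col1 & m1) | ((col2 & m2) << w1);
}
```

In this model the output width corresponds to _WColOut and should be at least the sum of the input widths.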

combineCol overload (2)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines three columns into one.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1	the width of 1st input stream.
_WCol2	the width of 2nd input stream.
_WCol3	the width of 3rd input stream.
_WColOut	the width of output stream.
din1_strm	1st input data stream.
din2_strm	2nd input data stream.
din3_strm	3rd input data stream.
in_e_strm	end flag stream for input data.
dout_strm	output data stream.
out_e_strm	end flag stream for output data.

combineCol overload (3)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <ap_uint <_WCol4>>& din4_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines four columns into one.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1	the width of 1st input stream.
_WCol2	the width of 2nd input stream.
_WCol3	the width of 3rd input stream.
_WCol4	the width of 4th input stream.
_WColOut	the width of output stream.
din1_strm	1st input data stream.
din2_strm	2nd input data stream.
din3_strm	3rd input data stream.
din4_strm	4th input data stream.
in_e_strm	end flag stream for input data.
dout_strm	output data stream.
out_e_strm	end flag stream for output data.

combineCol overload (4)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WCol5,
    int _WColOut
    >
void combineCol (
    hls::stream <ap_uint <_WCol1>>& din1_strm,
    hls::stream <ap_uint <_WCol2>>& din2_strm,
    hls::stream <ap_uint <_WCol3>>& din3_strm,
    hls::stream <ap_uint <_WCol4>>& din4_strm,
    hls::stream <ap_uint <_WCol5>>& din5_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WColOut>>& dout_strm,
    hls::stream <bool>& out_e_strm
    )

Combines five columns into one.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.

The counterpart of this primitive is splitCol.

Parameters:

_WCol1	the width of 1st input stream.
_WCol2	the width of 2nd input stream.
_WCol3	the width of 3rd input stream.
_WCol4	the width of 4th input stream.
_WCol5	the width of 5th input stream.
_WColOut	the width of output stream.
din1_strm	1st input data stream.
din2_strm	2nd input data stream.
din3_strm	3rd input data stream.
din4_strm	4th input data stream.
din5_strm	5th input data stream.
in_e_strm	end flag stream for input data.
dout_strm	output data stream.
out_e_strm	end flag stream for output data.

splitCol¶

splitCol overload (1)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WColIn,
    int _WCol1,
    int _WCol2
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into two.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn	the width of input stream.
_WCol1	the width of 1st output stream.
_WCol2	the width of 2nd output stream.
din_strm	input data stream.
in_e_strm	end flag stream for input data.
dout1_strm	1st output data stream.
dout2_strm	2nd output data stream.
out_e_strm	end flag stream for output data.
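
Splitting one wide row back into its columns is the inverse bit-slicing operation. The sketch below is hypothetical (`TwoCols` and `split2` are illustration-only names, using `uint64_t` instead of `ap_uint`) and assumes the first column sits in the low bits of the wide value.

```cpp
#include <cassert>
#include <cstdint>

// Result of splitting one wide row into two column values.
struct TwoCols {
    uint64_t col1;
    uint64_t col2;
};

// Extracts w1 low bits as col1 and the next w2 bits as col2 (assumed layout).
inline TwoCols split2(uint64_t wide, int w1, int w2) {
    assert(w1 > 0 && w2 > 0 && w1 + w2 <= 64);
    TwoCols out;
    out.col1 = wide & ((uint64_t{1} << w1) - 1);
    out.col2 = (wide >> w1) & ((uint64_t{1} << w2) - 1);
    return out;
}
```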

splitCol overload (2)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into three.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn	the width of input stream.
_WCol1	the width of 1st output stream.
_WCol2	the width of 2nd output stream.
_WCol3	the width of 3rd output stream.
din_strm	input data stream
in_e_strm	end flag stream for input data
dout1_strm	1st output data stream
dout2_strm	2nd output data stream
dout3_strm	3rd output data stream
out_e_strm	end flag stream for output data

splitCol overload (3)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <ap_uint <_WCol4>>& dout4_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into four.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn	the width of input stream.
_WCol1	the width of 1st output stream.
_WCol2	the width of 2nd output stream.
_WCol3	the width of 3rd output stream.
_WCol4	the width of 4th output stream.
din_strm	input data stream
in_e_strm	end flag stream for input data
dout1_strm	1st output data stream
dout2_strm	2nd output data stream
dout3_strm	3rd output data stream
dout4_strm	4th output data stream
out_e_strm	end flag stream for output data

splitCol overload (4)¶

#include "xf_database/combine_split_col.hpp"

template <
    int _WColIn,
    int _WCol1,
    int _WCol2,
    int _WCol3,
    int _WCol4,
    int _WCol5
    >
void splitCol (
    hls::stream <ap_uint <_WColIn>>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <_WCol1>>& dout1_strm,
    hls::stream <ap_uint <_WCol2>>& dout2_strm,
    hls::stream <ap_uint <_WCol3>>& dout3_strm,
    hls::stream <ap_uint <_WCol4>>& dout4_strm,
    hls::stream <ap_uint <_WCol5>>& dout5_strm,
    hls::stream <bool>& out_e_strm
    )

Split previously combined columns into five.

Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives, the processing semantics abstract the columns into a couple of groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.

The counterpart of this primitive is combineCol.

Parameters:

_WColIn	the width of input stream.
_WCol1	the width of 1st output stream.
_WCol2	the width of 2nd output stream.
_WCol3	the width of 3rd output stream.
_WCol4	the width of 4th output stream.
_WCol5	the width of 5th output stream.
din_strm	input data stream
in_e_strm	end flag stream for input data
dout1_strm	1st output data stream
dout2_strm	2nd output data stream
dout3_strm	3rd output data stream
dout4_strm	4th output data stream
dout5_strm	5th output data stream
out_e_strm	end flag stream for output data

compoundSort¶

#include "xf_database/compound_sort.hpp"

template <
    typename KEY_TYPE,
    int SORT_LEN,
    int INSERT_LEN
    >
void compoundSort (
    bool order,
    hls::stream <KEY_TYPE>& inKeyStrm,
    hls::stream <bool>& inEndStrm,
    hls::stream <KEY_TYPE>& outKeyStrm,
    hls::stream <bool>& outEndStrm
    )

compoundSort sorts the keys using a combination of insert sort and merge sort.

Parameters:

KEY_TYPE	key type
SORT_LEN	maximum supported sort length, between 16K and 2M; it must be an integer power of 2.
INSERT_LEN	insert sort length; a maximum of 1024 is recommended.
order	1 for ascending sort, 0 for descending sort
inKeyStrm	input key stream
inEndStrm	end flag stream for input key
outKeyStrm	output key-sorted stream
outEndStrm	end flag stream for output key
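
One common way to combine the two algorithms is to insertion-sort short runs and then merge the sorted runs pairwise. The sketch below shows that idea in standard C++. It is an assumption about how the stages fit together, not a description of the HLS implementation, and `compoundSortModel` and `insertLen` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename Key>
std::vector<Key> compoundSortModel(const std::vector<Key>& keys,
                                   std::size_t insertLen, bool ascending) {
    assert(insertLen > 0);
    std::vector<Key> buf = keys;
    // "a goes before b" under the requested order.
    auto before = [ascending](const Key& a, const Key& b) {
        return ascending ? (a < b) : (b < a);
    };
    // Stage 1: insertion sort each run of at most insertLen keys.
    for (std::size_t start = 0; start < buf.size(); start += insertLen) {
        const std::size_t end = std::min(start + insertLen, buf.size());
        for (std::size_t i = start + 1; i < end; ++i) {
            const Key k = buf[i];
            std::size_t j = i;
            while (j > start && before(k, buf[j - 1])) {
                buf[j] = buf[j - 1];
                --j;
            }
            buf[j] = k;
        }
    }
    // Stage 2: merge neighbouring sorted runs, doubling the run width.
    for (std::size_t width = insertLen; width < buf.size(); width *= 2) {
        for (std::size_t lo = 0; lo + width < buf.size(); lo += 2 * width) {
            const std::size_t hi = std::min(lo + 2 * width, buf.size());
            std::inplace_merge(buf.begin() + lo, buf.begin() + lo + width,
                               buf.begin() + hi, before);
        }
    }
    return buf;
}
```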

directGroupAggregate¶

directGroupAggregate overload (1)¶

#include "xf_database/direct_group_aggregate.hpp"

template <
    int op,
    int DATINW,
    int DATOUTW,
    int DIRECTW
    >
void directGroupAggregate (
    hls::stream <ap_uint <DATINW>>& vin_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <DATOUTW>>& vout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <ap_uint <DIRECTW>>& kin_strm,
    hls::stream <ap_uint <DIRECTW>>& kout_strm
    )

Group-by aggregation with limited key width.

This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).

The following aggregate operators are supported:

AOP_MAX
AOP_MIN
AOP_SUM
AOP_COUNT
AOP_MEAN
AOP_VARIANCE
AOP_NORML1
AOP_NORML2

The return value has the same type as the input payload value.

Caution

Pay attention to possible overflow in sum or count.

Parameters:

op	the aggregate operator, as defined in AggregateOp enum.
DATINW	the width of input payload
DATOUTW	the width of output aggr-payload
DIRECTW	the width of input and output key
vin_strm	value input
in_e_strm	end flag stream for input data
vout_strm	value output
out_e_strm	end flag stream for output data
kin_strm	group-by key input
kout_strm	group-by key output
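
The direct-addressing idea can be sketched in software: the key is used as an index into a table with 2 ^ key width slots, so no hashing or searching is needed. The names below (`KeyValue`, `directGroupSumModel`) are hypothetical, only the sum operator is modelled, and emitting just the keys that were seen, in ascending key order, is an assumption about the output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One input or output row: (group key, value).
using KeyValue = std::pair<uint32_t, uint64_t>;

template <int DIRECTW>
std::vector<KeyValue> directGroupSumModel(const std::vector<KeyValue>& rows) {
    static_assert(DIRECTW > 0 && DIRECTW <= 20, "keep the sketch's table small");
    const std::size_t slots = std::size_t{1} << DIRECTW;  // 2 ^ key width
    std::vector<uint64_t> sum(slots, 0);
    std::vector<bool> seen(slots, false);
    for (const KeyValue& r : rows) {
        assert(r.first < slots);   // key must fit in DIRECTW bits
        sum[r.first] += r.second;  // the key addresses the slot directly
        seen[r.first] = true;
    }
    std::vector<KeyValue> out;
    for (std::size_t k = 0; k < slots; ++k) {
        if (seen[k]) {
            out.push_back(KeyValue(static_cast<uint32_t>(k), sum[k]));
        }
    }
    return out;
}
```

The table grows exponentially with the key width, which is why the primitive targets narrow keys.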

directGroupAggregate overload (2)¶

#include "xf_database/direct_group_aggregate.hpp"

template <
    int DATINW,
    int DATOUTW,
    int DIRECTW
    >
void directGroupAggregate (
    ap_uint <32> op,
    hls::stream <ap_uint <DATINW>>& vin_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <ap_uint <DATOUTW>>& vout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <ap_uint <DIRECTW>>& kin_strm,
    hls::stream <ap_uint <DIRECTW>>& kout_strm
    )

Group-by aggregation with limited key width, runtime programmable.

This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).

The following aggregate operators are supported:

AOP_MAX
AOP_MIN
AOP_SUM
AOP_COUNT
AOP_MEAN
AOP_NORML1

The return value has the same type as the input payload value.

Caution

Pay attention to possible overflow in sum or count.

Parameters:

DATINW	the width of input payload
DATOUTW	the width of output aggr-payload
DIRECTW	the width of input and output key
op	the aggregate operator, as defined in AggregateOp enum.
vin_strm	value input
in_e_strm	end flag stream for input data
vout_strm	value output
out_e_strm	end flag stream for output data
kin_strm	group-by key input
kout_strm	group-by key output

duplicateCol¶

#include "xf_database/duplicate_col.hpp"

template <int W>
void duplicateCol (
    hls::stream <ap_uint <W>>& d_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <W>>& d0_out_strm,
    hls::stream <ap_uint <W>>& d1_out_strm,
    hls::stream <bool>& e_out_strm
    )

Duplicate one column into two columns.

Parameters:

W	column data width in bits.
d_in_strm	input data stream.
e_in_strm	end flag for input data.
d0_out_strm	output data stream 0.
d1_out_strm	output data stream 1.
e_out_strm	end flag for output data.

dynamicEval¶

#include "xf_database/dynamic_eval.hpp"

template <
    typename TStrm1,
    typename TStrm2,
    typename TStrm3,
    typename TStrm4,
    typename TConst1,
    typename TConst2,
    typename TConst3,
    typename TConst4,
    typename TOut
    >
void dynamicEval (
    ap_uint <289> config,
    hls::stream <TStrm1>& strm_in1,
    hls::stream <TStrm2>& strm_in2,
    hls::stream <TStrm3>& strm_in3,
    hls::stream <TStrm4>& strm_in4,
    hls::stream <bool>& strm_in_end,
    hls::stream <TOut>& strm_out,
    hls::stream <bool>& strm_out_end
    )

Dynamic expression evaluation.

This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run-time. The same configuration is used for all rows until the end of input.

The constants are assumed to be no more than 32 bits wide.

For the definition of the config word, please refer to the “Design Internals” section of the document and the corresponding test in L1/tests.

Parameters:

TStrm1	Type of input Stream1
TStrm2	Type of input Stream2
TStrm3	Type of input Stream3
TStrm4	Type of input Stream4
TConst1	Type of input Constant1
TConst2	Type of input Constant2
TConst3	Type of input Constant3
TConst4	Type of input Constant4
TOut	Type of Compute Result
config	configuration bits of ops and constants.
strm_in1	input Stream1
strm_in2	input Stream2
strm_in3	input Stream3
strm_in4	input Stream4
strm_in_end	end flag of input stream
strm_out	output Stream
strm_out_end	end flag of output stream

dynamicEvalV2¶

#include "xf_database/dynamic_eval_v2.hpp"

template <typename T>
void dynamicEvalV2 (
    hls::stream <ap_uint <32>>& cfgs,
    hls::stream <T>& col0_istrm,
    hls::stream <T>& col1_istrm,
    hls::stream <T>& col2_istrm,
    hls::stream <T>& col3_istrm,
    hls::stream <bool>& e_istrm,
    hls::stream <T>& ret_ostrm,
    hls::stream <bool>& e_ostrm
    )

Dynamic expression evaluation version 2.

This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run-time. The same configuration is used for all rows until the end of input.

The constants are assumed to be no more than 32 bits wide.

Parameters:

T	Type of input streams
cfgs	configuration bits of ops and constants.
col0_istrm	input Stream1
col1_istrm	input Stream2
col2_istrm	input Stream3
col3_istrm	input Stream4
e_istrm	end flag of input stream
ret_ostrm	output Stream
e_ostrm	end flag of output stream

dynamicFilter¶

dynamicFilter overload (1)¶

#include "xf_database/dynamic_filter.hpp"

template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <W>>& v2_strm,
    hls::stream <ap_uint <W>>& v3_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This primitive, together with its three wrapper overloads, supports filtering rows using up to four columns as conditions. The payload columns should be grouped together before being passed to this primitive, using the combineCol primitive. Their total width is not explicitly limited, but is naturally bound by resources.

The filter conditions consist of whether each condition column is within a given range, and of relations between any two condition columns. The configuration is set once before processing the rows, and reused until the last row. For configuration generation, please refer to the “Design Internals” section of the document and the corresponding test case of this primitive.

Parameters:

W	width of all condition column streams, in bits.
WP	width of payload column, in bits.
filter_cfg_strm	stream of raw config bits for this primitive.
v0_strm	condition column stream 0.
v1_strm	condition column stream 1.
v2_strm	condition column stream 2.
v3_strm	condition column stream 3.
pay_in_strm	payload input stream.
e_in_strm	end flag stream for input table.
pay_out_strm	payload output stream.
e_pay_out_strm	end flag stream for payload output.
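
The row-filtering behaviour can be sketched in software. The model below covers only the per-column range checks, not the relations between columns, and its configuration struct is purely hypothetical: `RangeCond`, `FilterRow` and `dynamicFilterModel` are illustration-only names, and the real configuration is a stream of raw bits whose layout is defined elsewhere.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-column condition; NOT the primitive's real config layout.
struct RangeCond {
    bool dontCare;  // column is ignored, similar in spirit to FOP_DC
    uint32_t lo;    // inclusive lower bound
    uint32_t hi;    // inclusive upper bound
};

struct FilterRow {
    std::array<uint32_t, 4> cond;  // the four condition columns of this row
    uint64_t payload;              // combined payload columns
};

// Keeps the payload of every row whose active conditions are all in range.
inline std::vector<uint64_t> dynamicFilterModel(
    const std::array<RangeCond, 4>& cfg, const std::vector<FilterRow>& rows) {
    std::vector<uint64_t> kept;
    for (const FilterRow& r : rows) {
        bool pass = true;
        for (std::size_t c = 0; c < 4; ++c) {
            const bool inRange =
                r.cond[c] >= cfg[c].lo && r.cond[c] <= cfg[c].hi;
            pass = pass && (cfg[c].dontCare || inRange);
        }
        if (pass) kept.push_back(r.payload);
    }
    return kept;
}
```

The configuration is fixed for the whole table, matching the "set once, reused until the last row" behaviour described above.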

dynamicFilter overload (2)¶

#include "xf_database/dynamic_filter.hpp"

template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <W>>& v2_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamic_filter that simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 4th column should be set to FOP_DC.

Parameters:

W	width of all condition column streams, in bits.
WP	width of payload column, in bits.
filter_cfg_strm	stream of raw config bits for this primitive.
v0_strm	condition column stream 0.
v1_strm	condition column stream 1.
v2_strm	condition column stream 2.
pay_in_strm	payload input stream.
e_in_strm	end flag stream for input table.
pay_out_strm	payload output stream.
e_pay_out_strm	end flag stream for payload output.

dynamicFilter overload (3)¶

#include "xf_database/dynamic_filter.hpp"

template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <W>>& v1_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamic_filter that simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 3rd and 4th columns should be set to FOP_DC.

Parameters:

W	width of all condition column streams, in bits.
WP	width of payload column, in bits.
filter_cfg_strm	stream of raw config bits for this primitive.
v0_strm	condition column stream 0.
v1_strm	condition column stream 1.
pay_in_strm	payload input stream.
e_in_strm	end flag stream for input table.
pay_out_strm	payload output stream.
e_pay_out_strm	end flag stream for payload output.

dynamicFilter overload (4)¶

#include "xf_database/dynamic_filter.hpp"

template <
    int W,
    int WP
    >
void dynamicFilter (
    hls::stream <ap_uint <32>>& filter_cfg_strm,
    hls::stream <ap_uint <W>>& v0_strm,
    hls::stream <ap_uint <WP>>& pay_in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <ap_uint <WP>>& pay_out_strm,
    hls::stream <bool>& e_pay_out_strm
    )

Filter payloads according to conditions set during run-time.

This function is a wrapper around the four-condition-column dynamic_filter that simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 2nd to 4th columns should be set to FOP_DC.

Parameters:

W	width of all condition column streams, in bits.
WP	width of payload column, in bits.
filter_cfg_strm	stream of raw config bits for this primitive.
v0_strm	condition column stream 0.
pay_in_strm	payload input stream.
e_in_strm	end flag stream for input table.
pay_out_strm	payload output stream.
e_pay_out_strm	end flag stream for payload output.

groupAggregate¶

groupAggregate overload (1)¶

#include "xf_database/group_aggregate.hpp"

template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& din_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T>& dout_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that returns the same type as the input.

Parameters:

op	the aggregate operator: AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2
T	the data type of input and output streams
KEY_T	the input and output indexing key type
din_strm	input data stream
in_e_strm	end flag stream for input data
dout_strm	output data stream
out_e_strm	end flag stream for output data
kin_strm	input indexing key stream
kout_strm	output indexing key stream
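
A software sketch of streaming group aggregation is given below. It rests on an assumption this page does not state: that rows belonging to the same group arrive consecutively, for example because the input is sorted by key. Under that assumption one running value is enough, and a result row is emitted whenever the key changes. `groupMaxModel` is a hypothetical name and only the maximum operator is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Emits (key, max value) per run of equal keys; assumes groups are contiguous.
template <typename KeyT, typename T>
std::vector<std::pair<KeyT, T> > groupMaxModel(
    const std::vector<std::pair<KeyT, T> >& rows) {
    std::vector<std::pair<KeyT, T> > out;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (!out.empty() && out.back().first == rows[i].first) {
            // Same group as the previous row: update the running maximum.
            if (out.back().second < rows[i].second) {
                out.back().second = rows[i].second;
            }
        } else {
            // Key changed: start a new group with this row's value.
            out.push_back(rows[i]);
        }
    }
    return out;
}
```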

groupAggregate overload (2)¶

#include "xf_database/group_aggregate.hpp"

template <
    AggregateOp op,
    typename T,
    typename T2,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <T2>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function whose output type may differ from its input type.

Parameters:

op	the aggregate operator: AOP_SUM
T	the input stream type, inferred from argument
T2	the output stream type, inferred from argument
KEY_T	the input and output indexing key type, inferred from argument
in_strm	input data stream
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
kin_strm	input indexing key stream
kout_strm	output indexing key stream

groupAggregate overload (3)¶

#include "xf_database/group_aggregate.hpp"

template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that counts and returns uint64_t.

Parameters:

op	the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS
T	the input stream type, inferred from argument
KEY_T	the input and output indexing key type, inferred from argument
in_strm	input data stream
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
kin_strm	input indexing key stream
kout_strm	output indexing key stream

groupAggregate overload (4)¶

#include "xf_database/group_aggregate.hpp"

template <
    AggregateOp op,
    typename T,
    typename KEY_T
    >
void groupAggregate (
    hls::stream <T>& in_strm,
    hls::stream <bool>& isnull_strm,
    hls::stream <bool>& in_e_strm,
    hls::stream <uint64_t>& out_strm,
    hls::stream <bool>& out_e_strm,
    hls::stream <KEY_T>& kin_strm,
    hls::stream <KEY_T>& kout_strm
    )

Group aggregate function that counts and returns uint64_t, taking an additional null-flag stream for the input data.

Parameters:

op	the aggregate operator: AOP_COUNT
T	the input stream type, inferred from argument
KEY_T	the input and output indexing key type, inferred from argument
in_strm	input data stream
isnull_strm	flag stream indicating whether each input datum is null
in_e_strm	end flag stream for input data
out_strm	output data stream
out_e_strm	end flag stream for output data
kin_strm	input indexing key stream
kout_strm	output indexing key stream
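
The null-aware count can be modeled in plain C++ as below. This is a sketch under two assumptions not confirmed by the text: rows flagged as null are excluded from the count, and rows with equal keys are adjacent. `NullableRow` and `groupCountModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical reference model, not the library implementation.
struct NullableRow {
    uint32_t key;
    int32_t value;
    bool isNull;  // mirrors one element of isnull_strm
};

// Counts non-null rows per group; equal keys are assumed to be adjacent.
std::vector<std::pair<uint32_t, uint64_t>> groupCountModel(
    const std::vector<NullableRow>& in) {
    std::vector<std::pair<uint32_t, uint64_t>> out;
    for (const NullableRow& r : in) {
        if (out.empty() || out.back().first != r.key) {
            out.emplace_back(r.key, 0);  // new group starts at zero
        }
        // Assumption: rows flagged as null do not contribute to the count.
        if (!r.isNull) ++out.back().second;
    }
    return out;
}
```

A group made up entirely of null rows still appears in this model's output, with a count of zero.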

hashAntiJoin¶

#include "xf_database/hash_anti_join.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashAntiJoin (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Anti-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3, but performs anti-join instead of inner-join. Both the inner and outer tables should be sent to this primitive once, starting with the inner table.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, larger than 24 is suggested.
CH_NM	number of input channels, 1,2,4.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	the 1st element is the depth of each hash, the 2nd element is the joined number
pu_end_status_strms
j_strm	output of joined result
j_e_strm	end flag of joined result
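
For checking results on the host, the anti-join semantics can be sketched in software. The model assumes the usual definition (outer-table rows whose key has no match in the inner table are emitted) and ignores PUs, buffers and status streams. `KeyPayload` and `antiJoinModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical reference model, not the library implementation.
using KeyPayload = std::pair<uint32_t, uint32_t>;  // (key, payload)

std::vector<KeyPayload> antiJoinModel(const std::vector<KeyPayload>& inner,
                                      const std::vector<KeyPayload>& outer) {
    // Build phase: remember every key seen in the inner table.
    std::unordered_set<uint32_t> innerKeys;
    for (const KeyPayload& r : inner) innerKeys.insert(r.first);

    // Probe phase: keep outer rows whose key never appeared in the inner table.
    std::vector<KeyPayload> result;
    for (const KeyPayload& r : outer) {
        if (innerKeys.count(r.first) == 0) result.push_back(r);
    }
    return result;
}
```

The inner table is consumed first and the outer table second, matching the feeding order described above.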

hashGroupAggregate¶

#include "xf_database/hash_group_aggregate.hpp"

template <
    int _WKey,
    int _KeyNM,
    int _WPay,
    int _PayNM,
    int _HashMode,
    int _WHashHigh,
    int _WHashLow,
    int _CHNM,
    int _Wcnt,
    int _WBuffer,
    int _BurstLenW = 32,
    int _BurstLenR = 32
    >
void hashGroupAggregate (
    hls::stream <ap_uint <_WKey>> strm_key_in [_CHNM][_KeyNM],
    hls::stream <ap_uint <_WPay>> strm_pld_in [_CHNM][_PayNM],
    hls::stream <bool> strm_e_in [_CHNM],
    hls::stream <ap_uint <32>>& config,
    hls::stream <ap_uint <32>>& result_info,
    ap_uint <_WBuffer>* ping_buf0,
    ap_uint <_WBuffer>* ping_buf1,
    ap_uint <_WBuffer>* ping_buf2,
    ap_uint <_WBuffer>* ping_buf3,
    ap_uint <_WBuffer>* pong_buf0,
    ap_uint <_WBuffer>* pong_buf1,
    ap_uint <_WBuffer>* pong_buf2,
    ap_uint <_WBuffer>* pong_buf3,
    hls::stream <ap_uint <_WKey>> aggr_key_out [_KeyNM],
    hls::stream <ap_uint <_WPay>> aggr_pld_out [3][_PayNM],
    hls::stream <bool>& strm_e_out
    )

Generic hash group aggregate primitive.

With this primitive, the maximum number of rows in the aggregate table is bounded by the AXI buffer size.

The group aggregation values are updated inside the chip, and when a hash-bucket overflows, the overflowed rows are spilled into external buffers. The overflow buffer will be automatically re-scanned, and within each round, a number of distinct groups will be aggregated and emitted. This algorithm ends when the overflow buffer is empty and all groups are aggregated.
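
The spill-and-rescan loop described above can be sketched as a software model. It simplifies the hardware considerably: a single global capacity stands in for per-bucket overflow, SUM is the only operator, and std::vector stands in for the external buffers. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reference model of multi-pass aggregation with spilling.
using KV = std::pair<uint32_t, uint64_t>;  // (group key, payload)

// Sums payloads per key while holding at most `capacity` groups "on chip".
// Returns the emitted (key, sum) rows and reports the number of passes used.
std::vector<KV> spillingSumModel(std::vector<KV> rows, std::size_t capacity,
                                 int& passes) {
    assert(capacity > 0);
    std::vector<KV> emitted;
    passes = 0;
    while (!rows.empty()) {
        std::unordered_map<uint32_t, uint64_t> table;  // on-chip storage stand-in
        std::vector<KV> overflow;                      // external buffer stand-in
        for (const KV& r : rows) {
            auto it = table.find(r.first);
            if (it != table.end()) {
                it->second += r.second;  // group already resident: update it
            } else if (table.size() < capacity) {
                table.emplace(r.first, r.second);  // room left: admit new group
            } else {
                overflow.push_back(r);  // no room: spill the row for a later pass
            }
        }
        // Flush the groups completed in this pass.
        for (const auto& kv : table) emitted.emplace_back(kv.first, kv.second);
        rows.swap(overflow);  // re-scan only what was spilled
        ++passes;
    }
    return emitted;
}
```

Because a key that is refused once is refused for the rest of that pass, every group is aggregated in exactly one pass, and the loop ends when nothing is left to spill.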

Attention

This module can accept multiple input rows of key and payload pairs per cycle.
The maximum number of distinct groups aggregated in one pass is 2 ^ (1 + _WHash).
When the width of the input stream is not fully used, data should be aligned to the little end.
It is highly recommended to assign the ping buffer and pong buffer to different HBM banks, and input and output to different DDR banks, for better performance.
The maximum number of rows in the aggregate table cannot exceed the maximum DDR/HBM size used in this design.
When the bit-width of the group key is known to be small, say 10 bits, please consider the directAggregate primitive, which has lower resource utilization and requires no external buffer access.

Parameters:

_WKey	width of key, in bit.
_KeyNM	maximum number of key columns; must not exceed 8.
_WPay	width of max payload, in bit.
_PayNM	maximum number of payload columns; must not exceed 8.
_HashMode	selects the hash algorithm, 0: radix, 1: lookup3.
_WHashHigh	number of hash bits used for dispatching to PUs.
_WHashLow	number of hash bits used for hash-table.
_CHNM	number of input channels.
_WBuffer	width of HBM/DDR buffer (ping_buf and pong_buf).
_BurstLenW	burst length for writing unhandled data.
_BurstLenR	burst length for reloading unhandled data.
strm_key_in	input of key streams.
strm_pld_in	input of payload streams.
strm_e_in	input of end signal.
config	information for initializing the primitive: contains op for a maximum of 8 columns, key column number (less than 8), pld column number (less than 8) and initial aggregate cnt.
result_info	result information at kernel end, contains op, key_column, pld_column and aggregate result cnt
ping_buf0	DDR/HBM ping buffer for unhandled data.
ping_buf1	DDR/HBM ping buffer for unhandled data.
ping_buf2	DDR/HBM ping buffer for unhandled data.
ping_buf3	DDR/HBM ping buffer for unhandled data.
pong_buf0	DDR/HBM pong buffer for unhandled data.
pong_buf1	DDR/HBM pong buffer for unhandled data.
pong_buf2	DDR/HBM pong buffer for unhandled data.
pong_buf3	DDR/HBM pong buffer for unhandled data.
aggr_key_out	output of key columns.
aggr_pld_out	output of pld columns. [0][*] is the result of min/max/cnt for pld columns, [1][*] is the low-bit value of sum/average, [2][*] is the high-bit value of sum/average.
strm_e_out	end signal of output.
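
Since sum/average results are split across a low-bit and a high-bit payload word, host code typically has to recombine them. The sketch below assumes a 32-bit payload width purely for illustration. `combineSum` and `splitSum` are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side helper: rebuild a wide sum from the two payload
// words described for aggr_pld_out[1][*] (low bits) and aggr_pld_out[2][*]
// (high bits). A 32-bit payload width is assumed for illustration only.
uint64_t combineSum(uint32_t lowWord, uint32_t highWord) {
    return (static_cast<uint64_t>(highWord) << 32) | lowWord;
}

// Inverse direction, handy when preparing golden data for a testbench.
void splitSum(uint64_t sum, uint32_t& lowWord, uint32_t& highWord) {
    lowWord = static_cast<uint32_t>(sum & 0xFFFFFFFFu);
    highWord = static_cast<uint32_t>(sum >> 32);
}
```

For other payload widths the shift amount and mask would change accordingly.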

hashJoinMPU¶

hashJoinMPU overload (1)¶

#include "xf_database/hash_join_v2.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int BFW,
    int CH_NM,
    int BF_W,
    int EN_BF
    >
static void hashJoinMPU (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <BFW>* stb0_buf,
    ap_uint <BFW>* stb1_buf,
    ap_uint <BFW>* stb2_buf,
    ap_uint <BFW>* stb3_buf,
    ap_uint <BFW>* stb4_buf,
    ap_uint <BFW>* stb5_buf,
    ap_uint <BFW>* stb6_buf,
    ap_uint <BFW>* stb7_buf,
    hls::stream <ap_uint <S_PW+B_PW>>& j1_strm,
    hls::stream <bool>& e5_strm
    )

Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.

The max number of lines of small table is 2M in this design. It is assumed that the hash-conflict is within 512 per bin.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table should be fed TWICE, followed by the big table once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, log2(small table max num of rows).
BFW	width of buffer.
CH_NM	number of input channels, 1,2,4.
BF_W	bloom-filter hash width.
EN_BF	bloom-filter switch, 0 for off, 1 for on.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
j1_strm	output of joined rows.
e5_strm	end signal of joined rows.
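
The roles of HASHWH and HASHWL can be illustrated with a small sketch that splits one hash value into a PU index and an in-PU bucket index. The bit positions used here (PU index from the bits just above the low HASHWL bits) are an assumption made for the example, and `HashSplit` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of splitting one hash value between PU selection
// and in-PU bucket addressing. Bit positions are assumed, not taken from the
// library source.
template <int HASHWH, int HASHWL>
struct HashSplit {
    static_assert(HASHWH >= 1 && HASHWH <= 3, "HASHWH is documented as 1~3");
    static_assert(HASHWL >= 1, "at least one bucket bit is needed");
    static_assert(HASHWH + HASHWL <= 32, "sketch uses a 32-bit hash");

    // HASHWH bits pick one of 2^HASHWH PUs (and their stbN_buf buffers).
    static uint32_t pu(uint32_t hash) {
        return (hash >> HASHWL) & ((1u << HASHWH) - 1u);
    }
    // The low HASHWL bits address a bucket inside the selected PU.
    static uint32_t bucket(uint32_t hash) {
        return hash & ((1u << HASHWL) - 1u);
    }
};
```

With HASHWH = 3 all eight buffers are in use; smaller values select among fewer PUs in this model.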

hashJoinMPU overload (2)¶

#include "xf_database/hash_join_v2.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int BFW,
    int CH_NM
    >
void hashJoinMPU (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <BFW>* stb0_buf,
    ap_uint <BFW>* stb1_buf,
    ap_uint <BFW>* stb2_buf,
    ap_uint <BFW>* stb3_buf,
    ap_uint <BFW>* stb4_buf,
    ap_uint <BFW>* stb5_buf,
    ap_uint <BFW>* stb6_buf,
    ap_uint <BFW>* stb7_buf,
    hls::stream <ap_uint <S_PW+B_PW>>& j1_strm,
    hls::stream <bool>& e5_strm
    )

Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.

The max number of lines of small table is 8M in this design. It is assumed that the hash-conflict is within 512 per bin.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table should be fed TWICE, followed by the big table once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, log2(small table max num of rows).
BFW	width of buffer.
CH_NM	number of input channels, 1,2,4.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
j1_strm	output of joined rows.
e5_strm	end signal of joined rows.

hashJoinV3¶

#include "xf_database/hash_join_v3.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashJoinV3 (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Join v3 primitive. It takes more resources than hashJoinMPU and promises better performance on large tables.

The maximum size of the small table is 256MB x 8 (HBM number) = 2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.
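
The two limits above can be written out as compile-time arithmetic. This sketch assumes binary units (1MB = 2^20 bytes, 1M entries = 2^20). The constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic from the text above, written out as compile-time checks.
// Assumption: binary units, i.e. 1 MB = 2^20 bytes and 1M = 2^20 entries.
constexpr uint64_t kMiB = 1024ull * 1024ull;
constexpr uint64_t kBytesPerBuffer = 256 * kMiB;  // 256MB per HBM buffer
constexpr uint64_t kNumBuffers = 8;               // stb0_buf .. stb7_buf
constexpr uint64_t kMaxSmallTableBytes = kBytesPerBuffer * kNumBuffers;
static_assert(kMaxSmallTableBytes == 2048 * kMiB, "256MB x 8 = 2GB");

// Total hash entries for a given configuration: 1 << (HASHWH + HASHWL).
constexpr uint64_t totalHashEntries(int hashwh, int hashwl) {
    return 1ull << (hashwh + hashwl);
}

// True when a configuration stays within the documented 1M-entry limit.
constexpr bool fitsEntryLimit(int hashwh, int hashwl) {
    return totalHashEntries(hashwh, hashwl) <= (1ull << 20);
}

static_assert(fitsEntryLimit(3, 17), "3 + 17 = 20 bits is exactly 1M entries");
static_assert(!fitsEntryLimit(3, 18), "3 + 18 = 21 bits exceeds 1M entries");
```

A check of this kind next to the template instantiation catches an oversized HASHWH + HASHWL combination before synthesis.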

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. Unlike hashJoinMPU, the small table and big table should each be fed only once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, larger than 24 is suggested.
CH_NM	number of input channels, 1,2,4.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains hash depth, row number of join result
pu_end_status_strms	contains hash depth, row number of join result
j_strm	output of joined result
j_e_strm	end flag of joined result

hashBuildProbeV3¶

#include "xf_database/hash_join_v3.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int BF_W,
    int EN_BF
    >
static void hashBuildProbeV3 (
    bool& build_probe_flag,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Build-Probe v3 primitive. It can perform hash build and hash probe separately, and needs two kernel calls to do so. A control flag decides between build and probe. This primitive supports multiple builds and multiple probes; for example, you can schedule a workflow as: build0->build1->probe0->probe1->build2->build3->probe3…

The maximum size of the small table is 256MB x 8 = 2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table and big table should be fed only ONCE.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, log2(small table max num of rows).
CH_NM	number of input channels, 1,2,4.
BF_W	bloom-filter hash width.
EN_BF	bloom-filter switch, 0 for off, 1 for on.
build_probe_flag	0:build 1:probe
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains build id, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU
pu_end_status_strms	returns next build id, fixed hash depth, joined number of current probe and end addr of used stb_buf for each PU
j_strm	output of joined result
j_e_strm	end flag of joined result
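
The build0->build1->probe0->probe1 style of workflow can be modeled in software with state that persists across calls. The sketch below is a hypothetical inner-join model in which a std::unordered_multimap stands in for the hash-table and PU buffers. It does not model status streams, bloom filtering or overflow handling.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical software model of separate build and probe calls.
using KeyPayload = std::pair<uint32_t, uint32_t>;  // (key, payload)

struct JoinedRow {
    uint32_t key;
    uint32_t smallPayload;
    uint32_t bigPayload;
};

class BuildProbeModel {
   public:
    // buildProbeFlag == false: build (add small-table rows).
    // buildProbeFlag == true:  probe (join big-table rows against the table).
    std::vector<JoinedRow> call(bool buildProbeFlag,
                                const std::vector<KeyPayload>& rows) {
        std::vector<JoinedRow> joined;
        for (const KeyPayload& r : rows) {
            if (!buildProbeFlag) {
                table_.emplace(r.first, r.second);  // state persists across calls
            } else {
                auto range = table_.equal_range(r.first);
                for (auto it = range.first; it != range.second; ++it) {
                    joined.push_back({r.first, it->second, r.second});
                }
            }
        }
        return joined;
    }

   private:
    // Stands in for the htb*/stb* buffers that survive between kernel calls.
    std::unordered_multimap<uint32_t, uint32_t> table_;
};
```

Calling `call(false, ...)` twice and then `call(true, ...)` twice reproduces the build0->build1->probe0->probe1 ordering, with both probes seeing rows from both builds.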

hashJoinV4¶

#include "xf_database/hash_join_v4.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int BF_HASH_NM,
    int BFW,
    bool EN_BF
    >
static void hashJoinV4 (
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <64>* htb0_buf,
    ap_uint <64>* htb1_buf,
    ap_uint <64>* htb2_buf,
    ap_uint <64>* htb3_buf,
    ap_uint <64>* htb4_buf,
    ap_uint <64>* htb5_buf,
    ap_uint <64>* htb6_buf,
    ap_uint <64>* htb7_buf,
    ap_uint <64>* stb0_buf,
    ap_uint <64>* stb1_buf,
    ap_uint <64>* stb2_buf,
    ap_uint <64>* stb3_buf,
    ap_uint <64>* stb4_buf,
    ap_uint <64>* stb5_buf,
    ap_uint <64>* stb6_buf,
    ap_uint <64>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Join v4 primitive, using bloom filter to enhance performance of hash join.

The build and probe procedure is similar to that of hashJoinV3, and this primitive adds a bloom filter to reduce redundant accesses to HBM.

The maximum size of the small table is 256MB x 8 = 2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. Similar to hashJoinV3, the small table and big table should be fed only once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, log2(small table max num of rows).
CH_NM	number of input channels, 1,2,4.
BF_HASH_NM	number of hash functions in bloom filter, 1,2,3.
BFW	bloom-filter hash width.
EN_BF	bloom-filter switch, 0 for off, 1 for on.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains build id, fixed hash depth
pu_end_status_strms	returns next build id, fixed hash depth, joined number
j_strm	output of joined result
j_e_strm	end flag of joined result
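
The benefit of the bloom filter can be sketched in software: keys that the filter reports as definitely absent never reach the (expensive) hash-table lookup. The two hash functions below are arbitrary choices made for the example and are unrelated to the ones used by the primitive. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical software model: a Bloom filter consulted before the hash table.
class BloomFilterModel {
   public:
    // bitsLog2 is expected to be in the range 1..31.
    explicit BloomFilterModel(int bitsLog2)
        : mask_((1u << bitsLog2) - 1u),
          bits_(std::size_t(1) << bitsLog2, false) {}

    void insert(uint32_t key) {
        bits_[h1(key) & mask_] = true;
        bits_[h2(key) & mask_] = true;
    }
    // May return true for an absent key (false positive), but never returns
    // false for a key that was inserted.
    bool mayContain(uint32_t key) const {
        return bits_[h1(key) & mask_] && bits_[h2(key) & mask_];
    }

   private:
    // Two simple multiplicative hashes, chosen for the sketch only.
    static uint32_t h1(uint32_t k) { return (k * 2654435761u) >> 7; }
    static uint32_t h2(uint32_t k) { return (k * 40503u + 0x9E3779B9u) >> 5; }
    uint32_t mask_;
    std::vector<bool> bits_;
};

// Probes `keys` against `table`, counting how many table lookups were issued.
std::size_t probeWithFilter(const BloomFilterModel& bf,
                            const std::unordered_map<uint32_t, uint32_t>& table,
                            const std::vector<uint32_t>& keys,
                            std::size_t& lookups) {
    std::size_t matches = 0;
    lookups = 0;
    for (uint32_t k : keys) {
        if (!bf.mayContain(k)) continue;  // definitely absent: skip the lookup
        ++lookups;
        if (table.count(k) != 0) ++matches;
    }
    return matches;
}
```

The number of matches is unaffected by the filter; only the number of lookups changes, which is the effect the primitive exploits to cut HBM traffic.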

hashBuildProbeV4¶

#include "xf_database/hash_join_v4.hpp"

template <
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int HASHO,
    int ARW,
    int CH_NM,
    int BF_HASH_NM,
    int BFW,
    int EN_BF
    >
static void hashBuildProbeV4 (
    bool& build_probe_flag,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <64>* htb0_buf,
    ap_uint <64>* htb1_buf,
    ap_uint <64>* htb2_buf,
    ap_uint <64>* htb3_buf,
    ap_uint <64>* htb4_buf,
    ap_uint <64>* htb5_buf,
    ap_uint <64>* htb6_buf,
    ap_uint <64>* htb7_buf,
    ap_uint <64>* stb0_buf,
    ap_uint <64>* stb1_buf,
    ap_uint <64>* stb2_buf,
    ap_uint <64>* stb3_buf,
    ap_uint <64>* stb4_buf,
    ap_uint <64>* stb5_buf,
    ap_uint <64>* stb6_buf,
    ap_uint <64>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Hash-Build-Probe v4 primitive. Compared with hashBuildProbeV3, it enables a bloom filter to reduce redundant accesses to HBM, which can further reduce the run-time of hash join. Build and probe are performed separately and controlled by a boolean flag. Multiple builds and probes are also supported; the caller should make sure that all rows of the build phase can be stored temporarily in HBM to maintain correctness.

The maximum size of the small table is 256MB x 8 = 2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.

Parameters:

KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
HASHO	number of hash bits used for overflow hash counter, 8-12.
ARW	width of address, log2(small table max num of rows).
CH_NM	number of input channels, 1,2,4.
BF_HASH_NM	number of hash functions in bloom filter, 1,2,3.
BFW	bloom-filter hash width.
EN_BF	bloom-filter switch, 0 for off, 1 for on.
build_probe_flag	0:build 1:probe
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains build ID, probe ID, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU
pu_end_status_strms	returns next build ID, next probe ID, fixed hash depth, joined number of current probe and end addr of stb_buf for each PU
j_strm	output of joined rows.
j_e_strm	end flag of joined result.

hashLookup3¶

hashLookup3 overload (1)¶

#include "xf_database/hash_lookup3.hpp"

template <int W>
void hashLookup3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <64>>& hash_strm
    )

lookup3 algorithm, 64-bit hash. II=1 when W<=96, otherwise II=(W/96).

Parameters:

W	the bit width of ap_uint type for input message stream.
key_strm	the message being hashed.
hash_strm	the result.

hashLookup3 overload (2)¶

#include "xf_database/hash_lookup3.hpp"

template <int W>
void hashLookup3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <32>>& hash_strm
    )

lookup3 algorithm, 32-bit hash. II=1 when W<=96, otherwise II=(W/96).

Parameters:

W	the bit width of ap_uint type for input message stream.
key_strm	the message being hashed.
hash_strm	the result.

hashLookup3 overload (3)¶

#include "xf_database/hash_lookup3.hpp"

template <
    int WK,
    int WH
    >
void hashLookup3 (
    hls::stream <ap_uint <WK>>& key_strm,
    hls::stream <bool>& e_key_strm,
    hls::stream <ap_uint <WH>>& hash_strm,
    hls::stream <bool>& e_hash_strm
    )

lookup3 algorithm, 64-bit or 32-bit hash.

Parameters:

WK	the bit width of input message stream.
WH	the bit width of output hash stream, must be 64 or 32.
key_strm	the message being hashed.
e_key_strm	end of key flag stream.
hash_strm	the result.
e_hash_strm	end of hash flag stream.
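
The pairing of a data stream with an end-flag stream can be modeled with std::queue standing in for hls::stream. The protocol assumed here (one `false` flag per data word, then a single `true` flag with no data) is a common convention but is an assumption as far as this page is concerned. `placeholderHash` is a stand-in mixing function and is NOT lookup3.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical software model of the end-flag convention, with std::queue
// standing in for hls::stream. Assumed protocol: one `false` flag accompanies
// every data word, and a single `true` flag with no data marks the end.
uint32_t placeholderHash(uint64_t key) {
    // Stand-in mixing step for the sketch; this is NOT lookup3.
    return static_cast<uint32_t>((key * 0x9E3779B97F4A7C15ull) >> 32);
}

void hashWithEndFlags(std::queue<uint64_t>& keyStrm, std::queue<bool>& eKeyStrm,
                      std::queue<uint32_t>& hashStrm,
                      std::queue<bool>& eHashStrm) {
    bool last = eKeyStrm.front();
    eKeyStrm.pop();
    while (!last) {
        uint64_t key = keyStrm.front();
        keyStrm.pop();
        hashStrm.push(placeholderHash(key));
        eHashStrm.push(false);  // one flag per output word
        last = eKeyStrm.front();
        eKeyStrm.pop();
    }
    eHashStrm.push(true);  // close the output stream
}
```

Under this convention a stream of N keys carries N + 1 flags, and the consumer reads the flag before deciding whether a data word follows.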

hashMultiJoin¶

#include "xf_database/hash_multi_join.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashMultiJoin (
    hls::stream <ap_uint <3>>& join_flag_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3 . The inner table should be fed once, followed by the outer table once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, larger than 24 is suggested.
CH_NM	number of input channels, 1,2,4.
join_flag_strm	specifies the join type, this flag is only read once.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains depth of hash, row number of join result
pu_end_status_strms	contains depth of hash, row number of join result
j_strm	output of joined result
j_e_strm	end flag of joined result

hashMultiJoinBuildProbe¶

#include "xf_database/hash_multi_join_build_probe.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int S_PW,
    int B_PW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM
    >
void hashMultiJoinBuildProbe (
    bool build_probe_flag,
    hls::stream <ap_uint <3>>& join_flag_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    ap_uint <256>* htb0_buf,
    ap_uint <256>* htb1_buf,
    ap_uint <256>* htb2_buf,
    ap_uint <256>* htb3_buf,
    ap_uint <256>* htb4_buf,
    ap_uint <256>* htb5_buf,
    ap_uint <256>* htb6_buf,
    ap_uint <256>* htb7_buf,
    ap_uint <256>* stb0_buf,
    ap_uint <256>* stb1_buf,
    ap_uint <256>* stb2_buf,
    ap_uint <256>* stb3_buf,
    ap_uint <256>* stb4_buf,
    ap_uint <256>* stb5_buf,
    ap_uint <256>* stb6_buf,
    ap_uint <256>* stb7_buf,
    hls::stream <ap_uint <32>>& pu_begin_status_strms,
    hls::stream <ap_uint <32>>& pu_end_status_strms,
    hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm,
    hls::stream <bool>& j_e_strm
    )

Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.

This primitive shares most of the structure of hashJoinV3 . The inner table should be fed once, followed by the outer table once.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins' Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
S_PW	width of payload of small table.
B_PW	width of payload of big table.
HASHWH	number of hash bits used for PU/buffer selection, 1~3.
HASHWL	number of hash bits used for hash-table in PU.
ARW	width of address, larger than 24 is suggested.
CH_NM	number of input channels, 1,2,4.
join_flag_strm	specifies the join type, this flag is only read once.
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
htb0_buf	HBM/DDR buffer of hash_table0
htb1_buf	HBM/DDR buffer of hash_table1
htb2_buf	HBM/DDR buffer of hash_table2
htb3_buf	HBM/DDR buffer of hash_table3
htb4_buf	HBM/DDR buffer of hash_table4
htb5_buf	HBM/DDR buffer of hash_table5
htb6_buf	HBM/DDR buffer of hash_table6
htb7_buf	HBM/DDR buffer of hash_table7
stb0_buf	HBM/DDR buffer of PU0
stb1_buf	HBM/DDR buffer of PU1
stb2_buf	HBM/DDR buffer of PU2
stb3_buf	HBM/DDR buffer of PU3
stb4_buf	HBM/DDR buffer of PU4
stb5_buf	HBM/DDR buffer of PU5
stb6_buf	HBM/DDR buffer of PU6
stb7_buf	HBM/DDR buffer of PU7
pu_begin_status_strms	contains depth of hash, row number of join result
pu_end_status_strms	contains depth of hash, row number of join result
j_strm	output of joined result
j_e_strm	end flag of joined result
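
The roles of HASHWH and HASHWL can be pictured with a small host-side model. This is only a sketch under an assumption this page does not confirm: the high HASHWH bits of the hash pick the PU (and therefore the htbN_buf / stbN_buf pair), and the low HASHWL bits index the hash table inside that PU. `HashSlot` and `splitHash` are hypothetical names, not library APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: where a row lands for a given hash value.
struct HashSlot {
    uint32_t pu;      // PU index, i.e. which htbN_buf / stbN_buf pair is used
    uint32_t bucket;  // index into that PU's hash table
};

// Assumption (not confirmed by this page): the high HASHWH bits select the PU
// and the low HASHWL bits select the bucket inside it.
template <int HASHWH, int HASHWL>
HashSlot splitHash(uint32_t hash) {
    static_assert(HASHWH >= 1 && HASHWH <= 3, "HASHWH is documented as 1~3");
    static_assert(HASHWL >= 0 && HASHWH + HASHWL <= 32, "model uses a 32-bit hash");
    const uint32_t bucket_mask = (1u << HASHWL) - 1u;
    const uint32_t pu_mask = (1u << HASHWH) - 1u;
    return HashSlot{(hash >> HASHWL) & pu_mask, hash & bucket_mask};
}
```

With HASHWH = 3 the model addresses 8 PUs, which matches the eight htb/stb buffer pairs in the signature.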

hashMurmur3¶

#include "xf_database/hash_murmur3.hpp"

template <
    int W,
    int H
    >
void hashMurmur3 (
    hls::stream <ap_uint <W>>& key_strm,
    hls::stream <ap_uint <H>>& hash_strm
    )

Murmur3 hash algorithm.

Parameters:

W	the bit width of ap_uint type for input message stream.
H	the bit width of ap_uint type for output hash stream.
key_strm	the message being hashed.
hash_strm	the result.

hashPartition¶

#include "xf_database/hash_partition.hpp"

template <
    int HASH_MODE,
    int KEYW,
    int PW,
    int EW,
    int HASHWH,
    int HASHWL,
    int ARW,
    int CH_NM,
    int COL_NM
    >
void hashPartition (
    bool mk_on,
    int depth,
    hls::stream <int>& bit_num_strm,
    hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM],
    hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM],
    hls::stream <bool> e0_strm_arry [CH_NM],
    hls::stream <ap_uint <16>>& o_bkpu_num_strm,
    hls::stream <ap_uint <10>>& o_nm_strm,
    hls::stream <ap_uint <EW>> o_kpld_strm [COL_NM]
    )

Hash-Partition primitive.

Parameters:

HASH_MODE	0 for radix and 1 for Jenkins’ Lookup3 hash.
KEYW	width of key, in bit.
PW	width of max payload, in bit.
EW	element data width of input table, in bit.
HASHWH	number of hash bits used for PU selection.
HASHWL	number of hash bits used for partition selection.
ARW	width of address for URAM
CH_NM	number of input channels, 1,2,4.
COL_NM	number of input columns, 1~8.
mk_on	input of double key flag, 0 for off, 1 for on.
depth	input of depth of each hash bucket in URAM.
bit_num_strm	input of partition number, log2(number of partition).
k0_strm_arry	input of key columns of both tables.
p0_strm_arry	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
o_bkpu_num_strm	output of index for bucket and PU
o_nm_strm	output of row number each time
o_kpld_strm	output of key+payload
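
Since bit_num_strm carries log2 of the partition count, the relation between the two can be sketched on the host. The model below assumes the partition index is simply the low bit_num bits of the hash and uses the key itself as a stand-in hash; the primitive's real radix / Lookup3 hashing and its bucket/PU indexing are not reproduced. `partitionKeys` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model: distribute keys into 2^bit_num partitions.
// Assumption: partition index = low bit_num bits of the hash.
std::vector<std::vector<uint32_t>> partitionKeys(const std::vector<uint32_t>& keys,
                                                 int bit_num) {
    assert(bit_num >= 0 && bit_num < 16);
    const uint32_t n_part = 1u << bit_num;  // number of partitions
    std::vector<std::vector<uint32_t>> parts(n_part);
    for (uint32_t k : keys) {
        const uint32_t hash = k;  // placeholder for the real hash function
        parts[hash & (n_part - 1u)].push_back(k);
    }
    return parts;
}
```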

hashSemiJoin¶

#include "xf_database/hash_semi_join.hpp"

template <
    int HashMode,
    int WKey,
    int WPayload,
    int WHashHigh,
    int WhashLow,
    int WTmpBufferAddress,
    int WTmpBuffer,
    int NChannels,
    int WBloomFilter,
    int EnBloomFilter
    >
static void hashSemiJoin (
    hls::stream <ap_uint <WKey>> key_istrms [NChannels],
    hls::stream <ap_uint <WPayload>> payload_istrms [NChannels],
    hls::stream <bool> e0_strm_arry [NChannels],
    ap_uint <WTmpBuffer>* pu0_tmp_rwtpr,
    ap_uint <WTmpBuffer>* pu1_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu2_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu3_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu4_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu5_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu6_tmp_rwptr,
    ap_uint <WTmpBuffer>* pu7_tmp_rwptr,
    hls::stream <ap_uint <WPayload>>& join_ostrm,
    hls::stream <bool>& end_ostrm
    )

Multi-PU Hash-Semi-Join primitive, using multiple DDR/HBM buffers.

The maximum number of rows in the inner table is 2M in this design. Hash conflicts are assumed to stay within 256K per bin.

This module can accept more than 1 input row per cycle, via multiple input channels. The outer table and the inner table share the same input ports, so the payload width should be the larger of the two, with the data aligned to the least-significant bits. The inner table should be fed TWICE, followed by the outer table ONCE.

Parameters:

HashMode	0 for radix and 1 for Jenkins’ Lookup3 hash.
WKey	width of key, in bit.
WPayload	width of payload of outer table.
WHashHigh	number of hash bits used for PU/buffer selection, 1~3.
WhashLow	number of hash bits used for hash-table in PU.
WTmpBufferAddress	width of address, log2(inner table max num of rows).
WTmpBuffer	width of buffer.
NChannels	number of input channels, 1,2,4.
WBloomFilter	bloom-filter hash width.
EnBloomFilter	bloom-filter switch, 0 for off, 1 for on.
key_istrms	input of key columns of both tables.
payload_istrms	input of payload columns of both tables.
e0_strm_arry	input of end signal of both tables.
pu0_tmp_rwtpr	HBM/DDR buffer of PU0
pu1_tmp_rwptr	HBM/DDR buffer of PU1
pu2_tmp_rwptr	HBM/DDR buffer of PU2
pu3_tmp_rwptr	HBM/DDR buffer of PU3
pu4_tmp_rwptr	HBM/DDR buffer of PU4
pu5_tmp_rwptr	HBM/DDR buffer of PU5
pu6_tmp_rwptr	HBM/DDR buffer of PU6
pu7_tmp_rwptr	HBM/DDR buffer of PU7
join_ostrm	output of joined rows.
end_ostrm	end signal of joined rows.
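
As a reference for checking results, the relational meaning of a semi-join can be modelled on the host in a few lines. This is a sketch of the expected output only; it does not model the PUs, the bloom filter, or the requirement to feed the inner table twice. `Row` and `semiJoinModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using Row = std::pair<uint32_t, uint32_t>;  // (key, payload)

// Semi-join: emit the payload of every outer row whose key occurs in the
// inner table. An outer row is emitted at most once, however many inner
// rows share its key.
std::vector<uint32_t> semiJoinModel(const std::vector<uint32_t>& inner_keys,
                                    const std::vector<Row>& outer_rows) {
    std::unordered_set<uint32_t> built(inner_keys.begin(), inner_keys.end());
    std::vector<uint32_t> out;
    for (const Row& r : outer_rows) {
        if (built.count(r.first) != 0) out.push_back(r.second);
    }
    return out;
}
```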

insertSort¶

insertSort overload (1)¶

#include "xf_database/insert_sort.hpp"

template <
    typename KEY_TYPE,
    int MAX_SORT_NUMBER
    >
void insertSort (
    hls::stream <KEY_TYPE>& kinStrm,
    hls::stream <bool>& endInStrm,
    hls::stream <KEY_TYPE>& koutStrm,
    hls::stream <bool>& endOutStrm,
    bool order
    )

Insert sort top function.

Parameters:

KEY_TYPE	the input and output key type
MAX_SORT_NUMBER	the maximum number of elements that can be sorted
kinStrm	input key stream
endInStrm	end flag stream for input
koutStrm	output key stream
endOutStrm	end flag stream for output
order	1:sort ascending 0:sort descending

insertSort overload (2)¶

#include "xf_database/insert_sort.hpp"

template <
    typename KEY_TYPE,
    typename DATA_TYPE,
    int MAX_SORT_NUMBER
    >
void insertSort (
    hls::stream <DATA_TYPE>& dinStrm,
    hls::stream <KEY_TYPE>& kinStrm,
    hls::stream <bool>& endInStrm,
    hls::stream <DATA_TYPE>& doutStrm,
    hls::stream <KEY_TYPE>& koutStrm,
    hls::stream <bool>& endOutStrm,
    bool order
    )

Insert sort top function.

Parameters:

KEY_TYPE	the input and output key type
DATA_TYPE	the input and output data type
MAX_SORT_NUMBER	the maximum number of elements that can be sorted
dinStrm	input data stream
kinStrm	input key stream
endInStrm	end flag stream for input
doutStrm	output data stream
koutStrm	output key stream
endOutStrm	end flag stream for output
order	1:sort ascending 0:sort descending
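
The effect of the order flag on key/data pairs can be checked against a plain software insertion sort. The sketch below is a host-side model, not the streaming implementation, and it ignores the MAX_SORT_NUMBER limit; `insertSortModel` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort on (key, data) pairs.
// order == true sorts ascending by key, order == false sorts descending.
template <typename KEY_TYPE, typename DATA_TYPE>
std::vector<std::pair<KEY_TYPE, DATA_TYPE>> insertSortModel(
    const std::vector<std::pair<KEY_TYPE, DATA_TYPE>>& in, bool order) {
    std::vector<std::pair<KEY_TYPE, DATA_TYPE>> sorted;
    sorted.reserve(in.size());
    for (const auto& row : in) {
        std::size_t pos = sorted.size();
        // Walk left while the new key belongs before the element at pos - 1.
        while (pos > 0 && (order ? row.first < sorted[pos - 1].first
                                 : row.first > sorted[pos - 1].first)) {
            --pos;
        }
        sorted.insert(sorted.begin() + static_cast<std::ptrdiff_t>(pos), row);
    }
    return sorted;
}
```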

mergeJoin¶

#include "xf_database/merge_join.hpp"

template <
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void mergeJoin (
    bool isascend,
    hls::stream <KEY_T>& left_strm_in_key,
    hls::stream <LEFT_FIELD_T>& left_strm_in_field,
    hls::stream <bool>& left_e_strm,
    hls::stream <KEY_T>& right_strm_in_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_in_field,
    hls::stream <bool>& right_e_strm,
    hls::stream <KEY_T>& left_strm_out_key,
    hls::stream <LEFT_FIELD_T>& left_strm_out_field,
    hls::stream <KEY_T>& right_strm_out_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_out_field,
    hls::stream <bool>& out_e_strm
    )

Merge join function for sorted tables; the left table must not contain duplicate keys.

Parameters:

KEY_T	the type of the key of both tables
LEFT_FIELD_T	the type of the field of left table
RIGHT_FIELD_T	the type of the field of right table
isascend	true if the input tables are sorted in ascending order, false if descending
left_strm_in_key	the key stream of the left input table
left_strm_in_field	the field stream of the left input table
left_e_strm	the end flag stream to mark the end of left input table
right_strm_in_key	the key stream of the right input table
right_strm_in_field	the field stream of the right input table
right_e_strm	the end flag stream to mark the end of right input table
left_strm_out_key	the output key stream of left table
left_strm_out_field	the output field stream of left table
right_strm_out_key	the output key stream of right table
right_strm_out_field	the output field stream of right table
out_e_strm	the end flag stream to mark the end of out table
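
The requirement that the left table has no duplicate keys is easiest to see in a two-cursor software model of the merge join. The following is a host-side sketch, not the library code; `JoinedRow` and `mergeJoinModel` are hypothetical names, and a single key column is emitted instead of the separate left/right key streams.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename K, typename L, typename R>
struct JoinedRow {
    K key;
    L left;
    R right;
};

// Merge join of two tables sorted on the key (ascending if isascend is true).
// The left table is assumed to have unique keys; the right table may repeat keys.
template <typename K, typename L, typename R>
std::vector<JoinedRow<K, L, R>> mergeJoinModel(
    bool isascend,
    const std::vector<std::pair<K, L>>& left,
    const std::vector<std::pair<K, R>>& right) {
    std::vector<JoinedRow<K, L, R>> out;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        const K& lk = left[i].first;
        const K& rk = right[j].first;
        if (lk == rk) {
            out.push_back(JoinedRow<K, L, R>{lk, left[i].second, right[j].second});
            ++j;  // keep the left row: the next right row may share this key
        } else if (isascend ? lk < rk : lk > rk) {
            ++i;  // left cursor is behind, advance it
        } else {
            ++j;  // right cursor is behind, advance it
        }
    }
    return out;
}
```

Because only the right cursor advances on a match, a duplicated left key would never be revisited, which is why the left table must be duplicate-free in this scheme.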

mergeLeftJoin¶

#include "xf_database/merge_left_join.hpp"

template <
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void mergeLeftJoin (
    bool isascend,
    hls::stream <KEY_T>& left_strm_in_key,
    hls::stream <LEFT_FIELD_T>& left_strm_in_field,
    hls::stream <bool>& left_e_strm,
    hls::stream <KEY_T>& right_strm_in_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_in_field,
    hls::stream <bool>& right_e_strm,
    hls::stream <KEY_T>& left_strm_out_key,
    hls::stream <LEFT_FIELD_T>& left_strm_out_field,
    hls::stream <KEY_T>& right_strm_out_key,
    hls::stream <RIGHT_FIELD_T>& right_strm_out_field,
    hls::stream <bool>& out_e_strm,
    hls::stream <bool>& isnull_strm
    )

Merge left join function for sorted tables; the left table must not contain duplicate keys.

Parameters:

KEY_T	the type of the key
LEFT_FIELD_T	the type of the field of left table
RIGHT_FIELD_T	the type of the field of right table
isascend	true if the input tables are sorted in ascending order, false if descending
left_strm_in_key	the key stream of the left input table
left_strm_in_field	the field stream of the left input table
left_e_strm	the end flag stream to mark the end of left input table
right_strm_in_key	the key stream of the right input table
right_strm_in_field	the field stream of the right input table
right_e_strm	the end flag stream to mark the end of right input table
left_strm_out_key	the output key stream of left table
left_strm_out_field	the output field stream of left table
right_strm_out_key	the output key stream of right table
right_strm_out_field	the output field stream of right table
out_e_strm	the end flag stream to mark the end of out table
isnull_strm	the isnull stream, showing whether the right-table part of each result row is null.
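
The role of isnull_strm can be illustrated with a host-side model of a left outer join on sorted inputs. This sketch assumes conventional SQL left-join semantics (every left row appears at least once, with the right side marked null when nothing matches); the exact stream-level behaviour of the primitive should be checked against its source. `LeftJoinedRow` and `mergeLeftJoinModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename K, typename L, typename R>
struct LeftJoinedRow {
    K key;
    L left;
    R right;             // value-initialised when right_is_null is true
    bool right_is_null;  // models one entry of isnull_strm
};

// Left outer join of two key-sorted tables; left keys are assumed unique.
template <typename K, typename L, typename R>
std::vector<LeftJoinedRow<K, L, R>> mergeLeftJoinModel(
    bool isascend,
    const std::vector<std::pair<K, L>>& left,
    const std::vector<std::pair<K, R>>& right) {
    std::vector<LeftJoinedRow<K, L, R>> out;
    std::size_t j = 0;
    for (const auto& lrow : left) {
        const K& lk = lrow.first;
        // Skip right rows that sort strictly before the current left key.
        while (j < right.size() &&
               (isascend ? right[j].first < lk : right[j].first > lk)) {
            ++j;
        }
        bool matched = false;
        while (j < right.size() && right[j].first == lk) {
            out.push_back(LeftJoinedRow<K, L, R>{lk, lrow.second, right[j].second, false});
            matched = true;
            ++j;
        }
        if (!matched) {
            out.push_back(LeftJoinedRow<K, L, R>{lk, lrow.second, R{}, true});
        }
    }
    return out;
}
```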

mergeSort¶

mergeSort overload (1)¶

#include "xf_database/merge_sort.hpp"

template <typename Key_Type>
void mergeSort (
    hls::stream <Key_Type>& left_kin_strm,
    hls::stream <bool>& left_strm_in_end,
    hls::stream <Key_Type>& right_kin_strm,
    hls::stream <bool>& right_strm_in_end,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& strm_out_end,
    bool order
    )

Merge sort function.

Parameters:

Key_Type	the input and output key type
left_kin_strm	input left key stream
left_strm_in_end	end flag stream for left input
right_kin_strm	input right key stream
right_strm_in_end	end flag stream for right input
kout_strm	output key stream
strm_out_end	end flag stream for output data
order	1:ascending 0:descending

mergeSort overload (2)¶

#include "xf_database/merge_sort.hpp"

template <
    typename Data_Type,
    typename Key_Type
    >
void mergeSort (
    hls::stream <Data_Type>& left_din_strm,
    hls::stream <Key_Type>& left_kin_strm,
    hls::stream <bool>& left_strm_in_end,
    hls::stream <Data_Type>& right_din_strm,
    hls::stream <Key_Type>& right_kin_strm,
    hls::stream <bool>& right_strm_in_end,
    hls::stream <Data_Type>& dout_strm,
    hls::stream <Key_Type>& kout_strm,
    hls::stream <bool>& strm_out_end,
    bool order
    )

Merge sort function.

Parameters:

Data_Type	the input and output data type
Key_Type	the input and output key type
left_din_strm	input left data stream
left_kin_strm	input left key stream
left_strm_in_end	end flag stream for left input
right_din_strm	input right data stream
right_kin_strm	input right key stream
right_strm_in_end	end flag stream for right input
dout_strm	output data stream
kout_strm	output key stream
strm_out_end	end flag stream for output data
order	1:ascending 0:descending
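
Despite the name, this primitive takes two input streams and produces one, so it corresponds to the merge step of a merge sort. A host-side model of that step, assuming both inputs are already sorted in the direction given by order, might look as follows; `mergeSortModel` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two key sequences that are already sorted in the same direction.
// order == true: ascending, order == false: descending.
template <typename Key_Type>
std::vector<Key_Type> mergeSortModel(const std::vector<Key_Type>& left,
                                     const std::vector<Key_Type>& right,
                                     bool order) {
    std::vector<Key_Type> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        // Ascending: take left when left <= right.
        // Descending: take left when left >= right.
        const bool take_left = order ? !(right[j] < left[i]) : !(left[i] < right[j]);
        out.push_back(take_left ? left[i++] : right[j++]);
    }
    while (i < left.size()) out.push_back(left[i++]);    // drain the left input
    while (j < right.size()) out.push_back(right[j++]);  // drain the right input
    return out;
}
```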

nestedLoopJoin¶

#include "xf_database/nested_loop_join.hpp"

template <
    int CMP_NUM,
    typename KEY_T,
    typename LEFT_FIELD_T,
    typename RIGHT_FIELD_T
    >
void nestedLoopJoin (
    hls::stream <KEY_T>& strm_in_left_key,
    hls::stream <LEFT_FIELD_T>& strm_in_left_field,
    hls::stream <bool>& strm_in_left_e,
    hls::stream <KEY_T>& strm_in_right_key,
    hls::stream <RIGHT_FIELD_T>& strm_in_right_field,
    hls::stream <bool>& strm_in_right_e,
    hls::stream <KEY_T> strm_out_left_key [CMP_NUM],
    hls::stream <LEFT_FIELD_T> strm_out_left_field [CMP_NUM],
    hls::stream <KEY_T> strm_out_right_key [CMP_NUM],
    hls::stream <RIGHT_FIELD_T> strm_out_right_field [CMP_NUM],
    hls::stream <bool> strm_out_e [CMP_NUM]
    )

Nested loop join function.

Parameters:

CMP_NUM	the number of channels in each output stream array
KEY_T	the type of the key of both tables
LEFT_FIELD_T	the type of the field of left table
RIGHT_FIELD_T	the type of the field of right table
strm_in_left_key	the key stream of the left input table
strm_in_left_field	the field stream of the left input table
strm_in_left_e	the end flag stream to mark the end of left input table
strm_in_right_key	the key stream of the right input table
strm_in_right_field	the field stream of the right input table
strm_in_right_e	the end flag stream to mark the end of right input table
strm_out_left_key	the output key stream of left table
strm_out_left_field	the output field stream of left table
strm_out_right_key	the output key stream of right table
strm_out_right_field	the output field stream of right table
strm_out_e	the end flag stream to mark the end of out table

scanCmpStrCol¶

#include "xf_database/scan_cmp_str_col.hpp"

void scanCmpStrCol (
    ap_uint <512>* ddr_ptr,
    hls::stream <int>& size,
    hls::stream <int>& num_str,
    hls::stream <ap_uint <512>>& cnst_stream,
    hls::stream <bool>& out_stream,
    hls::stream <bool>& e_str_o
    )

Scan multiple columns of strings in global memory, and compare each of them with a constant string.

Parameters:

ddr_ptr	input string array stored in global memory.
size	the number of times reading global memory
num_str	the number of actual strings
cnst_stream	input constant string stream, 512 bits in heading-length and padding-zero format, read only once as configuration.
out_stream	output whether each string is equal to the constant string, true indicates they are equal.
e_str_o	end flag stream for output stream.
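
The "heading-length and padding-zero" layout of the 512-bit constant can be pictured as a 64-byte word. The sketch below assumes the length sits in the first byte, followed by the characters and then zero padding; the byte order inside the real ap_uint<512> is not stated on this page, so treat the layout as illustrative. `Word512`, `packString` and `equalsConstant` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using Word512 = std::array<uint8_t, 64>;  // 64 bytes = 512 bits

// Pack a string as: [length byte][characters][zero padding].
// Assumption: the length occupies byte 0, so at most 63 characters fit.
Word512 packString(const std::string& s) {
    assert(s.size() <= 63);
    Word512 w{};  // value-initialised, so the padding is already zero
    w[0] = static_cast<uint8_t>(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        w[i + 1] = static_cast<uint8_t>(s[i]);
    }
    return w;
}

// With zero padding, comparing whole words is equivalent to comparing strings.
bool equalsConstant(const Word512& row, const Word512& cnst) {
    return row == cnst;
}
```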

scanCol¶

scanCol overload (1)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 1 column from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
c0vec_ptr	buffer pointer to column 0.
nrow	number of rows to scan.
c0_strm	column 0 stream.
e_row_strm	output end flag stream.
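
The relation between vec_len, the buffer and nrow can be sketched with a host-side model in which each buffer word holds vec_len items and exactly nrow items are emitted. The model fixes the item type to uint32_t (size0 = 4) and assumes item 0 of a word is emitted first; `scanColModel` is a hypothetical name, and burst_len has no counterpart because it only affects memory access.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each element of c0vec models one AXI word holding vec_len column items.
template <std::size_t vec_len>
std::vector<uint32_t> scanColModel(
    const std::vector<std::array<uint32_t, vec_len>>& c0vec, int nrow) {
    assert(nrow >= 0 && static_cast<std::size_t>(nrow) <= c0vec.size() * vec_len);
    std::vector<uint32_t> c0;
    c0.reserve(static_cast<std::size_t>(nrow));
    for (std::size_t r = 0; r < static_cast<std::size_t>(nrow); ++r) {
        // Row r lives in word r / vec_len, at position r % vec_len.
        c0.push_back(c0vec[r / vec_len][r % vec_len]);
    }
    return c0;
}
```

In this model nrow does not have to be a multiple of vec_len; the unused tail of the last word is simply not emitted.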

scanCol overload (2)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0,
    int size1
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 2 columns from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
nrow	number of rows to scan.
c0_strm	column 0 stream.
c1_strm	column 1 stream.
e_row_strm	output end flag stream.

scanCol overload (3)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 3 columns from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
nrow	number of rows to scan.
c0_strm	column 0 stream.
c1_strm	column 1 stream.
c2_strm	column 2 stream.
e_row_strm	output end flag stream.

scanCol overload (4)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 4 columns from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
size3	size of column 3, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c3vec_ptr	buffer pointer to column 3.
nrow	number of rows to scan.
c0_strm	column 0 stream.
c1_strm	column 1 stream.
c2_strm	column 2 stream.
c3_strm	column 3 stream.
e_row_strm	output end flag stream.

scanCol overload (5)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <ap_uint <8*size4>>& c4_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 5 columns from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
size3	size of column 3, in byte.
size4	size of column 4, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c3vec_ptr	buffer pointer to column 3.
c4vec_ptr	buffer pointer to column 4.
nrow	number of rows to scan.
c0_strm	column 0 stream.
c1_strm	column 1 stream.
c2_strm	column 2 stream.
c3_strm	column 3 stream.
c4_strm	column 4 stream.
e_row_strm	output end flag stream.

scanCol overload (6)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4,
    int size5
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    ap_uint <8*size5*vec_len>* c5vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>>& c0_strm,
    hls::stream <ap_uint <8*size1>>& c1_strm,
    hls::stream <ap_uint <8*size2>>& c2_strm,
    hls::stream <ap_uint <8*size3>>& c3_strm,
    hls::stream <ap_uint <8*size4>>& c4_strm,
    hls::stream <ap_uint <8*size5>>& c5_strm,
    hls::stream <bool>& e_row_strm
    )

Scan 6 columns from DDR/HBM buffers.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
size3	size of column 3, in byte.
size4	size of column 4, in byte.
size5	size of column 5, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c3vec_ptr	buffer pointer to column 3.
c4vec_ptr	buffer pointer to column 4.
c5vec_ptr	buffer pointer to column 5.
nrow	number of rows to scan.
c0_strm	column 0 stream.
c1_strm	column 1 stream.
c2_strm	column 2 stream.
c3_strm	column 3 stream.
c4_strm	column 4 stream.
c5_strm	column 5 stream.
e_row_strm	output end flag stream.

scanCol overload (7)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan one column from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
ch_num	number of concurrent output channels per column.
size0	size of column 0, in byte.
c0vec_ptr	buffer pointer to column 0.
nrow	number of rows to scan.
c0_strm	array of column 0 stream.
e_row_strm	array of output end flag stream.

scanCol overload (8)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0,
    int size1
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan two columns from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
ch_num	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
nrow	number of rows to scan.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
e_row_strm	array of output end flag stream.

scanCol overload (9)¶

#include "xf_database/scan_col.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_num,
    int size0,
    int size1,
    int size2
    >
void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    const int nrow,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_num],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_num],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_num],
    hls::stream <bool> e_row_strm [ch_num]
    )

Scan three columns from DDR/HBM buffers, emit multiple rows concurrently.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	number of items to be scanned as a vector from AXI port.
ch_num	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
nrow	number of rows to scan.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
c2_strm	array of column 2 stream.
e_row_strm	array of output end flag stream.

scanCol overload (10)¶

#include "xf_database/scan_col_2.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 2 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if the first vector is zero, the same number of zeros will be emitted; otherwise, the same number of rows will be read from the buffer.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	scan this number of items as a vector from AXI port.
ch_nm	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
e_row_strm	array of output end flag stream.

scanCol overload (11)¶

#include "xf_database/scan_col_2.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 3 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if the first vector is zero, the same number of zeros will be emitted; otherwise, the same number of rows will be read from the buffer.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	scan this number of items as a vector from AXI port.
ch_nm	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
c2_strm	array of column 2 stream.
e_row_strm	array of output end flag stream.

scanCol overload (12)¶

#include "xf_database/scan_col_2.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2,
    int size3
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <ap_uint <8*size3>> c3_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 4 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if the first vector is zero, the same number of zeros will be emitted; otherwise, the same number of rows will be read from the buffer.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	scan this number of items as a vector from AXI port.
ch_nm	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
size3	size of column 3, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c3vec_ptr	buffer pointer to column 3.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
c2_strm	array of column 2 stream.
c3_strm	array of column 3 stream.
e_row_strm	array of output end flag stream.

scanCol overload (13)¶

#include "xf_database/scan_col_2.hpp"

template <
    int burst_len,
    int vec_len,
    int ch_nm,
    int size0,
    int size1,
    int size2,
    int size3,
    int size4
    >
static void scanCol (
    ap_uint <8*size0*vec_len>* c0vec_ptr,
    ap_uint <8*size1*vec_len>* c1vec_ptr,
    ap_uint <8*size2*vec_len>* c2vec_ptr,
    ap_uint <8*size3*vec_len>* c3vec_ptr,
    ap_uint <8*size4*vec_len>* c4vec_ptr,
    hls::stream <ap_uint <8*size0>> c0_strm [ch_nm],
    hls::stream <ap_uint <8*size1>> c1_strm [ch_nm],
    hls::stream <ap_uint <8*size2>> c2_strm [ch_nm],
    hls::stream <ap_uint <8*size3>> c3_strm [ch_nm],
    hls::stream <ap_uint <8*size4>> c4_strm [ch_nm],
    hls::stream <bool> e_row_strm [ch_nm]
    )

Scan 5 columns from DDR/HBM buffers.

The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if the first vector is zero, the same number of zeros will be emitted; otherwise, the same number of rows will be read from the buffer.

Parameters:

burst_len	burst read length, must be supported by MC.
vec_len	scan this number of items as a vector from AXI port.
ch_nm	number of concurrent output channels per column.
size0	size of column 0, in byte.
size1	size of column 1, in byte.
size2	size of column 2, in byte.
size3	size of column 3, in byte.
size4	size of column 4, in byte.
c0vec_ptr	buffer pointer to column 0.
c1vec_ptr	buffer pointer to column 1.
c2vec_ptr	buffer pointer to column 2.
c3vec_ptr	buffer pointer to column 3.
c4vec_ptr	buffer pointer to column 4.
c0_strm	array of column 0 stream.
c1_strm	array of column 1 stream.
c2_strm	array of column 2 stream.
c3_strm	array of column 3 stream.
c4_strm	array of column 4 stream.
e_row_strm	array of output end flag stream.

staticEval¶

staticEval overload (1)¶

#include "xf_database/static_eval.hpp"

template <
    typename T,
    typename T_O,
    T_O (*opf)(T)
    >
void staticEval (
    hls::stream <T>& in_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

One stream input static evaluation.

staticEval calculates the result of a user-defined expression. This result will be passed to the aggregate module as the input. When calling this API, T and T_O are the input and output data types of the user function. E.g.

// decl
long user_func(int a);
// use
xf::database::staticEval<int, long, user_func>(
    in_strm, e_in_strm, out_strm, e_out_strm);

In the above call, int is the input data type of user_func, and long is the return type of user_func.

Parameters:

T	the input stream type, inferred from argument
T_O	the output stream type, inferred from argument
opf	the user-defined expression function
in_strm	input data stream
e_in_strm	end flag stream for input data
out_strm	output data stream
e_out_strm	end flag stream for output data
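
The template form used here, with the user function passed as a non-type template argument, can be tried on the host without HLS headers. The sketch below is a software model over std::vector rather than hls::stream; `staticEvalModel` and the body of `user_func` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

// Apply a compile-time-bound function to every element of the input,
// mirroring the <T, T_O, opf> template parameter order of the primitive.
template <typename T, typename T_O, T_O (*opf)(T)>
std::vector<T_O> staticEvalModel(const std::vector<T>& in) {
    std::vector<T_O> out;
    out.reserve(in.size());
    for (const T& v : in) {
        out.push_back(opf(v));
    }
    return out;
}

// Hypothetical user expression: widen to long before scaling.
long user_func(int a) {
    return static_cast<long>(a) * 100L;
}
```

Binding the function at compile time lets the compiler inline the expression into the loop body, which is the usual motivation for this pattern in synthesizable code.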

staticEval overload (2)¶

#include "xf_database/static_eval.hpp"

template <
    typename T1,
    typename T2,
    typename T_O,
    T_O (*opf)(T1, T2)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Two stream input static evaluation.

staticEval calculates the result of a user-defined expression. This result will be passed to the aggregate module as the input. When calling this API, T1 and T2 are the input data types and T_O is the output data type of the user function. E.g.

// decl
long user_func(int a, int b);
// use
xf::database::staticEval<int, int, long, user_func>(
    in1_strm, in2_strm, e_in_strm, out_strm, e_out_strm);

In the above call, the two int are the input data types of user_func, and long is the return type of user_func.

Parameters:

T1	the input stream type, inferred from argument
T2	the input stream type, inferred from argument
T_O	the output stream type, inferred from argument
opf	the user-defined expression function
in1_strm	input data stream
in2_strm	input data stream
e_in_strm	end flag stream for input data
out_strm	output data stream
e_out_strm	end flag stream for output data
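To illustrate how a two-input evaluation can feed an aggregation stage, the following is a software sketch under stated assumptions. `Stream`, `staticEval2Model`, `sumModel`, `revenue` and `totalRevenue` are hypothetical names, `sumModel` only stands in for a SUM aggregation and is not the library's `aggregate`, and the end-flag convention (one `false` per row, then `true`) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for hls::stream.
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop();
        return v;
    }

private:
    std::queue<T> q_;
};

// Software model of the two-stream overload: a single end-flag stream
// governs both inputs, so one element is read from each per flag.
template <typename T1, typename T2, typename T_O, T_O (*opf)(T1, T2)>
void staticEval2Model(Stream<T1>& in1_strm, Stream<T2>& in2_strm,
                      Stream<bool>& e_in_strm,
                      Stream<T_O>& out_strm, Stream<bool>& e_out_strm) {
    bool e = e_in_strm.read();
    while (!e) {
        T1 a = in1_strm.read();
        T2 b = in2_strm.read();
        out_strm.write(opf(a, b));
        e_out_strm.write(false);
        e = e_in_strm.read();
    }
    e_out_strm.write(true);
}

// Stand-in for a SUM aggregation; yields zero for empty input.
long sumModel(Stream<long>& in_strm, Stream<bool>& e_in_strm) {
    long acc = 0;
    while (!e_in_strm.read()) acc += in_strm.read();
    return acc;
}

// Hypothetical expression: price scaled by (100 - discount percent).
long revenue(int price, int discount) {
    return static_cast<long>(price) * (100 - discount);
}

// Evaluate the expression row by row, then sum the results.
long totalRevenue(const std::vector<int>& price, const std::vector<int>& discount) {
    assert(price.size() == discount.size());
    Stream<int> in1_strm, in2_strm;
    Stream<long> mid_strm;
    Stream<bool> e_in_strm, e_mid_strm;
    for (std::size_t i = 0; i < price.size(); ++i) {
        in1_strm.write(price[i]);
        in2_strm.write(discount[i]);
        e_in_strm.write(false);
    }
    e_in_strm.write(true);

    staticEval2Model<int, int, long, revenue>(in1_strm, in2_strm, e_in_strm,
                                              mid_strm, e_mid_strm);
    return sumModel(mid_strm, e_mid_strm);
}
```

Returning long from the expression widens the value before it reaches the summing stage, the same overflow concern that motivates the SUM overload of aggregate with a separate output type.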

staticEval overload (3)¶

#include "xf_database/static_eval.hpp"

template <
    typename T1,
    typename T2,
    typename T3,
    typename T_O,
    T_O (*opf)(T1, T2, T3)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <T3>& in3_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Three stream input static evaluation.

staticEval calculates the result of a user-defined expression. The result is intended to be passed to the aggregate module as its input. When calling this API, T1, T2, T3 and T_O are the input and output data types of the user function. E.g.

// decl
long user_func(int a, int b, int c);
// use
xf::database::staticEval<int, int, int, long, user_func>(
    in1_strm, in2_strm, in3_strm, e_in_strm,
    out_strm, e_out_strm);

In the above call, the three int are the input data types of user_func, and long is its return type.

Parameters:

T1	the input stream type, inferred from argument
T2	the input stream type, inferred from argument
T3	the input stream type, inferred from argument
T_O	the output stream type, inferred from argument
opf	the user-defined expression function
in1_strm	input data stream
in2_strm	input data stream
in3_strm	input data stream
e_in_strm	end flag stream for input data
out_strm	output data stream
e_out_strm	end flag stream for output data

staticEval overload (4)¶

#include "xf_database/static_eval.hpp"

template <
    typename T1,
    typename T2,
    typename T3,
    typename T4,
    typename T_O,
    T_O (*opf)(T1, T2, T3, T4)
    >
void staticEval (
    hls::stream <T1>& in1_strm,
    hls::stream <T2>& in2_strm,
    hls::stream <T3>& in3_strm,
    hls::stream <T4>& in4_strm,
    hls::stream <bool>& e_in_strm,
    hls::stream <T_O>& out_strm,
    hls::stream <bool>& e_out_strm
    )

Four stream input static evaluation.

staticEval calculates the result of a user-defined expression. The result is intended to be passed to the aggregate module as its input. When calling this API, T1, T2, T3, T4 and T_O are the input and output data types of the user function. E.g.

// decl
long user_func(int a, int b, int c, int d);
// use
xf::database::staticEval<int, int, int, int, long, user_func>(
    in1_strm, in2_strm, in3_strm, in4_strm, e_in_strm,
    out_strm, e_out_strm);

In the above call, the four int are the input data types of user_func, and long is its return type.

Parameters:

T1	the input stream type, inferred from argument
T2	the input stream type, inferred from argument
T3	the input stream type, inferred from argument
T4	the input stream type, inferred from argument
T_O	the output stream type, inferred from argument
opf	the user-defined expression function
in1_strm	input data stream
in2_strm	input data stream
in3_strm	input data stream
in4_strm	input data stream
e_in_strm	end flag stream for input data
out_strm	output data stream
e_out_strm	end flag stream for output data
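The examples in this section use int for every input, but the signature allows each stream its own element type. The sketch below models that case in software. `Stream`, `staticEval4Model`, `line_total` and `evalOneRow` are hypothetical names, the fixed-width types are chosen only for illustration, and the end-flag convention (one `false` per row, then `true`) is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for hls::stream.
template <typename T>
class Stream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());
        T v = q_.front();
        q_.pop();
        return v;
    }

private:
    std::queue<T> q_;
};

// Software model of the four-stream overload with independent input types.
template <typename T1, typename T2, typename T3, typename T4, typename T_O,
          T_O (*opf)(T1, T2, T3, T4)>
void staticEval4Model(Stream<T1>& in1_strm, Stream<T2>& in2_strm,
                      Stream<T3>& in3_strm, Stream<T4>& in4_strm,
                      Stream<bool>& e_in_strm,
                      Stream<T_O>& out_strm, Stream<bool>& e_out_strm) {
    bool e = e_in_strm.read();
    while (!e) {
        T1 a = in1_strm.read();
        T2 b = in2_strm.read();
        T3 c = in3_strm.read();
        T4 d = in4_strm.read();
        out_strm.write(opf(a, b, c, d));
        e_out_strm.write(false);
        e = e_in_strm.read();
    }
    e_out_strm.write(true);
}

// Hypothetical expression: line total in hundredths, with optional tax.
std::int64_t line_total(std::int32_t price, std::int16_t qty,
                        std::int8_t tax_pct, bool taxable) {
    std::int64_t base = static_cast<std::int64_t>(price) * qty;
    return taxable ? base * (100 + tax_pct) : base * 100;
}

// Push a single row through the model and return the evaluated value.
std::int64_t evalOneRow(std::int32_t price, std::int16_t qty,
                        std::int8_t tax_pct, bool taxable) {
    Stream<std::int32_t> in1_strm;
    Stream<std::int16_t> in2_strm;
    Stream<std::int8_t> in3_strm;
    Stream<bool> in4_strm, e_in_strm, e_out_strm;
    Stream<std::int64_t> out_strm;

    in1_strm.write(price);
    in2_strm.write(qty);
    in3_strm.write(tax_pct);
    in4_strm.write(taxable);
    e_in_strm.write(false);
    e_in_strm.write(true);

    staticEval4Model<std::int32_t, std::int16_t, std::int8_t, bool,
                     std::int64_t, line_total>(
        in1_strm, in2_strm, in3_strm, in4_strm, e_in_strm, out_strm, e_out_strm);

    bool done = e_out_strm.read();
    assert(!done);  // exactly one result is expected before the end flag
    return out_strm.read();
}
```

The template type arguments are listed in the same order as the parameters of the user function, followed by its return type.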
