Primitive APIs in xf::database
aggregate
aggregate overload (1)
#include "xf_database/aggregate.hpp"
template < AggregateOp op, typename T > void aggregate ( hls::stream <T>& in_strm, hls::stream <bool>& in_e_strm, hls::stream <T>& out_strm, hls::stream <bool>& out_e_strm )
Overload for most common aggregations.
As shown below in the parameters, this function can calculate one of a range of statistics, including minimum, maximum, average (mean), variance, L1 norm, and L2 norm. It can also calculate the sum and count.
The limitation of this function is that the output data type must match the input data type. In some cases the sum or count may overflow the output type; those cases are covered safely by the other aggregate overloads.
Note that the minimum, maximum, sum, count, number of non-zero, L1 norm, and L2 norm aggregate functions all return zero when the input is empty.
For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.
Parameters:
op | the aggregate operator: AOP_SUM, AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2 |
T | the data type of input and output streams |
in_strm | input data stream |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
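The pairing of a data stream with an end flag stream recurs throughout these primitives. The following is a minimal software sketch of that pairing for a maximum aggregation, using `std::queue` as a stand-in for `hls::stream`. The names `Strm`, `take`, and `aggregateMaxModel` are hypothetical, and the flag protocol (one `false` per row, then a final `true`) is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <queue>

// Software stand-in for hls::stream; this is not the HLS class itself.
template <typename T>
using Strm = std::queue<T>;

// Pop and return the front element of a stream model.
template <typename T>
T take(Strm<T>& s) {
    assert(!s.empty());
    T v = s.front();
    s.pop();
    return v;
}

// Model of a same-type maximum aggregation. It consumes rows until the end
// flag is true, then emits one result followed by an end marker.
template <typename T>
void aggregateMaxModel(Strm<T>& in, Strm<bool>& inEnd,
                       Strm<T>& out, Strm<bool>& outEnd) {
    T best = T();           // stays zero when the input is empty
    bool first = true;
    while (!take(inEnd)) {  // assumed protocol: false per row, true at the end
        T v = take(in);
        if (first || v > best) best = v;
        first = false;
    }
    out.push(best);
    outEnd.push(false);     // one flag for the single result row
    outEnd.push(true);      // end of output
}
```

Feeding the rows 3, -7, 9, 4 followed by the end flag yields a single output value of 9, and an empty input yields 0, which matches the empty-input behaviour described above.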
aggregate overload (2)
#include "xf_database/aggregate.hpp"
template < AggregateOp op, typename T, typename T2 > void aggregate ( hls::stream <T>& in_strm, hls::stream <bool>& in_e_strm, hls::stream <T2>& out_strm, hls::stream <bool>& out_e_strm )
Aggregate function overload for SUM operation.
The output type is inferred from the argument and can differ from the input type. This allows the sum to have more bits than the input, and so avoid overflow.
Note that the sum aggregate function returns zero when the input is empty.
For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.
Parameters:
op | the aggregate operator: AOP_SUM |
T | the data type of input stream, inferred from argument |
T2 | the data type of output stream, inferred from argument |
in_strm | input data stream |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
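To illustrate why a wider output type matters, here is a small software sketch that sums the same rows into accumulators of different widths. `sumModel` is a hypothetical helper working on a `std::vector`, not the streaming primitive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sum rows of type T into an accumulator of type T2 (software model only).
template <typename T, typename T2>
T2 sumModel(const std::vector<T>& rows) {
    T2 acc = 0;  // an empty input therefore sums to zero
    for (T v : rows) acc += static_cast<T2>(v);
    return acc;
}

// Usage: three 8-bit rows whose true sum (350) does not fit in 8 bits.
inline void sumModelExample() {
    const std::vector<std::uint8_t> rows = {200, 100, 50};
    // A 32-bit output keeps the full sum.
    assert((sumModel<std::uint8_t, std::uint32_t>(rows) == 350u));
    // An 8-bit output wraps around: 350 mod 256 = 94.
    assert((sumModel<std::uint8_t, std::uint8_t>(rows) == 94));
}
```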
aggregate overload (3)
#include "xf_database/aggregate.hpp"
template < AggregateOp op, typename T > void aggregate ( hls::stream <T>& in_strm, hls::stream <bool>& in_e_strm, hls::stream <uint64_t>& out_strm, hls::stream <bool>& out_e_strm )
Aggregate function overload for counting.
This function counts the number of input rows, or the number of non-zero input rows, and returns the count as a uint64_t value.
Note that the count aggregate function returns zero when the input is empty.
For group-by aggregation, please refer to the hashGroupAggregateMPU primitive.
Parameters:
op | the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS |
T | the data type of input stream, inferred from argument |
in_strm | input data stream |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
bitonicSort
#include "xf_database/bitonic_sort.hpp"
template < typename Key_Type, int BitonicSortNumber > void bitonicSort ( hls::stream <Key_Type>& kin_strm, hls::stream <bool>& kin_strm_end, hls::stream <Key_Type>& kout_strm, hls::stream <bool>& kout_strm_end, bool order )
Bitonic sort is a parallel sorting algorithm.
This algorithm can sort a large vector of data in parallel, and by cascading the sorters into a network it can offer good theoretical throughput.
Although this algorithm is suitable for FPGA acceleration, it does not work well with the row-by-row streaming interface of the database library. Please consider this primitive a demo, and use it only as a starting point for deriving your own code. Alternative sorting algorithms in this library are insertSort and mergeSort.
Parameters:
Key_Type | the input and output key type |
BitonicSortNumber | the parallel number |
kin_strm | input key stream |
kin_strm_end | end flag stream for input key |
kout_strm | output key stream |
kout_strm_end | end flag stream for output key |
order | 1 for ascending or 0 for descending sort |
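For readers unfamiliar with the algorithm, the following is a sequential software sketch of the classic bitonic sorting network. It is not the primitive's HLS code; `bitonicSortModel` is a hypothetical name, and the power-of-two length requirement is a property of this textbook formulation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place bitonic sort; keys.size() must be a power of two.
template <typename Key>
void bitonicSortModel(std::vector<Key>& keys, bool ascending) {
    const std::size_t n = keys.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t k = 2; k <= n; k <<= 1) {          // size of the sequences being merged
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance within a merge stage
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;             // visit each pair once
                // Direction of this comparator; flipped as a whole for descending order.
                const bool up = ((i & k) == 0) == ascending;
                if (up ? keys[partner] < keys[i] : keys[i] < keys[partner]) {
                    std::swap(keys[i], keys[partner]);
                }
            }
        }
    }
}
```

All compare-and-swap operations that share the same `k` and `j` touch disjoint pairs, which is why the innermost loop can be unrolled into parallel comparators in hardware.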
bfGen
#include "xf_database/bloom_filter.hpp"
template < bool IS_BRAM, int STR_IN_W, int BV_W > void bfGen ( hls::stream <ap_uint <STR_IN_W>>& msg_strm, hls::stream <bool>& in_e_strm, ap_uint <IS_BRAM?16:72>* bit_vector_ptr0, ap_uint <IS_BRAM?16:72>* bit_vector_ptr1, ap_uint <IS_BRAM?16:72>* bit_vector_ptr2 )
Generate the bloom filter in on-chip RAM blocks.
This primitive calculates the hash of each input value and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.
The bloom-filter bit vectors are passed as three pointers. Behind the scenes, one hash value is calculated and manipulated into three distinct marker locations in these vectors.
To check for the existence of a value with the generated vectors, use the bfCheck primitive.
Parameters:
STR_IN_W | width W of the streamed input message, e.g., W=512. |
BV_W | width of the hash value. ptr0, ptr1 and ptr2 should point at MEM_SPACE=2^BV_W (bit). |
msg_strm | input message stream. |
in_e_strm | flag that indicates the end of the input message stream. |
bit_vector_ptr0 | pointer to bit_vector0. |
bit_vector_ptr1 | pointer to bit_vector1. |
bit_vector_ptr2 | pointer to bit_vector2. |
bfGenStream
#include "xf_database/bloom_filter.hpp"
template < bool IS_BRAM, int STR_IN_W, int BV_W > void bfGenStream ( hls::stream <ap_uint <STR_IN_W>>& msg_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <IS_BRAM?16:64>>& bit_vet_strm, hls::stream <bool>& out_e_strm )
Generate the bloom filter in on-chip RAM blocks, and emit the vectors upon completion.
This primitive calculates hash values of the input and marks the corresponding bits in the on-chip RAM blocks. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM.
The bloom-filter bit vectors are built into internally allocated buffers, and streamed out after the filter has been fully built.
Parameters:
STR_IN_W | width W of the streamed input message, e.g., W=512. |
BV_W | width of the hash value. bit_vet_strm should send out MEM_SPACE=2^BV_W (bit) data in total. |
msg_strm | input message stream. |
in_e_strm | flag that indicates the end of the input message stream. |
bit_vet_strm | output stream of the bit vector. |
out_e_strm | flag that indicates the end of the output stream. |
bfCheck
#include "xf_database/bloom_filter.hpp"
template < bool IS_BRAM, int STR_IN_W, int BV_W > void bfCheck ( hls::stream <ap_uint <STR_IN_W>>& msg_strm, hls::stream <bool>& in_e_strm, ap_uint <IS_BRAM?16:72>* bit_vector_ptr0, ap_uint <IS_BRAM?16:72>* bit_vector_ptr1, ap_uint <IS_BRAM?16:72>* bit_vector_ptr2, hls::stream <bool>& out_v_strm, hls::stream <bool>& out_e_strm )
Check for the existence of a value using bloom-filter vectors.
This primitive is designed to work with the bloom-filter vectors generated by the bfGen primitive. It detects the existence of a value by hashing it and checking the corresponding vector bits. On a hit, the value is likely to be in the set of generating values; otherwise, it cannot be an element of the set. RAM blocks can be configured to be 18-bit BRAM or 72-bit URAM; the setting must match that of bfGen.
Parameters:
IS_BRAM | chooses which type of memory to use: true for BRAM, false for URAM. |
STR_IN_W | width W of the streamed input message, e.g., W=512. |
BV_W | width of the hash value. ptr0, ptr1 and ptr2 should point at MEM_SPACE=2^BV_W (bit). |
msg_strm | input message stream. |
in_e_strm | flag that indicates the end of the input message stream. |
bit_vector_ptr0 | pointer to bit_vector0. |
bit_vector_ptr1 | pointer to bit_vector1. |
bit_vector_ptr2 | pointer to bit_vector2. |
out_v_strm | output stream that indicates whether the value may exist (1 for true, 0 for false). |
out_e_strm | output end flag stream. |
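The generate-then-check flow can be sketched in software as follows. This is a model under stated assumptions, not the library's implementation: `BloomModel` is a hypothetical class, the hash is a placeholder mixer rather than the hash used by the primitives, each of the three vectors is assumed to hold 2^bvW bits, and the three locations are simply taken from different bit slices of one hash value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BloomModel {
public:
    explicit BloomModel(unsigned bvW) {
        assert(bvW >= 1 && bvW <= 20);  // three 20-bit slices fit in one 64-bit hash
        mask_ = (std::uint64_t{1} << bvW) - 1;
        for (auto& v : bits_) v.assign(std::size_t{1} << bvW, false);
    }

    // Generation step: mark one bit in each of the three vectors.
    void insert(std::uint64_t msg) {
        const auto idx = locations(msg);
        for (std::size_t i = 0; i < 3; ++i) bits_[i][idx[i]] = true;
    }

    // Check step: true means "possibly present", false means "definitely absent".
    bool mayContain(std::uint64_t msg) const {
        const auto idx = locations(msg);
        return bits_[0][idx[0]] && bits_[1][idx[1]] && bits_[2][idx[2]];
    }

private:
    // Placeholder 64-bit mixer (the splitmix64 finalizer), NOT the library's hash.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    // One hash value, three marker locations (assumed slicing for this sketch).
    std::array<std::size_t, 3> locations(std::uint64_t msg) const {
        const std::uint64_t h = mix(msg);
        return {static_cast<std::size_t>(h & mask_),
                static_cast<std::size_t>((h >> 20) & mask_),
                static_cast<std::size_t>((h >> 40) & mask_)};
    }

    std::uint64_t mask_ = 0;
    std::array<std::vector<bool>, 3> bits_;
};
```

Every inserted value is reported as possibly present, while a value that was never inserted is usually, but not always, reported as absent. That asymmetry is the "likely in the set / cannot be in the set" behaviour described for bfCheck.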
combineCol
combineCol overload (1)
#include "xf_database/combine_split_col.hpp"
template < int _WCol1, int _WCol2, int _WColOut > void combineCol ( hls::stream <ap_uint <_WCol1>>& din1_strm, hls::stream <ap_uint <_WCol2>>& din2_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WColOut>>& dout_strm, hls::stream <bool>& out_e_strm )
Combines two columns into one.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.
The counterpart of this primitive is splitCol.
Parameters:
_WCol1 | the width of 1st input stream. |
_WCol2 | the width of 2nd input stream. |
_WColOut | the width of output stream. |
din1_strm | 1st input data stream. |
din2_strm | 2nd input data stream. |
in_e_strm | end flag stream for input data. |
dout_strm | output data stream. |
out_e_strm | end flag stream for output data. |
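As a rough picture of what fusing two columns of one row means at the bit level, consider the sketch below. `combineTwoModel` and `lowBits` are hypothetical helpers on plain 64-bit integers, and the layout (first column in the low bits) is an assumption of this sketch; the primitive's actual bit order should be checked against the header.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low `width` bits of v (width in 1..64).
inline std::uint64_t lowBits(std::uint64_t v, unsigned width) {
    return width >= 64 ? v : (v & ((std::uint64_t{1} << width) - 1));
}

// Pack two column values of one row into a single word.
// Assumed layout for this sketch: column 1 in the low bits, column 2 above it.
inline std::uint64_t combineTwoModel(std::uint64_t c1, unsigned w1,
                                     std::uint64_t c2, unsigned w2) {
    assert(w1 >= 1 && w2 >= 1 && w1 + w2 <= 64);
    return lowBits(c1, w1) | (lowBits(c2, w2) << w1);
}
```

With an 8-bit column holding `0xAB` and a 16-bit column holding `0x1234`, this model produces the 24-bit value `0x1234AB`.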
combineCol overload (2)
#include "xf_database/combine_split_col.hpp"
template < int _WCol1, int _WCol2, int _WCol3, int _WColOut > void combineCol ( hls::stream <ap_uint <_WCol1>>& din1_strm, hls::stream <ap_uint <_WCol2>>& din2_strm, hls::stream <ap_uint <_WCol3>>& din3_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WColOut>>& dout_strm, hls::stream <bool>& out_e_strm )
Combines three columns into one.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.
The counterpart of this primitive is splitCol.
Parameters:
_WCol1 | the width of 1st input stream. |
_WCol2 | the width of 2nd input stream. |
_WCol3 | the width of 3rd input stream. |
_WColOut | the width of output stream. |
din1_strm | 1st input data stream. |
din2_strm | 2nd input data stream. |
din3_strm | 3rd input data stream. |
in_e_strm | end flag stream for input data. |
dout_strm | output data stream. |
out_e_strm | end flag stream for output data. |
combineCol overload (3)
#include "xf_database/combine_split_col.hpp"
template < int _WCol1, int _WCol2, int _WCol3, int _WCol4, int _WColOut > void combineCol ( hls::stream <ap_uint <_WCol1>>& din1_strm, hls::stream <ap_uint <_WCol2>>& din2_strm, hls::stream <ap_uint <_WCol3>>& din3_strm, hls::stream <ap_uint <_WCol4>>& din4_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WColOut>>& dout_strm, hls::stream <bool>& out_e_strm )
Combines four columns into one.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.
The counterpart of this primitive is splitCol.
Parameters:
_WCol1 | the width of 1st input stream. |
_WCol2 | the width of 2nd input stream. |
_WCol3 | the width of 3rd input stream. |
_WCol4 | the width of 4th input stream. |
_WColOut | the width of output stream. |
din1_strm | 1st input data stream. |
din2_strm | 2nd input data stream. |
din3_strm | 3rd input data stream. |
din4_strm | 4th input data stream. |
in_e_strm | end flag stream for input data. |
dout_strm | output data stream. |
out_e_strm | end flag stream for output data. |
combineCol overload (4)
#include "xf_database/combine_split_col.hpp"
template < int _WCol1, int _WCol2, int _WCol3, int _WCol4, int _WCol5, int _WColOut > void combineCol ( hls::stream <ap_uint <_WCol1>>& din1_strm, hls::stream <ap_uint <_WCol2>>& din2_strm, hls::stream <ap_uint <_WCol3>>& din3_strm, hls::stream <ap_uint <_WCol4>>& din4_strm, hls::stream <ap_uint <_WCol5>>& din5_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WColOut>>& dout_strm, hls::stream <bool>& out_e_strm )
Combines five columns into one.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the combine column primitive fuses data from the same row but different columns into one wide column.
The counterpart of this primitive is splitCol.
Parameters:
_WCol1 | the width of 1st input stream. |
_WCol2 | the width of 2nd input stream. |
_WCol3 | the width of 3rd input stream. |
_WCol4 | the width of 4th input stream. |
_WCol5 | the width of 5th input stream. |
_WColOut | the width of output stream. |
din1_strm | 1st input data stream. |
din2_strm | 2nd input data stream. |
din3_strm | 3rd input data stream. |
din4_strm | 4th input data stream. |
din5_strm | 5th input data stream. |
in_e_strm | end flag stream for input data. |
dout_strm | output data stream. |
out_e_strm | end flag stream for output data. |
splitCol
splitCol overload (1)
#include "xf_database/combine_split_col.hpp"
template < int _WColIn, int _WCol1, int _WCol2 > void splitCol ( hls::stream <ap_uint <_WColIn>>& din_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WCol1>>& dout1_strm, hls::stream <ap_uint <_WCol2>>& dout2_strm, hls::stream <bool>& out_e_strm )
Split previously combined columns into two.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.
The counterpart of this primitive is combineCol.
Parameters:
_WColIn | the width of input stream. |
_WCol1 | the width of 1st output stream. |
_WCol2 | the width of 2nd output stream. |
din_strm | input data stream. |
in_e_strm | end flag stream for input data. |
dout1_strm | 1st output data stream. |
dout2_strm | 2nd output data stream. |
out_e_strm | end flag stream for output data. |
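Splitting is the inverse bit-slicing operation. The sketch below models it on a plain 64-bit integer; `splitTwoModel` is a hypothetical helper, and the layout (first output taken from the low bits) is an assumption of this sketch rather than a documented property of the primitive.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a packed word into two column values.
// Assumed layout for this sketch: first = low w1 bits, second = next w2 bits.
inline std::pair<std::uint64_t, std::uint64_t>
splitTwoModel(std::uint64_t packed, unsigned w1, unsigned w2) {
    assert(w1 >= 1 && w2 >= 1 && w1 + w2 <= 64);
    const std::uint64_t m1 = (std::uint64_t{1} << w1) - 1;
    const std::uint64_t m2 = (std::uint64_t{1} << w2) - 1;
    return {packed & m1, (packed >> w1) & m2};
}
```

Splitting the 24-bit value `0x1234AB` into an 8-bit and a 16-bit column returns `0xAB` and `0x1234`.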
splitCol overload (2)
#include "xf_database/combine_split_col.hpp"
template < int _WColIn, int _WCol1, int _WCol2, int _WCol3 > void splitCol ( hls::stream <ap_uint <_WColIn>>& din_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WCol1>>& dout1_strm, hls::stream <ap_uint <_WCol2>>& dout2_strm, hls::stream <ap_uint <_WCol3>>& dout3_strm, hls::stream <bool>& out_e_strm )
Split previously combined columns into three.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.
The counterpart of this primitive is combineCol.
Parameters:
_WColIn | the width of input stream. |
_WCol1 | the width of 1st output stream. |
_WCol2 | the width of 2nd output stream. |
_WCol3 | the width of 3rd output stream. |
din_strm | input data stream |
in_e_strm | end flag stream for input data |
dout1_strm | 1st output data stream |
dout2_strm | 2nd output data stream |
dout3_strm | 3rd output data stream |
out_e_strm | end flag stream for output data |
splitCol overload (3)
#include "xf_database/combine_split_col.hpp"
template < int _WColIn, int _WCol1, int _WCol2, int _WCol3, int _WCol4 > void splitCol ( hls::stream <ap_uint <_WColIn>>& din_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WCol1>>& dout1_strm, hls::stream <ap_uint <_WCol2>>& dout2_strm, hls::stream <ap_uint <_WCol3>>& dout3_strm, hls::stream <ap_uint <_WCol4>>& dout4_strm, hls::stream <bool>& out_e_strm )
Split previously combined columns into four.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.
The counterpart of this primitive is combineCol.
Parameters:
_WColIn | the width of input stream. |
_WCol1 | the width of 1st output stream. |
_WCol2 | the width of 2nd output stream. |
_WCol3 | the width of 3rd output stream. |
_WCol4 | the width of 4th output stream. |
din_strm | input data stream |
in_e_strm | end flag stream for input data |
dout1_strm | 1st output data stream |
dout2_strm | 2nd output data stream |
dout3_strm | 3rd output data stream |
dout4_strm | 4th output data stream |
out_e_strm | end flag stream for output data |
splitCol overload (4)
#include "xf_database/combine_split_col.hpp"
template < int _WColIn, int _WCol1, int _WCol2, int _WCol3, int _WCol4, int _WCol5 > void splitCol ( hls::stream <ap_uint <_WColIn>>& din_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <_WCol1>>& dout1_strm, hls::stream <ap_uint <_WCol2>>& dout2_strm, hls::stream <ap_uint <_WCol3>>& dout3_strm, hls::stream <ap_uint <_WCol4>>& dout4_strm, hls::stream <ap_uint <_WCol5>>& dout5_strm, hls::stream <bool>& out_e_strm )
Split previously combined columns into five.
Columns are passed through streams of a certain width in hardware. Normally, each column uses one stream, but for some primitives the processing semantics abstract the columns into a few groups and treat each group as a whole. To make calling such primitives easier, the split column primitive breaks the wide output stream into independent column-specific streams.
The counterpart of this primitive is combineCol.
Parameters:
_WColIn | the width of input stream. |
_WCol1 | the width of 1st output stream. |
_WCol2 | the width of 2nd output stream. |
_WCol3 | the width of 3rd output stream. |
_WCol4 | the width of 4th output stream. |
_WCol5 | the width of 5th output stream. |
din_strm | input data stream |
in_e_strm | end flag stream for input data |
dout1_strm | 1st output data stream |
dout2_strm | 2nd output data stream |
dout3_strm | 3rd output data stream |
dout4_strm | 4th output data stream |
dout5_strm | 5th output data stream |
out_e_strm | end flag stream for output data |
compoundSort
#include "xf_database/compound_sort.hpp"
template < typename KEY_TYPE, int SORT_LEN, int INSERT_LEN > void compoundSort ( bool order, hls::stream <KEY_TYPE>& inKeyStrm, hls::stream <bool>& inEndStrm, hls::stream <KEY_TYPE>& outKeyStrm, hls::stream <bool>& outEndStrm )
compoundSort sorts keys using a combination of insert sort and merge sort.
Parameters:
KEY_TYPE | key type |
SORT_LEN | maximum supported sort length, between 16K and 2M; it must be an integer power of 2. |
INSERT_LEN | insert sort length, maximum 1024 (recommended) |
order | 1 for ascending sort, 0 for descending sort |
inKeyStrm | input key stream |
inEndStrm | end flag stream for input key |
outKeyStrm | output key-sorted stream |
outEndStrm | end flag stream for output key |
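One way to picture how an insert sort and a merge sort can be combined is the two-stage software sketch below: short chunks are sorted by insertion, and the resulting sorted runs are then merged pairwise. This is an illustration of the general idea only; `compoundSortModel` is a hypothetical function, and the primitive's internal structure may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

template <typename Key>
std::vector<Key> compoundSortModel(std::vector<Key> keys,
                                   std::size_t insertLen, bool ascending) {
    assert(insertLen > 0);
    // Ordering predicate: true when a must come before b.
    auto before = [ascending](const Key& a, const Key& b) {
        return ascending ? a < b : b < a;
    };
    const std::size_t n = keys.size();

    // Stage 1: insertion sort within each chunk of insertLen keys.
    for (std::size_t lo = 0; lo < n; lo += insertLen) {
        const std::size_t hi = std::min(n, lo + insertLen);
        for (std::size_t i = lo + 1; i < hi; ++i) {
            Key k = keys[i];
            std::size_t j = i;
            while (j > lo && before(k, keys[j - 1])) {
                keys[j] = keys[j - 1];
                --j;
            }
            keys[j] = k;
        }
    }

    // Stage 2: merge neighbouring sorted runs, doubling the run length each pass.
    for (std::size_t run = insertLen; run < n; run *= 2) {
        for (std::size_t lo = 0; lo + run < n; lo += 2 * run) {
            const std::size_t mid = lo + run;
            const std::size_t hi = std::min(n, mid + run);
            std::inplace_merge(keys.begin() + lo, keys.begin() + mid,
                               keys.begin() + hi, before);
        }
    }
    return keys;
}
```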
directGroupAggregate
directGroupAggregate overload (1)
#include "xf_database/direct_group_aggregate.hpp"
template < int op, int DATINW, int DATOUTW, int DIRECTW > void directGroupAggregate ( hls::stream <ap_uint <DATINW>>& vin_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <DATOUTW>>& vout_strm, hls::stream <bool>& out_e_strm, hls::stream <ap_uint <DIRECTW>>& kin_strm, hls::stream <ap_uint <DIRECTW>>& kout_strm )
Group-by aggregation with limited key width.
This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).
The following aggregate operators are supported:
- AOP_MAX
- AOP_MIN
- AOP_SUM
- AOP_COUNT
- AOP_MEAN
- AOP_VARIANCE
- AOP_NORML1
- AOP_NORML2
The return value is typed the same as the input payload value.
Caution
Attention should be paid to overflow in sum or count.
Parameters:
op | the aggregate operator, as defined in AggregateOp enum. |
DATINW | the width of input payload |
DATOUTW | the width of output aggr-payload |
DIRECTW | the width of input and output key |
vin_strm | value input |
in_e_strm | end flag stream for input data |
vout_strm | value output |
out_e_strm | end flag stream for output data |
kin_strm | group-by key input |
kout_strm | group-by key output |
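The direct-addressing idea can be sketched in software as an array with one slot per possible key value, which is where the `2 ^ key width` storage term comes from. The sketch below models a SUM aggregation only; `directGroupSumModel` is a hypothetical function, and emitting only the keys that actually occurred is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Direct-addressed group-by SUM. Returns (key, sum) for every key seen,
// in increasing key order.
inline std::vector<std::pair<std::uint32_t, std::uint64_t>>
directGroupSumModel(const std::vector<std::uint32_t>& keys,
                    const std::vector<std::uint64_t>& values,
                    unsigned keyWidth) {
    assert(keys.size() == values.size());
    assert(keyWidth >= 1 && keyWidth <= 16);
    const std::size_t slots = std::size_t{1} << keyWidth;  // one slot per possible key
    std::vector<std::uint64_t> acc(slots, 0);
    std::vector<bool> seen(slots, false);

    for (std::size_t i = 0; i < keys.size(); ++i) {
        assert(keys[i] < slots);       // the key must fit in keyWidth bits
        acc[keys[i]] += values[i];     // the key is used directly as the address
        seen[keys[i]] = true;
    }

    std::vector<std::pair<std::uint32_t, std::uint64_t>> out;
    for (std::size_t k = 0; k < slots; ++k) {
        if (seen[k]) out.emplace_back(static_cast<std::uint32_t>(k), acc[k]);
    }
    return out;
}
```

Because the table size doubles with every extra key bit, a 10-bit key needs 1024 slots while a 20-bit key would need over a million, which is why the primitive targets narrow keys.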
directGroupAggregate overload (2)
#include "xf_database/direct_group_aggregate.hpp"
template < int DATINW, int DATOUTW, int DIRECTW > void directGroupAggregate ( ap_uint <32> op, hls::stream <ap_uint <DATINW>>& vin_strm, hls::stream <bool>& in_e_strm, hls::stream <ap_uint <DATOUTW>>& vout_strm, hls::stream <bool>& out_e_strm, hls::stream <ap_uint <DIRECTW>>& kin_strm, hls::stream <ap_uint <DIRECTW>>& kout_strm )
Group-by aggregation with limited key width, runtime programmable.
This primitive is suitable for scenarios in which the width of the group key is limited, so that an on-chip array directly addressed by the key can be created to store the aggregation values. The total storage required is row size * (2 ^ key width).
The following aggregate operators are supported:
- AOP_MAX
- AOP_MIN
- AOP_SUM
- AOP_COUNT
- AOP_MEAN
- AOP_NORML1
The return value is typed the same as the input payload value.
Caution
Attention should be paid to overflow in sum or count.
Parameters:
DATINW | the width of input payload |
DATOUTW | the width of output aggr-payload |
DIRECTW | the width of input and output key |
op | the aggregate operator, as defined in AggregateOp enum. |
vin_strm | value input |
in_e_strm | end flag stream for input data |
vout_strm | value output |
out_e_strm | end flag stream for output data |
kin_strm | group-by key input |
kout_strm | group-by key output |
duplicateCol
#include "xf_database/duplicate_col.hpp"
template <int W> void duplicateCol ( hls::stream <ap_uint <W>>& d_in_strm, hls::stream <bool>& e_in_strm, hls::stream <ap_uint <W>>& d0_out_strm, hls::stream <ap_uint <W>>& d1_out_strm, hls::stream <bool>& e_out_strm )
Duplicate one column into two columns.
Parameters:
W | column data width in bits. |
d_in_strm | input data stream. |
e_in_strm | end flag for input data. |
d0_out_strm | output data stream 0. |
d1_out_strm | output data stream 1. |
e_out_strm | end flag for output data. |
dynamicEval
#include "xf_database/dynamic_eval.hpp"
template < typename TStrm1, typename TStrm2, typename TStrm3, typename TStrm4, typename TConst1, typename TConst2, typename TConst3, typename TConst4, typename TOut > void dynamicEval ( ap_uint <289> config, hls::stream <TStrm1>& strm_in1, hls::stream <TStrm2>& strm_in2, hls::stream <TStrm3>& strm_in3, hls::stream <TStrm4>& strm_in4, hls::stream <bool>& strm_in_end, hls::stream <TOut>& strm_out, hls::stream <bool>& strm_out_end )
Dynamic expression evaluation.
This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run-time. The same configuration is used for all rows until the end of input.
The constants are assumed to be no wider than 32 bits.
For the definition of the config word, please refer to the “Design Internal” section of the document and the corresponding test in L1/tests.
Parameters:
TStrm1 | Type of input Stream1 |
TStrm2 | Type of input Stream2 |
TStrm3 | Type of input Stream3 |
TStrm4 | Type of input Stream4 |
TConst1 | Type of input Constant1 |
TConst2 | Type of input Constant2 |
TConst3 | Type of input Constant3 |
TConst4 | Type of input Constant4 |
TOut | Type of Compute Result |
config | configuration bits of ops and constants. |
strm_in1 | input Stream1 |
strm_in2 | input Stream2 |
strm_in3 | input Stream3 |
strm_in4 | input Stream4 |
strm_in_end | end flag of input stream |
strm_out | output Stream |
strm_out_end | end flag of output stream |
dynamicEvalV2
#include "xf_database/dynamic_eval_v2.hpp"
template <typename T> void dynamicEvalV2 ( hls::stream <ap_uint <32>>& cfgs, hls::stream <T>& col0_istrm, hls::stream <T>& col1_istrm, hls::stream <T>& col2_istrm, hls::stream <T>& col3_istrm, hls::stream <bool>& e_istrm, hls::stream <T>& ret_ostrm, hls::stream <bool>& e_ostrm )
Dynamic expression evaluation version 2.
This primitive has a fixed number of four column inputs, and allows up to four constants to be specified via configuration. The operation between the column values and constants can be defined dynamically through the configuration at run-time. The same configuration is used for all rows until the end of input.
The constants are assumed to be no wider than 32 bits.
Parameters:
T | Type of input streams |
cfgs | configuration bits of ops and constants. |
col0_istrm | input Stream1 |
col1_istrm | input Stream2 |
col2_istrm | input Stream3 |
col3_istrm | input Stream4 |
e_istrm | end flag of input stream |
ret_ostrm | output Stream |
e_ostrm | end flag of output stream |
dynamicFilter
dynamicFilter overload (1)
#include "xf_database/dynamic_filter.hpp"
template < int W, int WP > void dynamicFilter ( hls::stream <ap_uint <32>>& filter_cfg_strm, hls::stream <ap_uint <W>>& v0_strm, hls::stream <ap_uint <W>>& v1_strm, hls::stream <ap_uint <W>>& v2_strm, hls::stream <ap_uint <W>>& v3_strm, hls::stream <ap_uint <WP>>& pay_in_strm, hls::stream <bool>& e_in_strm, hls::stream <ap_uint <WP>>& pay_out_strm, hls::stream <bool>& e_pay_out_strm )
Filter payloads according to conditions set during run-time.
This primitive, with its three other overloads, supports filtering rows using up to four columns as conditions. The payload columns should be grouped together before entering this primitive, using the combineCol primitive; their total width is not explicitly limited (but is naturally bounded by resources).
The filter conditions consist of whether each condition column is within a given range, and of relations between any two condition columns. The configuration is set once before processing the rows, and reused until the last row. For configuration generation, please refer to the “Design Internals” section of the document and the corresponding test case of this primitive.
Parameters:
W | width of all condition column streams, in bits. |
WP | width of payload column, in bits. |
filter_cfg_strm | stream of raw config bits for this primitive. |
v0_strm | condition column stream 0. |
v1_strm | condition column stream 1. |
v2_strm | condition column stream 2. |
v3_strm | condition column stream 3. |
pay_in_strm | payload input stream. |
e_in_strm | end flag stream for input table. |
pay_out_strm | payload output stream. |
e_pay_out_strm | end flag stream for payload output. |
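To make the two kinds of condition concrete, the sketch below evaluates per-column range checks and pairwise relations for a single row. Everything in it is hypothetical: `FilterModel`, `Rel`, `acceptAll`, and `keepRow` are illustration-only names, the struct is not the bit-level configuration consumed by the primitive, and combining all conditions with a plain AND is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical relation codes for this sketch; not the library's filter ops.
enum class Rel { DontCare, Less, Equal, Greater };

struct FilterModel {
    std::array<std::int64_t, 4> lo;  // inclusive lower bound per condition column
    std::array<std::int64_t, 4> hi;  // inclusive upper bound per condition column
    Rel rel[4][4];                   // required relation between columns i and j (i < j)
};

// A filter that accepts every row: full ranges and no pairwise relations.
inline FilterModel acceptAll() {
    FilterModel f{};  // value-initialization sets every relation to DontCare
    f.lo.fill(std::numeric_limits<std::int64_t>::min());
    f.hi.fill(std::numeric_limits<std::int64_t>::max());
    return f;
}

inline bool holds(Rel r, std::int64_t a, std::int64_t b) {
    switch (r) {
        case Rel::Less:     return a < b;
        case Rel::Equal:    return a == b;
        case Rel::Greater:  return a > b;
        case Rel::DontCare: break;
    }
    return true;
}

// Decide whether the payload of a row with condition values v is kept.
inline bool keepRow(const FilterModel& f, const std::array<std::int64_t, 4>& v) {
    for (std::size_t i = 0; i < 4; ++i) {
        assert(f.lo[i] <= f.hi[i]);
        if (v[i] < f.lo[i] || v[i] > f.hi[i]) return false;  // range condition
    }
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t j = i + 1; j < 4; ++j) {
            if (!holds(f.rel[i][j], v[i], v[j])) return false;  // pairwise relation
        }
    }
    return true;
}
```

For example, restricting column 0 to the range 10 to 20 and requiring column 1 to be less than column 2 keeps a row with values (15, 1, 2, 0) and drops rows with values (25, 1, 2, 0) or (15, 3, 2, 0).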
dynamicFilter overload (2)
#include "xf_database/dynamic_filter.hpp"
template < int W, int WP > void dynamicFilter ( hls::stream <ap_uint <32>>& filter_cfg_strm, hls::stream <ap_uint <W>>& v0_strm, hls::stream <ap_uint <W>>& v1_strm, hls::stream <ap_uint <W>>& v2_strm, hls::stream <ap_uint <WP>>& pay_in_strm, hls::stream <bool>& e_in_strm, hls::stream <ap_uint <WP>>& pay_out_strm, hls::stream <bool>& e_pay_out_strm )
Filter payloads according to conditions set during run-time.
This function is a wrapper around the four-condition-column dynamic_filter; it simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 4th column should be set to FOP_DC.
Parameters:
W | width of all condition column streams, in bits. |
WP | width of payload column, in bits. |
filter_cfg_strm | stream of raw config bits for this primitive. |
v0_strm | condition column stream 0. |
v1_strm | condition column stream 1. |
v2_strm | condition column stream 2. |
pay_in_strm | payload input stream. |
e_in_strm | end flag stream for input table. |
pay_out_strm | payload output stream. |
e_pay_out_strm | end flag stream for payload output. |
dynamicFilter overload (3)
#include "xf_database/dynamic_filter.hpp"
template < int W, int WP > void dynamicFilter ( hls::stream <ap_uint <32>>& filter_cfg_strm, hls::stream <ap_uint <W>>& v0_strm, hls::stream <ap_uint <W>>& v1_strm, hls::stream <ap_uint <WP>>& pay_in_strm, hls::stream <bool>& e_in_strm, hls::stream <ap_uint <WP>>& pay_out_strm, hls::stream <bool>& e_pay_out_strm )
Filter payloads according to conditions set during run-time.
This function is a wrapper around the four-condition-column dynamic_filter; it simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 3rd and 4th columns should be set to FOP_DC.
Parameters:
W | width of all condition column streams, in bits. |
WP | width of payload column, in bits. |
filter_cfg_strm | stream of raw config bits for this primitive. |
v0_strm | condition column stream 0. |
v1_strm | condition column stream 1. |
pay_in_strm | payload input stream. |
e_in_strm | end flag stream for input table. |
pay_out_strm | payload output stream. |
e_pay_out_strm | end flag stream for payload output. |
dynamicFilter overload (4)
#include "xf_database/dynamic_filter.hpp"
template < int W, int WP > void dynamicFilter ( hls::stream <ap_uint <32>>& filter_cfg_strm, hls::stream <ap_uint <W>>& v0_strm, hls::stream <ap_uint <WP>>& pay_in_strm, hls::stream <bool>& e_in_strm, hls::stream <ap_uint <WP>>& pay_out_strm, hls::stream <bool>& e_pay_out_strm )
Filter payloads according to conditions set during run-time.
This function is a wrapper around the four-condition-column dynamic_filter; it simply duplicates the columns to feed all of its inputs. The two therefore share the same configuration bit pattern. All ops related to the 2nd to 4th columns should be set to FOP_DC.
Parameters:
W | width of all condition column streams, in bits. |
WP | width of payload column, in bits. |
filter_cfg_strm | stream of raw config bits for this primitive. |
v0_strm | condition column stream 0. |
pay_in_strm | payload input stream. |
e_in_strm | end flag stream for input table. |
pay_out_strm | payload output stream. |
e_pay_out_strm | end flag stream for payload output. |
groupAggregate
groupAggregate overload (1)
#include "xf_database/group_aggregate.hpp"
template < AggregateOp op, typename T, typename KEY_T > void groupAggregate ( hls::stream <T>& din_strm, hls::stream <bool>& in_e_strm, hls::stream <T>& dout_strm, hls::stream <bool>& out_e_strm, hls::stream <KEY_T>& kin_strm, hls::stream <KEY_T>& kout_strm )
Group aggregate function that returns the same type as the input.
Parameters:
op | the aggregate operator: AOP_MAX, AOP_MIN, AOP_MEAN, AOP_VARIANCE, AOP_NORML1 or AOP_NORML2 |
T | the data type of input and output streams |
KEY_T | the input and output indexing key type |
din_strm | input data stream |
in_e_strm | end flag stream for input data |
dout_strm | output data stream |
out_e_strm | end flag stream for output data |
kin_strm | input indexing key stream |
kout_strm | output indexing key stream |
groupAggregate overload (2)
#include "xf_database/group_aggregate.hpp"
template < AggregateOp op, typename T, typename T2, typename KEY_T > void groupAggregate ( hls::stream <T>& in_strm, hls::stream <bool>& in_e_strm, hls::stream <T2>& out_strm, hls::stream <bool>& out_e_strm, hls::stream <KEY_T>& kin_strm, hls::stream <KEY_T>& kout_strm )
Group aggregate function that returns a different type from the input.
Parameters:
op | the aggregate operator: AOP_SUM |
T | the input stream type, inferred from argument |
T2 | the output stream type, inferred from argument |
KEY_T | the input and output indexing key type, inferred from argument |
in_strm | input data stream |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
kin_strm | input indexing key stream |
kout_strm | output indexing key stream |
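The benefit of a separate output type for AOP_SUM can be illustrated with a plain C++ software model. This is only a sketch, not the HLS implementation: `groupSumModel` is a hypothetical name, vectors stand in for the streams, and it assumes that rows with the same key arrive consecutively.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Software model only (hypothetical helper, not part of xf::database).
// Sums runs of consecutive rows that share a key. The accumulator type T2
// may be wider than the input type T, so the sum does not wrap around in T.
template <typename T, typename T2, typename KEY_T>
std::vector<std::pair<KEY_T, T2>> groupSumModel(const std::vector<KEY_T>& keys,
                                                const std::vector<T>& data) {
    std::vector<std::pair<KEY_T, T2>> out;
    const std::size_t n = keys.size() < data.size() ? keys.size() : data.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (out.empty() || out.back().first != keys[i]) {
            out.emplace_back(keys[i], T2(0)); // a new key starts a new group
        }
        out.back().second += static_cast<T2>(data[i]);
    }
    return out;
}
```

With `T = int32_t` and `T2 = int64_t`, two rows of 2000000000 under the same key sum to 4000000000, which would not fit in the input type.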
groupAggregate overload (3)¶
#include "xf_database/group_aggregate.hpp"
template < AggregateOp op, typename T, typename KEY_T > void groupAggregate ( hls::stream <T>& in_strm, hls::stream <bool>& in_e_strm, hls::stream <uint64_t>& out_strm, hls::stream <bool>& out_e_strm, hls::stream <KEY_T>& kin_strm, hls::stream <KEY_T>& kout_strm )
group aggregate function that counts rows and returns uint64_t
Parameters:
op | the aggregate operator: AOP_COUNT or AOP_COUNTNONZEROS |
T | the input stream type, inferred from argument |
KEY_T | the input and output indexing key type, inferred from argument |
in_strm | input data stream |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
kin_strm | input indexing key stream |
kout_strm | output indexing key stream |
groupAggregate overload (4)¶
#include "xf_database/group_aggregate.hpp"
template < AggregateOp op, typename T, typename KEY_T > void groupAggregate ( hls::stream <T>& in_strm, hls::stream <bool>& isnull_strm, hls::stream <bool>& in_e_strm, hls::stream <uint64_t>& out_strm, hls::stream <bool>& out_e_strm, hls::stream <KEY_T>& kin_strm, hls::stream <KEY_T>& kout_strm )
group aggregate function that counts rows, taking a per-row null flag, and returns uint64_t
Parameters:
op | the aggregate operator: AOP_COUNT |
T | the input stream type, inferred from argument |
KEY_T | the input and output indexing key type, inferred from argument |
in_strm | input data stream |
isnull_strm | flag stream indicating whether each input value is null |
in_e_strm | end flag stream for input data |
out_strm | output data stream |
out_e_strm | end flag stream for output data |
kin_strm | input indexing key stream |
kout_strm | output indexing key stream |
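A software model may help to picture the counting overload with a null flag. The sketch below is not the library code: `groupCountModel` is a hypothetical name, and it assumes both that same-key rows are consecutive and that rows flagged as null are left out of the count (as SQL `COUNT(column)` does). The reference above does not state either point.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Software model only (hypothetical helper). Counts, per run of equal keys,
// the rows whose null flag is false. Skipping null rows is an assumption.
template <typename KEY_T>
std::vector<std::pair<KEY_T, uint64_t>> groupCountModel(const std::vector<KEY_T>& keys,
                                                        const std::vector<bool>& isNull) {
    std::vector<std::pair<KEY_T, uint64_t>> out;
    const std::size_t n = keys.size() < isNull.size() ? keys.size() : isNull.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (out.empty() || out.back().first != keys[i]) {
            out.emplace_back(keys[i], uint64_t(0)); // start a new group
        }
        if (!isNull[i]) {
            ++out.back().second; // only non-null rows are counted
        }
    }
    return out;
}
```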
hashAntiJoin¶
#include "xf_database/hash_anti_join.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM > void hashAntiJoin ( hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <256>* htb0_buf, ap_uint <256>* htb1_buf, ap_uint <256>* htb2_buf, ap_uint <256>* htb3_buf, ap_uint <256>* htb4_buf, ap_uint <256>* htb5_buf, ap_uint <256>* htb6_buf, ap_uint <256>* htb7_buf, ap_uint <256>* stb0_buf, ap_uint <256>* stb1_buf, ap_uint <256>* stb2_buf, ap_uint <256>* stb3_buf, ap_uint <256>* stb4_buf, ap_uint <256>* stb5_buf, ap_uint <256>* stb6_buf, ap_uint <256>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Multi-PU Hash-Anti-Join primitive, using multiple DDR/HBM buffers.
This primitive shares most of the structure of hashJoinV3, but performs anti-join instead of inner-join. Both the inner and the outer table should be sent to this primitive once, starting with the inner table.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, larger than 24 is suggested. |
CH_NM | number of input channels, 1,2,4. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | the 1st element is the depth of each hash, the 2nd element is joined number |
pu_end_status_strms | |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
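For readers unfamiliar with anti-join, the following plain C++ model shows the intended result set. It is a sketch under assumptions: `antiJoinModel` and `AntiRow` are hypothetical names, the model emits the outer-table rows that have no key match in the inner table, and it ignores the PU and buffer structure entirely.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical row layout for the model: one key and one payload column.
struct AntiRow {
    uint32_t key;
    uint32_t payload;
};

// Software model only. The inner table is consumed first (build), then the
// outer table (probe). Outer rows without a matching inner key are kept.
std::vector<AntiRow> antiJoinModel(const std::vector<AntiRow>& inner,
                                   const std::vector<AntiRow>& outer) {
    std::unordered_set<uint32_t> innerKeys;
    for (const AntiRow& r : inner) {
        innerKeys.insert(r.key); // build phase
    }
    std::vector<AntiRow> out;
    for (const AntiRow& r : outer) { // probe phase
        if (innerKeys.count(r.key) == 0) {
            out.push_back(r);
        }
    }
    return out;
}
```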
hashGroupAggregate¶
#include "xf_database/hash_group_aggregate.hpp"
template < int _WKey, int _KeyNM, int _WPay, int _PayNM, int _HashMode, int _WHashHigh, int _WHashLow, int _CHNM, int _Wcnt, int _WBuffer, int _BurstLenW = 32, int _BurstLenR = 32 > void hashGroupAggregate ( hls::stream <ap_uint <_WKey>> strm_key_in [_CHNM][_KeyNM], hls::stream <ap_uint <_WPay>> strm_pld_in [_CHNM][_PayNM], hls::stream <bool> strm_e_in [_CHNM], hls::stream <ap_uint <32>>& config, hls::stream <ap_uint <32>>& result_info, ap_uint <_WBuffer>* ping_buf0, ap_uint <_WBuffer>* ping_buf1, ap_uint <_WBuffer>* ping_buf2, ap_uint <_WBuffer>* ping_buf3, ap_uint <_WBuffer>* pong_buf0, ap_uint <_WBuffer>* pong_buf1, ap_uint <_WBuffer>* pong_buf2, ap_uint <_WBuffer>* pong_buf3, hls::stream <ap_uint <_WKey>> aggr_key_out [_KeyNM], hls::stream <ap_uint <_WPay>> aggr_pld_out [3][_PayNM], hls::stream <bool>& strm_e_out )
Generic hash group aggregate primitive.
With this primitive, the maximum number of rows in the aggregate table is bounded by the AXI buffer size.
The group aggregation values are updated inside the chip, and when a hash-bucket overflows, the overflowed rows are spilled into external buffers. The overflow buffer will be automatically re-scanned, and within each round, a number of distinct groups will be aggregated and emitted. This algorithm ends when the overflow buffer is empty and all groups are aggregated.
Attention
- This module can accept multiple input rows of key and payload pairs per cycle.
- The maximum number of distinct groups aggregated in one pass is 2 ^ (1 + _WHash).
- When the width of the input stream is not fully used, data should be aligned to the little end.
- It is highly recommended to assign the ping buffer and pong buffer to different HBM banks, and input and output to different DDR banks, for better performance.
- The maximum number of rows in the aggregate table cannot exceed the maximum DDR/HBM size used in this design.
- When the bit-width of the group key is known to be small, say 10 bits, please consider the directAggregate primitive, which offers lower utilization and requires no external buffer access.
Parameters:
_WKey | width of key, in bit. |
_KeyNM | number of key columns, maximum is 8. |
_WPay | width of max payload, in bit. |
_PayNM | number of payload columns, maximum is 8. |
_HashMode | control hash algorithm, 0: radix 1: lookup3. |
_WHashHigh | number of hash bits used for dispatching to PUs. |
_WHashLow | number of hash bits used for hash-table. |
_CHNM | number of input channels. |
_WBuffer | width of HBM/DDR buffer(ping_buf and pong_buf). |
_BurstLenW | burst length for writing unhandled data. |
_BurstLenR | burst length for reloading unhandled data. |
strm_key_in | input of key streams. |
strm_pld_in | input of payload streams. |
strm_e_in | input of end signal. |
config | information for initializing the primitive: the op for up to 8 columns, the key column number (less than 8), the pld column number (less than 8) and the initial aggregate cnt. |
result_info | result information at kernel end, contains op, key_column, pld_column and aggregate result cnt |
ping_buf0 | DDR/HBM ping buffer for unhandled data. |
ping_buf1 | DDR/HBM ping buffer for unhandled data. |
ping_buf2 | DDR/HBM ping buffer for unhandled data. |
ping_buf3 | DDR/HBM ping buffer for unhandled data. |
pong_buf0 | DDR/HBM pong buffer for unhandled data. |
pong_buf1 | DDR/HBM pong buffer for unhandled data. |
pong_buf2 | DDR/HBM pong buffer for unhandled data. |
pong_buf3 | DDR/HBM pong buffer for unhandled data. |
aggr_key_out | output of key columns. |
aggr_pld_out | output of pld columns. [0][*] is the result of min/max/cnt for pld columns, [1][*] is the low-bit value of sum/average, [2][*] is the high-bit value of sum/average. |
strm_e_out | is the end signal of output. |
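The overflow-and-rescan behaviour described for this primitive can be sketched in plain C++. This is an illustrative model, not the hardware algorithm: `multiPassGroupSum` is a hypothetical name, a `std::map` stands in for the on-chip table, a vector stands in for the ping/pong buffers, and only SUM is modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using KeyPld = std::pair<uint32_t, uint64_t>; // (group key, payload)

// Software model only. Each pass aggregates at most maxGroups distinct keys;
// rows whose key does not fit are spilled and re-scanned in the next pass.
std::vector<KeyPld> multiPassGroupSum(std::vector<KeyPld> rows, std::size_t maxGroups,
                                      int& passes) {
    std::vector<KeyPld> result;
    passes = 0;
    if (maxGroups == 0) {
        return result; // nothing can be aggregated without table capacity
    }
    while (!rows.empty()) {
        std::map<uint32_t, uint64_t> table; // stands in for the on-chip table
        std::vector<KeyPld> overflow;       // stands in for the external buffer
        for (const KeyPld& r : rows) {
            auto it = table.find(r.first);
            if (it != table.end()) {
                it->second += r.second;           // key already owns an entry
            } else if (table.size() < maxGroups) {
                table.emplace(r.first, r.second); // claim a free entry
            } else {
                overflow.push_back(r);            // table full: spill the row
            }
        }
        result.insert(result.end(), table.begin(), table.end()); // emit groups
        rows.swap(overflow); // re-scan the spilled rows in the next pass
        ++passes;
    }
    return result;
}
```

Because the table never shrinks within a pass, a key either owns an entry for the whole pass or has all of its remaining rows spilled, so each group is emitted exactly once.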
hashJoinMPU¶
hashJoinMPU overload (1)¶
#include "xf_database/hash_join_v2.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int BFW, int CH_NM, int BF_W, int EN_BF > static void hashJoinMPU ( hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <BFW>* stb0_buf, ap_uint <BFW>* stb1_buf, ap_uint <BFW>* stb2_buf, ap_uint <BFW>* stb3_buf, ap_uint <BFW>* stb4_buf, ap_uint <BFW>* stb5_buf, ap_uint <BFW>* stb6_buf, ap_uint <BFW>* stb7_buf, hls::stream <ap_uint <S_PW+B_PW>>& j1_strm, hls::stream <bool>& e5_strm )
Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.
The max number of lines of small table is 2M in this design. It is assumed that the hash-conflict is within 512 per bin.
This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table should be fed TWICE, followed by the big table once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, log2(small table max num of rows). |
BFW | width of buffer. |
CH_NM | number of input channels, 1,2,4. |
BF_W | bloom-filter hash width. |
EN_BF | bloom-filter switch, 0 for off, 1 for on. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
j1_strm | output of joined rows. |
e5_strm | end signal of joined rows. |
hashJoinMPU overload (2)¶
#include "xf_database/hash_join_v2.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int BFW, int CH_NM > void hashJoinMPU ( hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <BFW>* stb0_buf, ap_uint <BFW>* stb1_buf, ap_uint <BFW>* stb2_buf, ap_uint <BFW>* stb3_buf, ap_uint <BFW>* stb4_buf, ap_uint <BFW>* stb5_buf, ap_uint <BFW>* stb6_buf, ap_uint <BFW>* stb7_buf, hls::stream <ap_uint <S_PW+B_PW>>& j1_strm, hls::stream <bool>& e5_strm )
Multi-PU Hash-Join primitive, using multiple DDR/HBM buffers.
The max number of lines of small table is 8M in this design. It is assumed that the hash-conflict is within 512 per bin.
This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table should be fed TWICE, followed by the big table once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, log2(small table max num of rows). |
BFW | width of buffer. |
CH_NM | number of input channels, 1,2,4. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
j1_strm | output of joined rows. |
e5_strm | end signal of joined rows. |
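The roles of HASHWH and HASHWL can be pictured as splitting one hash value into a PU index and a per-PU bin index. The sketch below is an assumption about the bit layout (upper bits select the PU, lower bits select the bin), suggested by the parameter names rather than confirmed by this reference; `HashSplit` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: splits a hash value into a PU index (HASHWH bits)
// and a hash-table bin index within that PU (HASHWL bits).
// Assumed layout: [ ... | PU index (HASHWH bits) | bin index (HASHWL bits) ]
template <int HASHWH, int HASHWL>
struct HashSplit {
    static_assert(HASHWH >= 1 && HASHWH <= 3, "HASHWH is documented as 1~3");
    static_assert(HASHWL >= 1 && HASHWH + HASHWL <= 32, "bits must fit in 32");

    static constexpr uint32_t puOf(uint32_t hash) {
        return (hash >> HASHWL) & ((1u << HASHWH) - 1u);
    }
    static constexpr uint32_t binOf(uint32_t hash) {
        return hash & ((1u << HASHWL) - 1u);
    }
};
```

With `HASHWH = 3` there are up to 8 PUs, which matches the eight `stb*_buf` buffers in the signature.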
hashJoinV3¶
#include "xf_database/hash_join_v3.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM > void hashJoinV3 ( hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <256>* htb0_buf, ap_uint <256>* htb1_buf, ap_uint <256>* htb2_buf, ap_uint <256>* htb3_buf, ap_uint <256>* htb4_buf, ap_uint <256>* htb5_buf, ap_uint <256>* htb6_buf, ap_uint <256>* htb7_buf, ap_uint <256>* stb0_buf, ap_uint <256>* stb1_buf, ap_uint <256>* stb2_buf, ap_uint <256>* stb3_buf, ap_uint <256>* stb4_buf, ap_uint <256>* stb5_buf, ap_uint <256>* stb6_buf, ap_uint <256>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Hash-Join v3 primitive. It takes more resources than hashJoinMPU and promises better performance on large tables.
The maximum size of small table is 256MBx8(HBM number)=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.
This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. Unlike hashJoinMPU, the small table and big table should each be fed only once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, larger than 24 is suggested. |
CH_NM | number of input channels, 1,2,4. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains hash depth, row number of join result |
pu_end_status_strms | contains hash depth, row number of join result |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
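The sizing limits quoted for this primitive can be checked at compile time. The sketch below restates the arithmetic from the description; the constant and function names are hypothetical, and reading "1M entries" as 2^20 is an assumption.

```cpp
#include <cstdint>

// Small-table capacity: 256 MB per buffer times 8 buffers.
constexpr uint64_t kBytesPerBuffer = 256ull << 20;
constexpr uint64_t kNumBuffers = 8;
constexpr uint64_t kSmallTableBytes = kBytesPerBuffer * kNumBuffers;
static_assert(kSmallTableBytes == (2ull << 30), "256 MB x 8 = 2 GB");

// Total hash entries, as given in the description: 1 << (HASHWH + HASHWL).
constexpr uint64_t totalHashEntries(int hashwh, int hashwl) {
    return 1ull << (hashwh + hashwl);
}

// Assumption: the "1M entries" URAM limit means 2^20 entries.
constexpr bool fitsUramLimit(int hashwh, int hashwl) {
    return totalHashEntries(hashwh, hashwl) <= (1ull << 20);
}
```

For example, `HASHWH = 3` with `HASHWL = 17` sits exactly at the limit, while `HASHWL = 18` exceeds it.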
hashBuildProbeV3¶
#include "xf_database/hash_join_v3.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM, int BF_W, int EN_BF > static void hashBuildProbeV3 ( bool& build_probe_flag, hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <256>* htb0_buf, ap_uint <256>* htb1_buf, ap_uint <256>* htb2_buf, ap_uint <256>* htb3_buf, ap_uint <256>* htb4_buf, ap_uint <256>* htb5_buf, ap_uint <256>* htb6_buf, ap_uint <256>* htb7_buf, ap_uint <256>* stb0_buf, ap_uint <256>* stb1_buf, ap_uint <256>* stb2_buf, ap_uint <256>* stb3_buf, ap_uint <256>* stb4_buf, ap_uint <256>* stb5_buf, ap_uint <256>* stb6_buf, ap_uint <256>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Hash-Build-Probe v3 primitive. It performs hash build and hash probe separately, so two kernel calls are needed to complete one build and one probe. A control flag selects build or probe. This primitive supports multiple builds and multiple probes; for example, you can schedule a workflow as: build0->build1->probe0->probe1->build2->build3->probe3…
The maximum size of small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.
This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. The small table and big table should each be fed only ONCE.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, log2(small table max num of rows). |
CH_NM | number of input channels, 1,2,4. |
BF_W | bloom-filter hash width. |
EN_BF | bloom-filter switch, 0 for off, 1 for on. |
build_probe_flag | 0:build 1:probe |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains build id, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU |
pu_end_status_strms | returns next build id, fixed hash depth, joined number of current probe and end addr of used stb_buf for each PU |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
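The separate build and probe calls can be modelled in software as one object whose state persists between calls. This is only a sketch: `BuildProbeModel` is a hypothetical class, the hash table lives in a `std::unordered_multimap` instead of HBM/URAM, and the status streams are not modelled.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Software model only. State persists across calls, so several builds can
// be followed by several probes, as in build0->build1->probe0->probe1.
class BuildProbeModel {
public:
    using KeyPay = std::pair<uint32_t, uint32_t>;  // (key, payload)
    using Joined = std::pair<uint32_t, uint32_t>;  // (build payload, probe payload)

    // buildProbeFlag == false: build, rows are added to the table.
    // buildProbeFlag == true: probe, matching rows are joined and returned.
    std::vector<Joined> run(bool buildProbeFlag, const std::vector<KeyPay>& rows) {
        std::vector<Joined> joined;
        for (const KeyPay& r : rows) {
            if (!buildProbeFlag) {
                table_.emplace(r.first, r.second);
            } else {
                auto range = table_.equal_range(r.first);
                for (auto it = range.first; it != range.second; ++it) {
                    joined.emplace_back(it->second, r.second);
                }
            }
        }
        return joined;
    }

private:
    std::unordered_multimap<uint32_t, uint32_t> table_;
};
```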
hashJoinV4¶
#include "xf_database/hash_join_v4.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM, int BF_HASH_NM, int BFW, bool EN_BF > static void hashJoinV4 ( hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <64>* htb0_buf, ap_uint <64>* htb1_buf, ap_uint <64>* htb2_buf, ap_uint <64>* htb3_buf, ap_uint <64>* htb4_buf, ap_uint <64>* htb5_buf, ap_uint <64>* htb6_buf, ap_uint <64>* htb7_buf, ap_uint <64>* stb0_buf, ap_uint <64>* stb1_buf, ap_uint <64>* stb2_buf, ap_uint <64>* stb3_buf, ap_uint <64>* stb4_buf, ap_uint <64>* stb5_buf, ap_uint <64>* stb6_buf, ap_uint <64>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Hash-Join v4 primitive, using bloom filter to enhance performance of hash join.
The build and probe procedure is similar to that of hashJoinV3, and this primitive adds a bloom filter to reduce redundant accesses to HBM.
The maximum size of small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.
This module can accept more than 1 input row per cycle, via multiple input channels. The small table and the big table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little end. Similar to hashJoinV3, the small table and big table should each be fed only once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, log2(small table max num of rows). |
CH_NM | number of input channels, 1,2,4. |
BF_HASH_NM | number of hash functions in bloom filter, 1,2,3. |
BFW | bloom-filter hash width. |
EN_BF | bloom-filter switch, 0 for off, 1 for on. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains build id, fixed hash depth |
pu_end_status_strms | returns next build id, fixed hash depth, joined number |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
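How a bloom filter saves memory accesses during probe can be shown with a small software model. The sketch is not the primitive's implementation: `BloomFilterModel` is a hypothetical class, its mixing function is an arbitrary stand-in for the real hash, and treating BFW as the log2 of the filter size in bits is an assumption.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Software model only. BF_HASH_NM hash functions each address one bit in a
// filter of 2^BFW bits (assumed meaning of the "hash width").
template <int BF_HASH_NM, int BFW>
class BloomFilterModel {
    static_assert(BF_HASH_NM >= 1 && BF_HASH_NM <= 3, "documented as 1,2,3");
    static_assert(BFW >= 1 && BFW <= 20, "keep the model small");

public:
    // Build phase: mark every small-table key.
    void insert(uint64_t key) {
        for (int i = 0; i < BF_HASH_NM; ++i) {
            bits_.set(index(key, i));
        }
    }

    // Probe phase: false means the key is definitely absent, so the lookup
    // in external memory can be skipped. True may be a false positive.
    bool mayContain(uint64_t key) const {
        for (int i = 0; i < BF_HASH_NM; ++i) {
            if (!bits_.test(index(key, i))) {
                return false;
            }
        }
        return true;
    }

private:
    // Arbitrary 64-bit mixer used only for this model.
    static std::size_t index(uint64_t key, int i) {
        uint64_t x = key + 0x9E3779B97F4A7C15ull * static_cast<uint64_t>(i + 1);
        x ^= x >> 33;
        x *= 0xFF51AFD7ED558CCDull;
        x ^= x >> 33;
        return static_cast<std::size_t>(x & ((1ull << BFW) - 1ull));
    }

    std::bitset<(1u << BFW)> bits_;
};
```

A bloom filter never reports a false negative, so skipping the lookup when `mayContain` returns false does not lose any join result.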
hashBuildProbeV4¶
#include "xf_database/hash_join_v4.hpp"
template < int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int HASHO, int ARW, int CH_NM, int BF_HASH_NM, int BFW, int EN_BF > static void hashBuildProbeV4 ( bool& build_probe_flag, hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <64>* htb0_buf, ap_uint <64>* htb1_buf, ap_uint <64>* htb2_buf, ap_uint <64>* htb3_buf, ap_uint <64>* htb4_buf, ap_uint <64>* htb5_buf, ap_uint <64>* htb6_buf, ap_uint <64>* htb7_buf, ap_uint <64>* stb0_buf, ap_uint <64>* stb1_buf, ap_uint <64>* stb2_buf, ap_uint <64>* stb3_buf, ap_uint <64>* stb4_buf, ap_uint <64>* stb5_buf, ap_uint <64>* stb6_buf, ap_uint <64>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Hash-Build-Probe v4 primitive. Compared with hashBuildProbeV3, it enables a bloom filter to reduce redundant accesses to HBM, which can further reduce the run-time of hash join. Build and probe are performed separately and controlled by a boolean flag. Multiple builds and probes are also supported, provided that all rows from the build phase can be stored temporarily in HBM to maintain correctness.
The maximum size of small table is 256MBx8=2GB in this design. The total number of hash entries is equal to 1<<(HASHWH + HASHWL), and it is limited to a maximum of 1M entries because of the size of URAM in a single SLR.
Parameters:
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
HASHO | number of hash bits used for overflow hash counter, 8-12. |
ARW | width of address, log2(small table max num of rows). |
CH_NM | number of input channels, 1,2,4. |
BF_HASH_NM | number of hash functions in bloom filter, 1,2,3. |
BFW | bloom-filter hash width. |
EN_BF | bloom-filter switch, 0 for off, 1 for on. |
build_probe_flag | 0:build 1:probe |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains build ID, probe ID, fixed hash depth, joined number of last probe and start addr of unused stb_buf for each PU |
pu_end_status_strms | returns next build ID, next probe ID, fixed hash depth, joined number of current probe and end addr of stb_buf for each PU |
j_strm | output of joined rows. |
j_e_strm | is the end flag of joined result. |
hashLookup3¶
hashLookup3 overload (1)¶
#include "xf_database/hash_lookup3.hpp"
template <int W> void hashLookup3 ( hls::stream <ap_uint <W>>& key_strm, hls::stream <ap_uint <64>>& hash_strm )
lookup3 algorithm, 64-bit hash. II=1 when W<=96, otherwise II=(W/96).
Parameters:
W | the bit width of ap_uint type for input message stream. |
key_strm | the message being hashed. |
hash_strm | the result. |
hashLookup3 overload (2)¶
#include "xf_database/hash_lookup3.hpp"
template <int W> void hashLookup3 ( hls::stream <ap_uint <W>>& key_strm, hls::stream <ap_uint <32>>& hash_strm )
lookup3 algorithm, 32-bit hash. II=1 when W<=96, otherwise II=(W/96).
Parameters:
W | the bit width of ap_uint type for input message stream. |
key_strm | the message being hashed. |
hash_strm | the result. |
hashLookup3 overload (3)¶
#include "xf_database/hash_lookup3.hpp"
template < int WK, int WH > void hashLookup3 ( hls::stream <ap_uint <WK>>& key_strm, hls::stream <bool>& e_key_strm, hls::stream <ap_uint <WH>>& hash_strm, hls::stream <bool>& e_hash_strm )
lookup3 algorithm, 64-bit or 32-bit hash.
Parameters:
WK | the bit width of input message stream. |
WH | the bit width of output hash stream, must be 64 or 32. |
key_strm | the message being hashed. |
e_key_strm | end of key flag stream. |
hash_strm | the result. |
e_hash_strm | end of hash flag stream. |
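The 96-bit threshold in the II figures above lines up with the public-domain lookup3 reference code, which consumes three 32-bit words (96 bits) per mixing round. The sketch below follows that reference's word-oriented routine as best understood here; `lookup3Words` and `rot32` are hypothetical names, and whether the primitive matches it bit for bit (seed, byte order, how the 64-bit result is formed) is not stated in this reference.

```cpp
#include <cstddef>
#include <cstdint>

inline uint32_t rot32(uint32_t x, int k) {
    return (x << k) | (x >> (32 - k));
}

// Software sketch of a lookup3-style hash over `length` 32-bit words.
uint32_t lookup3Words(const uint32_t* k, std::size_t length, uint32_t initval) {
    uint32_t a, b, c;
    a = b = c = 0xdeadbeefu + (static_cast<uint32_t>(length) << 2) + initval;

    while (length > 3) { // one 96-bit block per iteration
        a += k[0];
        b += k[1];
        c += k[2];
        // mixing round
        a -= c; a ^= rot32(c, 4);  c += b;
        b -= a; b ^= rot32(a, 6);  a += c;
        c -= b; c ^= rot32(b, 8);  b += a;
        a -= c; a ^= rot32(c, 16); c += b;
        b -= a; b ^= rot32(a, 19); a += c;
        c -= b; c ^= rot32(b, 4);  b += a;
        length -= 3;
        k += 3;
    }

    if (length == 0) {
        return c; // empty input: no final mixing
    }
    // last 1 to 3 words
    if (length == 3) c += k[2];
    if (length >= 2) b += k[1];
    a += k[0];
    // final mixing
    c ^= b; c -= rot32(b, 14);
    a ^= c; a -= rot32(c, 11);
    b ^= a; b -= rot32(a, 25);
    c ^= b; c -= rot32(b, 16);
    a ^= c; a -= rot32(c, 4);
    b ^= a; b -= rot32(a, 14);
    c ^= b; c -= rot32(b, 24);
    return c;
}
```

Keys of up to three words need only the final mixing step, while longer keys add one mixing round per extra 96 bits, which is consistent with an initiation interval that grows with W/96.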
hashMultiJoin¶
#include "xf_database/hash_multi_join.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM > void hashMultiJoin ( hls::stream <ap_uint <3>>& join_flag_strm, hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <256>* htb0_buf, ap_uint <256>* htb1_buf, ap_uint <256>* htb2_buf, ap_uint <256>* htb3_buf, ap_uint <256>* htb4_buf, ap_uint <256>* htb5_buf, ap_uint <256>* htb6_buf, ap_uint <256>* htb7_buf, ap_uint <256>* stb0_buf, ap_uint <256>* stb1_buf, ap_uint <256>* stb2_buf, ap_uint <256>* stb3_buf, ap_uint <256>* stb4_buf, ap_uint <256>* stb5_buf, ap_uint <256>* stb6_buf, ap_uint <256>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.
This primitive shares most of the structure of hashJoinV3. The inner table should be fed once, followed by the outer table once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, larger than 24 is suggested. |
CH_NM | number of input channels, 1,2,4. |
join_flag_strm | specifies the join type, this flag is only read once. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains depth of hash, row number of join result |
pu_end_status_strms | contains depth of hash, row number of join result |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
hashMultiJoinBuildProbe¶
#include "xf_database/hash_multi_join_build_probe.hpp"
template < int HASH_MODE, int KEYW, int PW, int S_PW, int B_PW, int HASHWH, int HASHWL, int ARW, int CH_NM > void hashMultiJoinBuildProbe ( bool build_probe_flag, hls::stream <ap_uint <3>>& join_flag_strm, hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], ap_uint <256>* htb0_buf, ap_uint <256>* htb1_buf, ap_uint <256>* htb2_buf, ap_uint <256>* htb3_buf, ap_uint <256>* htb4_buf, ap_uint <256>* htb5_buf, ap_uint <256>* htb6_buf, ap_uint <256>* htb7_buf, ap_uint <256>* stb0_buf, ap_uint <256>* stb1_buf, ap_uint <256>* stb2_buf, ap_uint <256>* stb3_buf, ap_uint <256>* stb4_buf, ap_uint <256>* stb5_buf, ap_uint <256>* stb6_buf, ap_uint <256>* stb7_buf, hls::stream <ap_uint <32>>& pu_begin_status_strms, hls::stream <ap_uint <32>>& pu_end_status_strms, hls::stream <ap_uint <KEYW+S_PW+B_PW>>& j_strm, hls::stream <bool>& j_e_strm )
Multi-PU Hash-Multi-Join primitive, using multiple DDR/HBM buffers.
This primitive shares most of the structure of hashJoinV3. The inner table should be fed once, followed by the outer table once.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkin’s Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
S_PW | width of payload of small table. |
B_PW | width of payload of big table. |
HASHWH | number of hash bits used for PU/buffer selection, 1~3. |
HASHWL | number of hash bits used for hash-table in PU. |
ARW | width of address, larger than 24 is suggested. |
CH_NM | number of input channels, 1,2,4. |
join_flag_strm | specifies the join type, this flag is only read once. |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
htb0_buf | HBM/DDR buffer of hash_table0 |
htb1_buf | HBM/DDR buffer of hash_table1 |
htb2_buf | HBM/DDR buffer of hash_table2 |
htb3_buf | HBM/DDR buffer of hash_table3 |
htb4_buf | HBM/DDR buffer of hash_table4 |
htb5_buf | HBM/DDR buffer of hash_table5 |
htb6_buf | HBM/DDR buffer of hash_table6 |
htb7_buf | HBM/DDR buffer of hash_table7 |
stb0_buf | HBM/DDR buffer of PU0 |
stb1_buf | HBM/DDR buffer of PU1 |
stb2_buf | HBM/DDR buffer of PU2 |
stb3_buf | HBM/DDR buffer of PU3 |
stb4_buf | HBM/DDR buffer of PU4 |
stb5_buf | HBM/DDR buffer of PU5 |
stb6_buf | HBM/DDR buffer of PU6 |
stb7_buf | HBM/DDR buffer of PU7 |
pu_begin_status_strms | contains depth of hash, row number of join result |
pu_end_status_strms | contains depth of hash, row number of join result |
j_strm | output of joined result |
j_e_strm | end flag of joined result |
hashMurmur3¶
#include "xf_database/hash_murmur3.hpp"
template < int W, int H > void hashMurmur3 ( hls::stream <ap_uint <W>>& key_strm, hls::stream <ap_uint <H>>& hash_strm )
Murmur3 hash algorithm.
Parameters:
W | the bit width of ap_uint type for input message stream. |
H | the bit width of ap_uint type for output hash stream. |
key_strm | the message being hashed. |
hash_strm | the result. |
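For host-side comparison, a software reference of the standard MurmurHash3 x86_32 function may be useful. This is a sketch only: whether the primitive uses this variant, which seed it applies, and how W input bits and H output bits map onto it are assumptions not confirmed by this page. murmur3_x86_32 and rotl32 are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 32-bit word left by r bits (0 < r < 32).
static inline std::uint32_t rotl32(std::uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// Standard MurmurHash3 x86_32 over a byte buffer; blocks are read little-endian.
std::uint32_t murmur3_x86_32(const std::uint8_t* data, std::size_t len, std::uint32_t seed) {
    const std::uint32_t c1 = 0xcc9e2d51u;
    const std::uint32_t c2 = 0x1b873593u;
    std::uint32_t h = seed;

    // Body: mix every complete 4-byte block.
    const std::size_t nblocks = len / 4;
    for (std::size_t i = 0; i < nblocks; ++i) {
        const std::uint8_t* p = data + 4 * i;
        std::uint32_t k = static_cast<std::uint32_t>(p[0]) |
                          (static_cast<std::uint32_t>(p[1]) << 8) |
                          (static_cast<std::uint32_t>(p[2]) << 16) |
                          (static_cast<std::uint32_t>(p[3]) << 24);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5u + 0xe6546b64u;
    }

    // Tail: mix the remaining 1 to 3 bytes, if any.
    const std::uint8_t* tail = data + 4 * nblocks;
    const std::size_t rem = len & 3u;
    std::uint32_t k = 0;
    if (rem >= 3) k ^= static_cast<std::uint32_t>(tail[2]) << 16;
    if (rem >= 2) k ^= static_cast<std::uint32_t>(tail[1]) << 8;
    if (rem >= 1) {
        k ^= tail[0];
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
    }

    // Finalization: fold in the length and avalanche the bits.
    h ^= static_cast<std::uint32_t>(len);
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```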
hashMurmur3Hive¶
#include "xf_database/hash_murmur3_hive.hpp"
static void hashMurmur3Hive ( hls::stream <ap_int <64>>& keyStrm, hls::stream <ap_int <64>>& hashStrm )
Murmur3 algorithm in 64-bit version.
Parameters:
keyStrm | Message being hashed. |
hashStrm | The hash value. |
hashPartition¶
#include "xf_database/hash_partition.hpp"
template < int HASH_MODE, int KEYW, int PW, int EW, int HASHWH, int HASHWL, int ARW, int CH_NM, int COL_NM > void hashPartition ( bool mk_on, int depth, hls::stream <int>& bit_num_strm, hls::stream <ap_uint <KEYW>> k0_strm_arry [CH_NM], hls::stream <ap_uint <PW>> p0_strm_arry [CH_NM], hls::stream <bool> e0_strm_arry [CH_NM], hls::stream <ap_uint <16>>& o_bkpu_num_strm, hls::stream <ap_uint <10>>& o_nm_strm, hls::stream <ap_uint <EW>> o_kpld_strm [COL_NM] )
Hash-Partition primitive.
Parameters:
HASH_MODE | 0 for radix and 1 for Jenkins' Lookup3 hash. |
KEYW | width of key, in bit. |
PW | width of max payload, in bit. |
EW | element data width of input table, in bit. |
HASHWH | number of hash bits used for PU selection. |
HASHWL | number of hash bits used for partition selection. |
ARW | width of address for URAM |
CH_NM | number of input channels, 1,2,4. |
COL_NM | number of input columns, 1~8. |
mk_on | input of double key flag, 0 for off, 1 for on. |
depth | input of depth of each hash bucket in URAM. |
bit_num_strm | input of partition number, log2(number of partitions). |
k0_strm_arry | input of key columns of both tables. |
p0_strm_arry | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
o_bkpu_num_strm | output of index for bucket and PU |
o_nm_strm | output of row number each time |
o_kpld_strm | output of key+payload |
hashSemiJoin¶
#include "xf_database/hash_semi_join.hpp"
template < int HashMode, int WKey, int WPayload, int WHashHigh, int WhashLow, int WTmpBufferAddress, int WTmpBuffer, int NChannels, int WBloomFilter, int EnBloomFilter > static void hashSemiJoin ( hls::stream <ap_uint <WKey>> key_istrms [NChannels], hls::stream <ap_uint <WPayload>> payload_istrms [NChannels], hls::stream <bool> e0_strm_arry [NChannels], ap_uint <WTmpBuffer>* pu0_tmp_rwtpr, ap_uint <WTmpBuffer>* pu1_tmp_rwptr, ap_uint <WTmpBuffer>* pu2_tmp_rwptr, ap_uint <WTmpBuffer>* pu3_tmp_rwptr, ap_uint <WTmpBuffer>* pu4_tmp_rwptr, ap_uint <WTmpBuffer>* pu5_tmp_rwptr, ap_uint <WTmpBuffer>* pu6_tmp_rwptr, ap_uint <WTmpBuffer>* pu7_tmp_rwptr, hls::stream <ap_uint <WPayload>>& join_ostrm, hls::stream <bool>& end_ostrm )
Multi-PU Hash-Semi-Join primitive, using multiple DDR/HBM buffers.
The inner table can hold at most 2M rows in this design, and hash conflicts are assumed to stay within 256K per bin.
This module can accept more than 1 input row per cycle, via multiple input channels. The outer table and the inner table share the same input ports, so the width of the payload should be the max of both, while the data should be aligned to the little-end. The inner table should be fed TWICE, followed by the outer table ONCE.
Parameters:
HashMode | 0 for radix and 1 for Jenkins' Lookup3 hash. |
WKey | width of key, in bit. |
WPayload | width of payload of outer table. |
WHashHigh | number of hash bits used for PU/buffer selection, 1~3. |
WhashLow | number of hash bits used for hash-table in PU. |
WTmpBufferAddress | width of address, log2(inner table max num of rows). |
WTmpBuffer | width of buffer. |
NChannels | number of input channels, 1,2,4. |
WBloomFilter | bloom-filter hash width. |
EnBloomFilter | bloom-filter switch, 0 for off, 1 for on. |
key_istrms | input of key columns of both tables. |
payload_istrms | input of payload columns of both tables. |
e0_strm_arry | input of end signal of both tables. |
pu0_tmp_rwtpr | HBM/DDR buffer of PU0 |
pu1_tmp_rwptr | HBM/DDR buffer of PU1 |
pu2_tmp_rwptr | HBM/DDR buffer of PU2 |
pu3_tmp_rwptr | HBM/DDR buffer of PU3 |
pu4_tmp_rwptr | HBM/DDR buffer of PU4 |
pu5_tmp_rwptr | HBM/DDR buffer of PU5 |
pu6_tmp_rwptr | HBM/DDR buffer of PU6 |
pu7_tmp_rwptr | HBM/DDR buffer of PU7 |
join_ostrm | output of joined rows. |
end_ostrm | end signal of joined rows. |
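The expected result of a semi-join can be checked on the host with a small software model. This sketch only models the relational semantics (an outer row is emitted once if its key exists in the inner table); it does not model PUs, the bloom filter, or feeding the inner table twice. Row and semiJoinModel are hypothetical names, and 32-bit keys and payloads are an arbitrary choice.

```cpp
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

// (key, payload) pair; widths are illustrative only.
using Row = std::pair<std::uint32_t, std::uint32_t>;

// Returns the payloads of outer rows whose key appears in the inner table,
// in outer-table order. Each outer row is emitted at most once.
std::vector<std::uint32_t> semiJoinModel(const std::vector<Row>& inner,
                                         const std::vector<Row>& outer) {
    // Build phase: collect the distinct keys of the inner table.
    std::unordered_set<std::uint32_t> keys;
    for (const Row& r : inner) {
        keys.insert(r.first);
    }
    // Probe phase: keep the outer payload when the key is present.
    std::vector<std::uint32_t> out;
    for (const Row& r : outer) {
        if (keys.count(r.first) != 0) {
            out.push_back(r.second);
        }
    }
    return out;
}
```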
insertSort¶
insertSort overload (1)¶
#include "xf_database/insert_sort.hpp"
template < typename KEY_TYPE, int MAX_SORT_NUMBER > void insertSort ( hls::stream <KEY_TYPE>& kinStrm, hls::stream <bool>& endInStrm, hls::stream <KEY_TYPE>& koutStrm, hls::stream <bool>& endOutStrm, bool order )
Insert sort top function.
Parameters:
KEY_TYPE | the input and output key type |
MAX_SORT_NUMBER | the maximum number of elements that can be sorted |
kinStrm | input key stream |
endInStrm | end flag stream for input |
koutStrm | output key stream |
endOutStrm | end flag stream for output |
order | 1:sort ascending 0:sort descending |
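The meaning of the order flag can be illustrated with a host-side insertion sort. This is a plain software sketch, not the hardware implementation; it ignores MAX_SORT_NUMBER and the end flag streams, and insertSortModel is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Insert keys one by one into an already sorted sequence.
// order == true sorts ascending, order == false sorts descending.
template <typename KEY_TYPE>
std::vector<KEY_TYPE> insertSortModel(const std::vector<KEY_TYPE>& in, bool order) {
    std::vector<KEY_TYPE> sorted;
    sorted.reserve(in.size());
    for (const KEY_TYPE& k : in) {
        // Skip every stored key that must stay in front of k.
        std::size_t pos = 0;
        while (pos < sorted.size() &&
               (order ? !(k < sorted[pos]) : !(sorted[pos] < k))) {
            ++pos;
        }
        sorted.insert(sorted.begin() + static_cast<std::ptrdiff_t>(pos), k);
    }
    return sorted;
}
```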
insertSort overload (2)¶
#include "xf_database/insert_sort.hpp"
template < typename KEY_TYPE, typename DATA_TYPE, int MAX_SORT_NUMBER > void insertSort ( hls::stream <DATA_TYPE>& dinStrm, hls::stream <KEY_TYPE>& kinStrm, hls::stream <bool>& endInStrm, hls::stream <DATA_TYPE>& doutStrm, hls::stream <KEY_TYPE>& koutStrm, hls::stream <bool>& endOutStrm, bool order )
Insert sort top function.
Parameters:
KEY_TYPE | the input and output key type |
DATA_TYPE | the input and output data type |
MAX_SORT_NUMBER | the maximum number of elements that can be sorted |
dinStrm | input data stream |
kinStrm | input key stream |
endInStrm | end flag stream for input |
doutStrm | output data stream |
koutStrm | output key stream |
endOutStrm | end flag stream for output |
order | 1:sort ascending 0:sort descending |
mergeJoin¶
#include "xf_database/merge_join.hpp"
template < typename KEY_T, typename LEFT_FIELD_T, typename RIGHT_FIELD_T > void mergeJoin ( bool isascend, hls::stream <KEY_T>& left_strm_in_key, hls::stream <LEFT_FIELD_T>& left_strm_in_field, hls::stream <bool>& left_e_strm, hls::stream <KEY_T>& right_strm_in_key, hls::stream <RIGHT_FIELD_T>& right_strm_in_field, hls::stream <bool>& right_e_strm, hls::stream <KEY_T>& left_strm_out_key, hls::stream <LEFT_FIELD_T>& left_strm_out_field, hls::stream <KEY_T>& right_strm_out_key, hls::stream <RIGHT_FIELD_T>& right_strm_out_field, hls::stream <bool>& out_e_strm )
Merge join function for sorted tables; the left table must not contain duplicated keys.
Parameters:
KEY_T | the type of the key of left table |
LEFT_FIELD_T | the type of the field of left table |
RIGHT_FIELD_T | the type of the field of right table |
isascend | flag indicating whether the input tables are sorted in ascending or descending order |
left_strm_in_key | the key stream of the left input table |
left_strm_in_field | the field stream of the left input table |
left_e_strm | the end flag stream to mark the end of left input table |
right_strm_in_key | the key stream of the right input table |
right_strm_in_field | the field stream of the right input table |
right_e_strm | the end flag stream to mark the end of right input table |
left_strm_out_key | the output key stream of left table |
left_strm_out_field | the output field stream of left table |
right_strm_out_key | the output key stream of right table |
right_strm_out_field | the output field stream of right table |
out_e_strm | the end flag stream to mark the end of out table |
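A host-side model of the merge join can help validate kernel output. The sketch below assumes both inputs are sorted in the same direction and that left keys are unique, as stated above; it returns one record per match instead of separate left and right key streams. JoinedRow and mergeJoinModel are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One joined output row (hypothetical type).
template <typename K, typename L, typename R>
struct JoinedRow {
    K key;
    L left;
    R right;
};

// Merge join of two tables sorted in the same direction.
// The left table must not contain duplicated keys; the right table may.
template <typename K, typename L, typename R>
std::vector<JoinedRow<K, L, R>> mergeJoinModel(bool isascend,
                                               const std::vector<std::pair<K, L>>& left,
                                               const std::vector<std::pair<K, R>>& right) {
    std::vector<JoinedRow<K, L, R>> out;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        const K& lk = left[i].first;
        const K& rk = right[j].first;
        if (lk == rk) {
            // Match: keep the left row, since the next right row may repeat the key.
            out.push_back({lk, left[i].second, right[j].second});
            ++j;
        } else {
            // Advance whichever side sorts earlier in the chosen direction.
            const bool leftFirst = isascend ? (lk < rk) : (rk < lk);
            if (leftFirst) {
                ++i;
            } else {
                ++j;
            }
        }
    }
    return out;
}
```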
mergeLeftJoin¶
#include "xf_database/merge_left_join.hpp"
template < typename KEY_T, typename LEFT_FIELD_T, typename RIGHT_FIELD_T > void mergeLeftJoin ( bool isascend, hls::stream <KEY_T>& left_strm_in_key, hls::stream <LEFT_FIELD_T>& left_strm_in_field, hls::stream <bool>& left_e_strm, hls::stream <KEY_T>& right_strm_in_key, hls::stream <RIGHT_FIELD_T>& right_strm_in_field, hls::stream <bool>& right_e_strm, hls::stream <KEY_T>& left_strm_out_key, hls::stream <LEFT_FIELD_T>& left_strm_out_field, hls::stream <KEY_T>& right_strm_out_key, hls::stream <RIGHT_FIELD_T>& right_strm_out_field, hls::stream <bool>& out_e_strm, hls::stream <bool>& isnull_strm )
Merge left join function for sorted tables; the left table should not have duplicated keys.
Parameters:
KEY_T | the type of the key |
LEFT_FIELD_T | the type of the field of left table |
RIGHT_FIELD_T | the type of the field of right table |
isascend | flag indicating whether the input tables are sorted in ascending order |
left_strm_in_key | the key stream of the left input table |
left_strm_in_field | the field stream of the left input table |
left_e_strm | the end flag stream to mark the end of left input table |
right_strm_in_key | the key stream of the right input table |
right_strm_in_field | the field stream of the right input table |
right_e_strm | the end flag stream to mark the end of right input table |
left_strm_out_key | the output key stream of left table |
left_strm_out_field | the output field stream of left table |
right_strm_out_key | the output key stream of right table |
right_strm_out_field | the output field stream of right table |
out_e_strm | the end flag stream to mark the end of out table |
isnull_strm | the isnull stream, indicating whether the right-table part of an output row is null. |
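The role of the isnull flag can be sketched with a host-side left join model. It assumes that every left row is emitted at least once and that an unmatched left row carries a default-constructed right field with the null flag set; the value the primitive actually writes for a null right field is not stated here. LeftJoinedRow and mergeLeftJoinModel are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One output row of the left join model (hypothetical type).
template <typename K, typename L, typename R>
struct LeftJoinedRow {
    K key;
    L left;
    R right;           // default-constructed when rightIsNull is true (assumption)
    bool rightIsNull;  // mirrors the role of isnull_strm
};

// Left join of two tables sorted in the same direction; left keys are unique.
template <typename K, typename L, typename R>
std::vector<LeftJoinedRow<K, L, R>> mergeLeftJoinModel(bool isascend,
                                                       const std::vector<std::pair<K, L>>& left,
                                                       const std::vector<std::pair<K, R>>& right) {
    std::vector<LeftJoinedRow<K, L, R>> out;
    std::size_t j = 0;
    for (std::size_t i = 0; i < left.size(); ++i) {
        const K& lk = left[i].first;
        // Drop right rows that sort before the current left key.
        while (j < right.size() &&
               (isascend ? (right[j].first < lk) : (lk < right[j].first))) {
            ++j;
        }
        // Emit one row per matching right row.
        bool matched = false;
        while (j < right.size() && right[j].first == lk) {
            out.push_back({lk, left[i].second, right[j].second, false});
            matched = true;
            ++j;
        }
        // No match: still emit the left row, flagged as null on the right.
        if (!matched) {
            out.push_back({lk, left[i].second, R(), true});
        }
    }
    return out;
}
```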
mergeSort¶
mergeSort overload (1)¶
#include "xf_database/merge_sort.hpp"
template <typename Key_Type> void mergeSort ( hls::stream <Key_Type>& left_kin_strm, hls::stream <bool>& left_strm_in_end, hls::stream <Key_Type>& right_kin_strm, hls::stream <bool>& right_strm_in_end, hls::stream <Key_Type>& kout_strm, hls::stream <bool>& strm_out_end, bool order )
Merge sort function.
Parameters:
Key_Type | the input and output key type |
left_kin_strm | input left key stream |
left_strm_in_end | end flag stream for left input |
right_kin_strm | input right key stream |
right_strm_in_end | end flag stream for right input |
kout_strm | output key stream |
strm_out_end | end flag stream for output data |
order | 1:ascending 0:descending |
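A host-side model of the two-way merge, useful for checking kernel output. It assumes both inputs are already sorted in the direction given by order; end flag streams are not modeled, and mergeSortedModel is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted sequences into one.
// order == true: inputs and output ascending; order == false: descending.
template <typename Key_Type>
std::vector<Key_Type> mergeSortedModel(const std::vector<Key_Type>& left,
                                       const std::vector<Key_Type>& right,
                                       bool order) {
    std::vector<Key_Type> out;
    out.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        // Take from the left input on ties so the merge is stable.
        const bool takeLeft = order ? !(right[j] < left[i]) : !(left[i] < right[j]);
        if (takeLeft) {
            out.push_back(left[i++]);
        } else {
            out.push_back(right[j++]);
        }
    }
    // Drain whichever input still has keys.
    while (i < left.size()) out.push_back(left[i++]);
    while (j < right.size()) out.push_back(right[j++]);
    return out;
}
```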
mergeSort overload (2)¶
#include "xf_database/merge_sort.hpp"
template < typename Data_Type, typename Key_Type > void mergeSort ( hls::stream <Data_Type>& left_din_strm, hls::stream <Key_Type>& left_kin_strm, hls::stream <bool>& left_strm_in_end, hls::stream <Data_Type>& right_din_strm, hls::stream <Key_Type>& right_kin_strm, hls::stream <bool>& right_strm_in_end, hls::stream <Data_Type>& dout_strm, hls::stream <Key_Type>& kout_strm, hls::stream <bool>& strm_out_end, bool order )
Merge sort function.
Parameters:
Data_Type | the input and output data type |
Key_Type | the input and output key type |
left_din_strm | input left data stream |
left_kin_strm | input left key stream |
left_strm_in_end | end flag stream for left input |
right_din_strm | input right data stream |
right_kin_strm | input right key stream |
right_strm_in_end | end flag stream for right input |
dout_strm | output data stream |
kout_strm | output key stream |
strm_out_end | end flag stream for output data |
order | 1:ascending 0:descending |
nestedLoopJoin¶
#include "xf_database/nested_loop_join.hpp"
template < int CMP_NUM, typename KEY_T, typename LEFT_FIELD_T, typename RIGHT_FIELD_T > void nestedLoopJoin ( hls::stream <KEY_T>& strm_in_left_key, hls::stream <LEFT_FIELD_T>& strm_in_left_field, hls::stream <bool>& strm_in_left_e, hls::stream <KEY_T>& strm_in_right_key, hls::stream <RIGHT_FIELD_T>& strm_in_right_field, hls::stream <bool>& strm_in_right_e, hls::stream <KEY_T> strm_out_left_key [CMP_NUM], hls::stream <LEFT_FIELD_T> strm_out_left_field [CMP_NUM], hls::stream <KEY_T> strm_out_right_key [CMP_NUM], hls::stream <RIGHT_FIELD_T> strm_out_right_field [CMP_NUM], hls::stream <bool> strm_out_e [CMP_NUM] )
Nested loop join function.
Parameters:
KEY_T | the type of the key of left table |
LEFT_FIELD_T | the type of the field of left table |
RIGHT_FIELD_T | the type of the field of right table |
strm_in_left_key | the key stream of the left input table |
strm_in_left_field | the field stream of the left input table |
strm_in_left_e | the end flag stream to mark the end of left input table |
strm_in_right_key | the key stream of the right input table |
strm_in_right_field | the field stream of the right input table |
strm_in_right_e | the end flag stream to mark the end of right input table |
strm_out_left_key | the output key stream of left table |
strm_out_left_field | the output field stream of left table |
strm_out_right_key | the output key stream of right table |
strm_out_right_field | the output field stream of right table |
strm_out_e | the end flag stream to mark the end of out table |
scanCmpStrCol¶
#include "xf_database/scan_cmp_str_col.hpp"
void scanCmpStrCol ( ap_uint <512>* ddr_ptr, hls::stream <int>& size, hls::stream <int>& num_str, hls::stream <ap_uint <512>>& cnst_stream, hls::stream <bool>& out_stream, hls::stream <bool>& e_str_o )
Scan multiple columns of strings in global memory and compare each string with a constant string.
Parameters:
ddr_ptr | input string array stored in global memory. |
size | the number of times reading global memory |
num_str | the number of actual strings |
cnst_stream | input constant string stream, 512 bits in heading-length and padding-zero format, read only once as configuration. |
out_stream | output whether each string is equal to the constant string, true indicates they are equal. |
e_str_o | end flag stream for output stream. |
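The "heading-length and padding-zero" format can be pictured with a host-side packing helper. The layout used here is an assumption, not taken from this page: byte 0 holds the string length, the characters follow, and the rest of the 64-byte (512-bit) word is zero. Str512, packStr and scanCmpStrModel are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One 512-bit word, modeled as 64 bytes.
using Str512 = std::array<std::uint8_t, 64>;

// Pack a string as: [length byte][characters][zero padding].
// Assumed layout; strings longer than 63 bytes are truncated in this sketch.
Str512 packStr(const std::string& s) {
    Str512 w{};  // value-initialized, so all padding bytes are zero
    const std::size_t n = s.size() < 63 ? s.size() : 63;
    w[0] = static_cast<std::uint8_t>(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[1 + i] = static_cast<std::uint8_t>(s[i]);
    }
    return w;
}

// Compare each stored string with the constant; true means equal.
std::vector<bool> scanCmpStrModel(const std::vector<Str512>& col, const Str512& cnst) {
    std::vector<bool> out;
    out.reserve(col.size());
    for (const Str512& w : col) {
        out.push_back(w == cnst);
    }
    return out;
}
```

Because the padding is zero and the length is part of the word, a whole-word equality test is enough to distinguish "abc" from "abcd" or from a string with trailing zero bytes of a different length.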
scanCol¶
scanCol overload (1)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <bool>& e_row_strm )
Scan 1 column from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
c0vec_ptr | buffer pointer to column 0. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
e_row_strm | output end flag stream. |
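How vec_len, size0 and nrow relate can be sketched on the host. The model treats the buffer as a byte array in which every vector holds vec_len items of size0 bytes, and assumes items are stored little-endian with the first item at the lowest address; burst_len has no functional effect and is omitted. scanColModel is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpack nrow items of size0 bytes from a buffer organized as vectors of vec_len items.
// Assumption: item 0 of each vector sits at the lowest address, bytes are little-endian.
template <int vec_len, int size0>
std::vector<std::uint64_t> scanColModel(const std::vector<std::uint8_t>& buf, int nrow) {
    static_assert(vec_len >= 1, "a vector holds at least one item");
    static_assert(size0 >= 1 && size0 <= 8, "sketch stores items in 64 bits");
    std::vector<std::uint64_t> out;
    const int nvec = (nrow + vec_len - 1) / vec_len;  // vectors that must be read
    for (int v = 0; v < nvec; ++v) {
        for (int lane = 0; lane < vec_len; ++lane) {
            const int row = v * vec_len + lane;
            if (row >= nrow) break;  // the last vector may be only partly used
            std::uint64_t item = 0;
            for (int b = 0; b < size0; ++b) {
                const std::size_t idx = static_cast<std::size_t>(row) * size0 + b;
                item |= static_cast<std::uint64_t>(buf[idx]) << (8 * b);
            }
            out.push_back(item);
        }
    }
    return out;
}
```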
scanCol overload (2)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0, int size1 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <ap_uint <8*size1>>& c1_strm, hls::stream <bool>& e_row_strm )
Scan 2 columns from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
c1_strm | column 1 stream. |
e_row_strm | output end flag stream. |
scanCol overload (3)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0, int size1, int size2 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <ap_uint <8*size1>>& c1_strm, hls::stream <ap_uint <8*size2>>& c2_strm, hls::stream <bool>& e_row_strm )
Scan 3 columns from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
c1_strm | column 1 stream. |
c2_strm | column 2 stream. |
e_row_strm | output end flag stream. |
scanCol overload (4)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0, int size1, int size2, int size3 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, ap_uint <8*size3*vec_len>* c3vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <ap_uint <8*size1>>& c1_strm, hls::stream <ap_uint <8*size2>>& c2_strm, hls::stream <ap_uint <8*size3>>& c3_strm, hls::stream <bool>& e_row_strm )
Scan 4 columns from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
size3 | size of column 3, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c3vec_ptr | buffer pointer to column 3. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
c1_strm | column 1 stream. |
c2_strm | column 2 stream. |
c3_strm | column 3 stream. |
e_row_strm | output end flag stream. |
scanCol overload (5)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0, int size1, int size2, int size3, int size4 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, ap_uint <8*size3*vec_len>* c3vec_ptr, ap_uint <8*size4*vec_len>* c4vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <ap_uint <8*size1>>& c1_strm, hls::stream <ap_uint <8*size2>>& c2_strm, hls::stream <ap_uint <8*size3>>& c3_strm, hls::stream <ap_uint <8*size4>>& c4_strm, hls::stream <bool>& e_row_strm )
Scan 5 columns from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
size3 | size of column 3, in byte. |
size4 | size of column 4, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c3vec_ptr | buffer pointer to column 3. |
c4vec_ptr | buffer pointer to column 4. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
c1_strm | column 1 stream. |
c2_strm | column 2 stream. |
c3_strm | column 3 stream. |
c4_strm | column 4 stream. |
e_row_strm | output end flag stream. |
scanCol overload (6)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int size0, int size1, int size2, int size3, int size4, int size5 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, ap_uint <8*size3*vec_len>* c3vec_ptr, ap_uint <8*size4*vec_len>* c4vec_ptr, ap_uint <8*size5*vec_len>* c5vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>>& c0_strm, hls::stream <ap_uint <8*size1>>& c1_strm, hls::stream <ap_uint <8*size2>>& c2_strm, hls::stream <ap_uint <8*size3>>& c3_strm, hls::stream <ap_uint <8*size4>>& c4_strm, hls::stream <ap_uint <8*size5>>& c5_strm, hls::stream <bool>& e_row_strm )
Scan 6 columns from DDR/HBM buffers.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
size3 | size of column 3, in byte. |
size4 | size of column 4, in byte. |
size5 | size of column 5, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c3vec_ptr | buffer pointer to column 3. |
c4vec_ptr | buffer pointer to column 4. |
c5vec_ptr | buffer pointer to column 5. |
nrow | number of rows to scan. |
c0_strm | column 0 stream. |
c1_strm | column 1 stream. |
c2_strm | column 2 stream. |
c3_strm | column 3 stream. |
c4_strm | column 4 stream. |
c5_strm | column 5 stream. |
e_row_strm | output end flag stream. |
scanCol overload (7)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int ch_num, int size0 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>> c0_strm [ch_num], hls::stream <bool> e_row_strm [ch_num] )
Scan one column from DDR/HBM buffers, emit multiple rows concurrently.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
ch_num | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
c0vec_ptr | buffer pointer to column 0. |
nrow | number of rows to scan. |
c0_strm | array of column 0 stream. |
e_row_strm | array of output end flag stream. |
scanCol overload (8)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int ch_num, int size0, int size1 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>> c0_strm [ch_num], hls::stream <ap_uint <8*size1>> c1_strm [ch_num], hls::stream <bool> e_row_strm [ch_num] )
Scan two columns from DDR/HBM buffers, emit multiple rows concurrently.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
ch_num | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
nrow | number of rows to scan. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
e_row_strm | array of output end flag stream. |
scanCol overload (9)¶
#include "xf_database/scan_col.hpp"
template < int burst_len, int vec_len, int ch_num, int size0, int size1, int size2 > void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, const int nrow, hls::stream <ap_uint <8*size0>> c0_strm [ch_num], hls::stream <ap_uint <8*size1>> c1_strm [ch_num], hls::stream <ap_uint <8*size2>> c2_strm [ch_num], hls::stream <bool> e_row_strm [ch_num] )
Scan three columns from DDR/HBM buffers, emit multiple rows concurrently.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | number of items to be scanned as a vector from AXI port. |
ch_num | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
nrow | number of rows to scan. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
c2_strm | array of column 2 stream. |
e_row_strm | array of output end flag stream. |
scanCol overload (10)¶
#include "xf_database/scan_col_2.hpp"
template < int burst_len, int vec_len, int ch_nm, int size0, int size1 > static void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, hls::stream <ap_uint <8*size0>> c0_strm [ch_nm], hls::stream <ap_uint <8*size1>> c1_strm [ch_nm], hls::stream <bool> e_row_strm [ch_nm] )
Scan 2 columns from DDR/HBM buffers.
The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, that number of zeros is emitted; otherwise, that number of rows is read from the buffer.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | scan this number of items as a vector from AXI port. |
ch_nm | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
e_row_strm | array of output end flag stream. |
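The header convention described above can be sketched on the host. The model makes several assumptions that are not confirmed here: items are 32 bits wide, the row count is the lowest item of the first vector of column 0, data starts in the second vector, and the output channels are flattened into one sequence per column. TwoCols and scanColHeaderModel are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rows produced for two columns (hypothetical type).
struct TwoCols {
    std::vector<std::uint32_t> c0;
    std::vector<std::uint32_t> c1;
};

// col0/col1 are buffers of 32-bit items; the first vec_len items form the header vector.
// Assumption: col0[0] is the row count and row r of a column lives at index vec_len + r.
TwoCols scanColHeaderModel(const std::vector<std::uint32_t>& col0,
                           const std::vector<std::uint32_t>& col1,
                           std::size_t vec_len) {
    TwoCols out;
    const std::size_t nrow = col0[0];

    // Column 0 always provides data.
    for (std::size_t r = 0; r < nrow; ++r) {
        out.c0.push_back(col0[vec_len + r]);
    }

    // A following column whose first vector is all zero is treated as absent.
    bool headerIsZero = true;
    for (std::size_t i = 0; i < vec_len; ++i) {
        if (col1[i] != 0) headerIsZero = false;
    }
    for (std::size_t r = 0; r < nrow; ++r) {
        out.c1.push_back(headerIsZero ? 0u : col1[vec_len + r]);
    }
    return out;
}
```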
scanCol overload (11)¶
#include "xf_database/scan_col_2.hpp"
template < int burst_len, int vec_len, int ch_nm, int size0, int size1, int size2 > static void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, hls::stream <ap_uint <8*size0>> c0_strm [ch_nm], hls::stream <ap_uint <8*size1>> c1_strm [ch_nm], hls::stream <ap_uint <8*size2>> c2_strm [ch_nm], hls::stream <bool> e_row_strm [ch_nm] )
Scan 3 columns from DDR/HBM buffers.
The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, that number of zeros is emitted; otherwise, that number of rows is read from the buffer.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | scan this number of items as a vector from AXI port. |
ch_nm | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
c2_strm | array of column 2 stream. |
e_row_strm | array of output end flag stream. |
scanCol overload (12)¶
#include "xf_database/scan_col_2.hpp"
template < int burst_len, int vec_len, int ch_nm, int size0, int size1, int size2, int size3 > static void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, ap_uint <8*size3*vec_len>* c3vec_ptr, hls::stream <ap_uint <8*size0>> c0_strm [ch_nm], hls::stream <ap_uint <8*size1>> c1_strm [ch_nm], hls::stream <ap_uint <8*size2>> c2_strm [ch_nm], hls::stream <ap_uint <8*size3>> c3_strm [ch_nm], hls::stream <bool> e_row_strm [ch_nm] )
Scan 4 columns from DDR/HBM buffers.
The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, that number of zeros is emitted; otherwise, that number of rows is read from the buffer.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | scan this number of items as a vector from AXI port. |
ch_nm | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
size3 | size of column 3, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c3vec_ptr | buffer pointer to column 3. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
c2_strm | array of column 2 stream. |
c3_strm | array of column 3 stream. |
e_row_strm | array of output end flag stream. |
scanCol overload (13)¶
#include "xf_database/scan_col_2.hpp"
template < int burst_len, int vec_len, int ch_nm, int size0, int size1, int size2, int size3, int size4 > static void scanCol ( ap_uint <8*size0*vec_len>* c0vec_ptr, ap_uint <8*size1*vec_len>* c1vec_ptr, ap_uint <8*size2*vec_len>* c2vec_ptr, ap_uint <8*size3*vec_len>* c3vec_ptr, ap_uint <8*size4*vec_len>* c4vec_ptr, hls::stream <ap_uint <8*size0>> c0_strm [ch_nm], hls::stream <ap_uint <8*size1>> c1_strm [ch_nm], hls::stream <ap_uint <8*size2>> c2_strm [ch_nm], hls::stream <ap_uint <8*size3>> c3_strm [ch_nm], hls::stream <ap_uint <8*size4>> c4_strm [ch_nm], hls::stream <bool> e_row_strm [ch_nm] )
Scan 5 columns from DDR/HBM buffers.
The LSB of the first vector of the first column specifies the number of rows to be scanned. For each following buffer, if its first vector is zero, that number of zeros is emitted; otherwise, that number of rows is read from the buffer.
Parameters:
burst_len | burst read length, must be supported by MC. |
vec_len | scan this number of items as a vector from AXI port. |
ch_nm | number of concurrent output channels per column. |
size0 | size of column 0, in byte. |
size1 | size of column 1, in byte. |
size2 | size of column 2, in byte. |
size3 | size of column 3, in byte. |
size4 | size of column 4, in byte. |
c0vec_ptr | buffer pointer to column 0. |
c1vec_ptr | buffer pointer to column 1. |
c2vec_ptr | buffer pointer to column 2. |
c3vec_ptr | buffer pointer to column 3. |
c4vec_ptr | buffer pointer to column 4. |
c0_strm | array of column 0 stream. |
c1_strm | array of column 1 stream. |
c2_strm | array of column 2 stream. |
c3_strm | array of column 3 stream. |
c4_strm | array of column 4 stream. |
e_row_strm | array of output end flag stream. |
staticEval¶
staticEval overload (1)¶
#include "xf_database/static_eval.hpp"
template < typename T, typename T_O, T_O(*)(T) opf > void staticEval ( hls::stream <T>& in_strm, hls::stream <bool>& e_in_strm, hls::stream <T_O>& out_strm, hls::stream <bool>& e_out_strm )
One stream input static evaluation.
The staticEval function calculates the result of a user-defined expression. This result will be passed to the aggregate module as the input. When calling this API, T and T_O are the input and output data types of the user code. E.g.
// decl
long user_func(int a);
// use
xf::database::staticEval<int, long, user_func>(
    in_strm, e_in_strm, out_strm, e_out_strm);
In the above call, int is the input data type of user_func, and long is the return type of user_func.
Parameters:
T | the input stream type, inferred from argument |
T_O | the output stream type, inferred from argument |
opf | the user-defined expression function |
in_strm | input data stream |
e_in_strm | end flag stream for input data |
out_strm | output data stream |
e_out_strm | end flag stream for output data |
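The template shape of staticEval (a function pointer passed as a non-type template argument) can be tried on the host without HLS headers. The sketch below replaces hls::stream with std::queue and assumes the end flag convention of one false per data item followed by a single true; staticEvalModel and the body of user_func are hypothetical.

```cpp
#include <queue>

// Host-side model: apply opf to every input item until the end flag is seen.
// Assumption: e_in_strm carries one 'false' per item and a final 'true'.
template <typename T, typename T_O, T_O (*opf)(T)>
void staticEvalModel(std::queue<T>& in_strm,
                     std::queue<bool>& e_in_strm,
                     std::queue<T_O>& out_strm,
                     std::queue<bool>& e_out_strm) {
    bool e = e_in_strm.front();
    e_in_strm.pop();
    while (!e) {
        out_strm.push(opf(in_strm.front()));
        in_strm.pop();
        e_out_strm.push(false);
        e = e_in_strm.front();
        e_in_strm.pop();
    }
    e_out_strm.push(true);  // close the output stream
}

// Example user expression (illustrative only).
long user_func(int a) {
    return 2L * a + 1;
}
```

Passing opf as a template argument rather than a runtime pointer lets the compiler inline the expression into the loop, which is presumably why the primitive takes it this way.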
staticEval overload (2)¶
#include "xf_database/static_eval.hpp"
template < typename T1, typename T2, typename T_O, T_O(*)(T1, T2) opf > void staticEval ( hls::stream <T1>& in1_strm, hls::stream <T2>& in2_strm, hls::stream <bool>& e_in_strm, hls::stream <T_O>& out_strm, hls::stream <bool>& e_out_strm )
Two stream input static evaluation.
The staticEval function calculates the result of a user-defined expression. This result will be passed to the aggregate module as the input. When calling this API, T1, T2 and T_O are the input and output data types of the user code. E.g.
// decl
long user_func(int a, int b);
// use
xf::database::staticEval<int, int, long, user_func>(
    in1_strm, in2_strm, e_in_strm, out_strm, e_out_strm);
In the above call, the two int are the input data types of user_func, and long is the return type of user_func.
Parameters:
T1 | the input stream type, inferred from argument |
T2 | the input stream type, inferred from argument |
T_O | the output stream type, inferred from argument |
opf | the user-defined expression function |
in1_strm | input data stream |
in2_strm | input data stream |
e_in_strm | end flag stream for input data |
out_strm | output data stream |
e_out_strm | end flag stream for output data |
staticEval overload (3)¶
#include "xf_database/static_eval.hpp"
template < typename T1, typename T2, typename T3, typename T_O, T_O(*)(T1, T2, T3) opf > void staticEval ( hls::stream <T1>& in1_strm, hls::stream <T2>& in2_strm, hls::stream <T3>& in3_strm, hls::stream <bool>& e_in_strm, hls::stream <T_O>& out_strm, hls::stream <bool>& e_out_strm )
Three stream input static evaluation.
The staticEval function calculates the result of a user-defined expression. This result will be passed to the aggregate module as its input. When calling this API, T1, T2, T3 and T_O are the data types of the user function's three parameters and of its return value. E.g.
// decl
long user_func(int a, int b, int c);
// use
xf::database::staticEval<int, int, int, long, user_func>(
in1_strm, in2_strm, in3_strm, e_in_strm,
out_strm, e_out_strm);
In the above call, the three int types are the input data types of user_func, and long is the return type of user_func.
Parameters:
T1 | the first input stream type, inferred from argument |
T2 | the second input stream type, inferred from argument |
T3 | the third input stream type, inferred from argument |
T_O | the output stream type, inferred from argument |
opf | the user-defined expression function |
in1_strm | first input data stream |
in2_strm | second input data stream |
in3_strm | third input data stream |
e_in_strm | end flag stream for input data |
out_strm | output data stream |
e_out_strm | end flag stream for output data |
staticEval overload (4)¶
#include "xf_database/static_eval.hpp"
template < typename T1, typename T2, typename T3, typename T4, typename T_O, T_O(*)(T1, T2, T3, T4) opf > void staticEval ( hls::stream <T1>& in1_strm, hls::stream <T2>& in2_strm, hls::stream <T3>& in3_strm, hls::stream <T4>& in4_strm, hls::stream <bool>& e_in_strm, hls::stream <T_O>& out_strm, hls::stream <bool>& e_out_strm )
Four stream input static evaluation.
The staticEval function calculates the result of a user-defined expression. This result will be passed to the aggregate module as its input. When calling this API, T1, T2, T3, T4 and T_O are the data types of the user function's four parameters and of its return value. E.g.
// decl
long user_func(int a, int b, int c, int d);
// use
xf::database::staticEval<int, int, int, int, long, user_func>(
in1_strm, in2_strm, in3_strm, in4_strm, e_in_strm,
out_strm, e_out_strm);
In the above call, the four int types are the input data types of user_func, and long is the return type of user_func.
Parameters:
T1 | the first input stream type, inferred from argument |
T2 | the second input stream type, inferred from argument |
T3 | the third input stream type, inferred from argument |
T4 | the fourth input stream type, inferred from argument |
T_O | the output stream type, inferred from argument |
opf | the user-defined expression function |
in1_strm | first input data stream |
in2_strm | second input data stream |
in3_strm | third input data stream |
in4_strm | fourth input data stream |
e_in_strm | end flag stream for input data |
out_strm | output data stream |
e_out_strm | end flag stream for output data |
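Because opf is an ordinary function bound through a function-pointer template parameter, the expression itself can be checked on the host before it is used with streams. The sketch below is a minimal illustration of that idea, not library code: user_func4 and eval_once are hypothetical names, and std::int64_t is used in place of long so that the wider return type holds on platforms where long is 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical four-input expression: a*b + c*d.
// Each product is widened to 64 bits before multiplying, so it cannot
// overflow the way an int * int product could.
std::int64_t user_func4(int a, int b, int c, int d) {
    return static_cast<std::int64_t>(a) * b + static_cast<std::int64_t>(c) * d;
}

// Binds the expression at compile time with the same template parameter
// shape as the four-stream overload: T_O (*opf)(T1, T2, T3, T4).
template <typename T1, typename T2, typename T3, typename T4, typename T_O,
          T_O (*opf)(T1, T2, T3, T4)>
T_O eval_once(T1 a, T2 b, T3 c, T4 d) {
    return opf(a, b, c, d);
}

// Evaluates one sample whose first product exceeds the 32-bit int range.
std::int64_t run_four_input_demo() {
    return eval_once<int, int, int, int, std::int64_t, user_func4>(
        100000, 100000, 1, 2);
}
```

The sample inputs are chosen so that 100000 * 100000 does not fit in a 32-bit int, which shows why an output type wider than the input types can be useful when the result feeds an aggregation.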