GQE L3 APIs¶
These APIs are implemented as executable classes that provide a cleanly structured, easy-to-use software interface.
class xf::database::gqe::Table¶
#include "gqe_table.hpp"
Methods¶
addCol¶
addCol overload (1)¶
void addCol ( std::string _name, TypeEnum type_size, void* _ptr, int row_num )
add one column into the Table with user-provided buffer pointer.
usage: tab.addCol("o_orderkey", TypeEnum::TypeInt32, tab_o_col0, 10000);
Parameters:
_name | column name |
type_size | size of the column element data type in bytes |
_ptr | user-provided column buffer pointer |
row_num | number of rows |
addCol overload (2)¶
void addCol ( std::string _name, TypeEnum type_size, int row_num )
allocate buffer for one column and add it into the Table.
usage: tab.addCol("o_orderkey", TypeEnum::TypeInt32, 10000);
Parameters:
_name | column name |
type_size | size of the column element data type in bytes |
row_num | number of rows |
addCol overload (3)¶
void addCol ( std::string _name, TypeEnum type_size, std::vector <std::string> dat_list )
create one column with several sections by loading rows from data file.
usage: tab.addCol("o_orderkey", TypeEnum::TypeInt32, {"file1.dat", "file2.dat"});
Parameters:
_name | column name |
type_size | size of the column element data type in bytes |
dat_list | data file list |
addCol overload (4)¶
void addCol ( std::string _name, TypeEnum type_size, std::vector <struct ColPtr> ptr_pairs )
create one column with several sections by user-provided pointer list
usage: tab.addCol("o_orderkey", TypeEnum::TypeInt32, {{ptr1, 10000}, {ptr2, 20000}});
Parameters:
_name | column name |
type_size | size of the column element data type in bytes |
ptr_pairs | vector of (ptr,row_num) pairs |
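The multi-section layout described for overload (4) can be pictured with a small standalone sketch. This is not the library's implementation: `SectionRef` and `totalRows` are hypothetical names that only mirror the (ptr,row_num) pairs from the usage line, and the sketch assumes the column's total row count is the sum of its section row counts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one (ptr, row_num) pair; not the library's ColPtr.
struct SectionRef {
    void* ptr;   // start of the user-provided section buffer
    int row_num; // number of rows stored in that section
};

// Sum the rows of all sections that together form one column.
inline std::size_t totalRows(const std::vector<SectionRef>& sections) {
    std::size_t total = 0;
    for (const SectionRef& s : sections) {
        assert(s.row_num >= 0 && "a section cannot have a negative row count");
        total += static_cast<std::size_t>(s.row_num);
    }
    return total;
}
```

With the two sections from the usage line (10000 and 20000 rows), `totalRows` yields 30000.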
genRowIDWithValidation¶
genRowIDWithValidation overload (1)¶
void genRowIDWithValidation ( std::string _rowid_name, std::string _valid_name, bool _rowid_en, bool _valid_en, std::vector <char*> validationPtrVector )
add validation column with user-provided validation pointer list
Caution: This is an experimental API and will be deprecated in the next release.
Parameters:
_rowid_name | name of row-id column |
_valid_name | name of validation bits column |
_rowid_en | enable flag of row-id |
_valid_en | enable flag of validation bits |
validationPtrVector | validation bits pointer list |
genRowIDWithValidation overload (2)¶
void genRowIDWithValidation ( std::string _rowid_name, std::string _valid_name, bool _rowid_en, bool _valid_en, void* ptr, int row_num )
add validation column with user-provided pointer
Caution: This is an experimental API and will be deprecated in the next release.
Parameters:
_rowid_name | name of row-id column |
_valid_name | name of validation bits column |
_rowid_en | enable flag of row-id |
_valid_en | enable flag of validation bits |
ptr | validation bits column pointer |
row_num | number of rows |
genRowIDWithValidation overload (3)¶
void genRowIDWithValidation ( std::string _rowid_name, std::string _valid_name, bool _rowid_en, bool _valid_en, std::vector <std::string> dat_list )
add validation column with user-provided data file list
Caution: This is an experimental API and will be deprecated in the next release.
Parameters:
_rowid_name | name of row-id column |
_valid_name | name of validation bits column |
_rowid_en | enable flag of row-id |
_valid_en | enable flag of validation bits |
dat_list | data file list |
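As an illustration of what a validation-bits buffer might hold, the sketch below packs one bit per row into bytes. The bit layout (one bit per row, least significant bit first) is an assumption made for this example and is not confirmed by this page; `packValidationBits` and `isRowValid` are hypothetical helper names, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one validation bit per row, packed LSB-first into bytes.
inline std::vector<std::uint8_t> packValidationBits(const std::vector<bool>& valid) {
    std::vector<std::uint8_t> bits((valid.size() + 7) / 8, 0);
    for (std::size_t row = 0; row < valid.size(); ++row) {
        if (valid[row]) {
            bits[row / 8] |= static_cast<std::uint8_t>(1u << (row % 8));
        }
    }
    return bits;
}

// Read back the validation bit of a single row.
inline bool isRowValid(const std::vector<std::uint8_t>& bits, std::size_t row) {
    assert(row / 8 < bits.size());
    return ((bits[row / 8] >> (row % 8)) & 1u) != 0;
}
```

A buffer built this way could be handed over as a raw pointer in the style of overload (2), provided the real layout matches the assumption above.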
setRowNum¶
void setRowNum (int _num)
set number of rows for the entire table.
Parameters:
_num | number of rows of the entire table |
getRowNum¶
size_t getRowNum () const
get number of rows of the entire table.
Returns:
number of rows of the entire table
getSecRowNum¶
size_t getSecRowNum (int sid) const
get number of rows for the specified section.
Parameters:
sid | section ID |
Returns:
number of rows of the specified section
getColNum¶
size_t getColNum () const
get number of columns.
Returns:
number of columns of the table.
getSecNum¶
size_t getSecNum () const
get number of sections.
Returns:
number of sections of the table.
checkSecNum¶
void checkSecNum (int sec_l)
divide the columns evenly if section number is greater than 0.
Parameters:
sec_l | number of sections. If 0, do nothing, since everything is already done by addCol with JSON input. |
getColTypeSize¶
size_t getColTypeSize (int cid) const
get column data type size.
Parameters:
cid | column ID |
Returns:
data type size of the input column ID.
getColPointer¶
char* getColPointer ( int i, int _slice_num, int j = 0 ) const
get buffer pointer.
For example, getColPointer(2, 4, 1) means that the 2nd column is divided into 4 sections and the pointer to the 2nd section is returned.
Parameters:
i | column id |
_slice_num | divide column i into _slice_num parts |
j | get the j’th part pointer after dividing |
Returns:
column buffer pointer
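The slicing arithmetic behind such a call can be sketched as follows. This is a standalone model, not the library's code: `slicePointer` is a hypothetical helper, and it assumes every slice holds `ceil(row_num / slice_num)` rows, so that slice `j` starts `j` full slices after the column base.

```cpp
#include <cassert>
#include <cstddef>

// Return the start of slice j when a column of row_num elements, each
// type_size bytes wide, is divided into slice_num slices.
// Assumption: each slice spans ceil(row_num / slice_num) rows.
inline char* slicePointer(char* base, std::size_t row_num, std::size_t type_size,
                          int slice_num, int j) {
    assert(slice_num > 0 && j >= 0 && j < slice_num);
    const std::size_t slices = static_cast<std::size_t>(slice_num);
    const std::size_t rows_per_slice = (row_num + slices - 1) / slices;
    return base + rows_per_slice * static_cast<std::size_t>(j) * type_size;
}
```

For a column of 100 four-byte elements divided into 4 slices, slice 1 starts 25 rows, or 100 bytes, after the base pointer.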
getValColPointer¶
char* getValColPointer ( int _slice_num, int j ) const
get the validation buffer pointer
Parameters:
_slice_num | number of sections of the validation column |
j | the index of the section |
Returns:
the pointer of the specified section
getColPointer¶
char* getColPointer (int i) const
get column pointer.
Parameters:
i | column id |
Returns:
column pointer
setColNames¶
void setColNames (std::vector <std::string> col_names)
set the column names.
Parameters:
col_names | column name list |
class xf::database::gqe::Joiner¶
#include "gqe_join.hpp"
Methods¶
Joiner¶
Joiner (FpgaInit& obj)
constructor of Joiner.
Passes the FpgaInit object to the Joiner class. Separating FpgaInit (creation/allocation of the OpenCL context, program, command queue, host/device buffers, etc.) from Joiner initialization guarantees that the OpenCL resources are not released after each join call, so the joiner can be launched multiple times.
Parameters:
obj | the FpgaInit instance. |
run¶
ErrCode run ( Table& tab_a, std::string filter_a, Table& tab_b, std::string filter_b, std::string join_str, Table& tab_c, std::string output_str, int join_type = INNER_JOIN, JoinStrategyBase* strategyimp = nullptr )
Run join with the strategy defined by the input arguments, which includes:
- solution: the join solution (direct-join or partition-join)
- sec_o: left table section number
- sec_l: right table section number
- slice_num: the slice number used in probe
- log_part: the partition number of the left/right table
- coef_exp_partO: the expansion coefficient of table O (result buffer size / input buffer size); this parameter affects the output buffer size, but not the performance
- coef_exp_partL: the expansion coefficient of table L (result buffer size / input buffer size); this parameter affects the output buffer size, but not the performance
- coef_exp_join: the expansion coefficient of the join (result buffer size / input buffer size); this parameter affects the output buffer size, but not the performance
Usage:
auto smanual = new gqe::JoinStrategyManualSet(solution, sec_o, sec_l, slice_num, log_part, coef_exp_partO,
coef_exp_partL, coef_exp_join);
ErrCode err = bigjoin.run(
tab_o, "o_rowid > 0",
tab_l, "",
"o_orderkey = l_orderkey",
tab_c, "c1=l_orderkey, c2=o_rowid, c3=l_rowid",
gqe::INNER_JOIN,
smanual);
delete smanual;
The filter condition of table tab_o looks like "o_rowid > 0", where o_rowid is a column name of tab_o. When there is no filter condition, pass an empty filter string "".
The join condition looks like "left_join_key_0=right_join_key_0". When dual-key join is enabled, use a comma as the separator in the join condition, e.g. "left_join_key_0=right_join_key_0,left_join_key_1=right_join_key_1".
Output strings look like "output_c0 = tab_a_col/tab_b_col". When several columns are output, use a comma as the separator.
Parameters:
tab_a | left table |
filter_a | filter condition of left table |
tab_b | right table |
filter_b | filter condition of right table |
join_str | join condition(s) |
tab_c | result table |
output_str | output columns |
join_type | INNER_JOIN (default), SEMI_JOIN, or ANTI_JOIN |
strategyimp | pointer to an object of JoinStrategyBase or its derived type. |
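The comma-separated "left=right" format shared by the join condition and the output string can be illustrated with a small standalone parser. This is only a sketch of the string format described above; `trim` and `parseKeyPairs` are hypothetical helpers, and how the library itself parses these strings is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Remove leading and trailing spaces.
inline std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Split "a=b,c=d" into {{"a","b"},{"c","d"}}; items without '=' are skipped.
inline std::vector<std::pair<std::string, std::string>> parseKeyPairs(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> pairs;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t comma = text.find(',', start);
        if (comma == std::string::npos) comma = text.size();
        const std::string item = text.substr(start, comma - start);
        const std::size_t eq = item.find('=');
        if (eq != std::string::npos) {
            pairs.emplace_back(trim(item.substr(0, eq)), trim(item.substr(eq + 1)));
        }
        start = comma + 1;
    }
    return pairs;
}
```

Applied to "o_orderkey = l_orderkey" this yields one key pair, and applied to the output string from the usage example it yields three (output, source) pairs.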
class xf::database::gqe::BloomFilter¶
#include "gqe_bloomfilter.hpp"
Overview¶
Methods¶
BloomFilter¶
BloomFilter ( uint64_t num_keys, float fpp = 0.05f )
constructor of BloomFilter
Calculates the size of the bloom-filter from the number of unique keys, using the equation given at https://en.wikipedia.org/wiki/Bloom_filter, and allocates the buffer for the internal hash-table.
Parameters:
num_keys | number of unique keys to be built into the hash-table of the bloom-filter |
fpp | false positive probability (5% by default) |
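The sizing equations from the referenced Wikipedia article can be written out as a short standalone sketch. `bloomBits` and `bloomHashes` are hypothetical helpers; whether the library rounds the result further (for example to a hardware-friendly size) is not stated on this page, so treat the numbers as the textbook optimum only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Optimal number of bits for n keys and false positive probability p:
//   m = -n * ln(p) / (ln 2)^2
inline std::uint64_t bloomBits(std::uint64_t num_keys, double fpp) {
    assert(fpp > 0.0 && fpp < 1.0);
    const double ln2 = std::log(2.0);
    const double m = -static_cast<double>(num_keys) * std::log(fpp) / (ln2 * ln2);
    return static_cast<std::uint64_t>(std::ceil(m));
}

// Optimal number of hash functions: k = (m / n) * ln 2, at least 1.
inline unsigned bloomHashes(std::uint64_t bits, std::uint64_t num_keys) {
    assert(num_keys > 0);
    const double k = static_cast<double>(bits) / static_cast<double>(num_keys) * std::log(2.0);
    const long rounded = std::lround(k);
    return rounded < 1 ? 1u : static_cast<unsigned>(rounded);
}
```

At the default fpp of 5%, this works out to roughly 6.2 bits per key.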
build¶
void build ( Table tab_in, std::string col_names )
build the hash-table with the given key column(s) from the input table.
key_names_str should be comma separated, e.g. "key0, key1"
Parameters:
tab_in | input table |
key_names_str | key column names (comma separated) of the input table to be built into hash-table |
merge¶
void merge (BloomFilter& bf_in)
merge the input bloom-filter into the current one
Parameters:
bf_in | input bloom-filter |
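Merging two bloom-filters is commonly done as a bitwise OR of their bit arrays, which is valid when both filters have the same size and use the same hash functions. The sketch below shows that general technique on plain 64-bit words; `mergeBits` is a hypothetical helper, and it is an assumption that the library's merge behaves equivalently on its ap_uint<256> hash-table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// OR the bits of src into dst. Both arrays must have the same length,
// i.e. both filters must have been created with the same size.
inline void mergeBits(std::vector<std::uint64_t>& dst, const std::vector<std::uint64_t>& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i) {
        dst[i] |= src[i];
    }
}
```

After the merge, a key that was present in either input filter is still reported as present by the merged filter.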
getHashTable¶
ap_uint <256>** getHashTable () const
get the bloom-filter hash-table
Returns:
hash-table of the bloom-filter
getBloomFilterSize¶
uint64_t getBloomFilterSize () const
get the bloom-filter size
Returns:
size of the bloom-filter
class xf::database::gqe::Filter¶
#include "gqe_filter.hpp"
Methods¶
Filter¶
Filter (FpgaInit& obj)
constructor of Filter.
Initializes the hardware and loads the binary to the FPGA through classes Base and FpgaInit.
Parameters:
obj | FpgaInit class object |
~Filter¶
~Filter ()
destructor of Filter.
clProgram, commandQueue, and Context will be released by class Base
run¶
ErrCode run ( Table& tab_in, std::string input_str, BloomFilter& bf_in, std::string filter_condition, Table& tab_out, std::string output_str, StrategySet params )
gqeFilter run function.
Usage:
err_code = Filter.run(
tab_in,
"l_orderkey",
bf_in,
"19940101<=l_orderdate && l_orderdate<19950101",
tab_c1,
"c1=l_extendedprice, c2=l_discount, c3=o_orderdate, c4=l_orderkey",
params);
The input filter_condition looks like "19940101<=l_orderdate && l_orderdate<19950101"; l_orderdate must exist among the column names of the input table. When there is no filter condition, pass "".
The input key name string looks like "l_orderkey_0". When dual-key join is enabled, use a comma as the separator: "l_orderkey_0, l_orderkey_1".
The output mapping looks like "output_c0 = tab_in_col". When it contains several columns, use a comma as the separator.
Parameters:
tab_in | input table |
input_str | key column name(s) of the input table to be bloom-filtered |
bf_in | input bloom-filter whose hash-table is used |
filter_condition | filter condition used in dynamic filter |
tab_out | result table |
output_str | output column mapping |
params | StrategySet struct containing the number of sections of the input table. params.sec_l = 0: use the section info from the input table; params.sec_l >= 1: split the input table evenly into params.sec_l sections |
Returns:
error code
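To make the semantics of a filter condition such as "19940101<=l_orderdate && l_orderdate<19950101" concrete, here is a host-side reference sketch. It assumes dates are stored as yyyymmdd integers, as the literal values in the condition suggest, and it models only the range predicate, not the bloom-filter probe. `rowsInDateRange` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the indices of rows whose date lies in the half-open range [lower, upper).
// Assumption: dates are encoded as yyyymmdd integers, so integer order equals date order.
inline std::vector<std::size_t> rowsInDateRange(const std::vector<std::int32_t>& dates,
                                                std::int32_t lower, std::int32_t upper) {
    assert(lower <= upper);
    std::vector<std::size_t> selected;
    for (std::size_t row = 0; row < dates.size(); ++row) {
        if (lower <= dates[row] && dates[row] < upper) {
            selected.push_back(row);
        }
    }
    return selected;
}
```

A reference like this can serve as a check of the rows that pass the dynamic filter when validating results on the host.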
class xf::database::gqe::Aggregator¶
#include "xf_database/gqe_aggr.hpp"
Overview¶
Methods¶
Aggregator¶
Aggregator (std::string xclbin)
constructor of Aggregator.
Parameters:
xclbin | xclbin path |
aggregate¶
ErrCode aggregate ( Table& tab_in, std::vector <EvaluationInfo> evals_info, std::string filter_str, std::string group_keys_str, std::string output_str, Table& tab_out, AggrStrategyBase* strategyImp = nullptr )
aggregate function.
Usage:
err_code = bigaggr.aggregate(tab_l, //input table
{{"l_extendedprice * (-l_discount+c2) / 100", {0, 100}},
{"l_extendedprice * (-l_discount+c2) * (l_tax+c3) / 10000", {0, 100, 100}}
}, // evaluation
"l_shipdate<=19980902", //filter
"l_returnflag,l_linestatus", // group keys
"c0=l_returnflag, c1=l_linestatus,c2=sum(eval0),c3=sum(eval1)", // mapping
tab_c, //output table
sptr); //strategy
The input filter_str looks like "19940101<=o_orderdate && o_orderdate<19950101"; o_orderdate must be an existing column name in the input table. When there is no filter condition, pass "".
The evaluation information is passed as a struct EvaluationInfo. Create a valid EvaluationInfo struct using an initializer list, e.g. {"l_extendedprice * (-l_discount+c2) / 100", {0, 100}}. EvaluationInfo has two members: the evaluation string and the evaluation constants. The evaluation string may end with a division; the only supported divisors are 10, 100, 1000, and 10000. In the evaluation constants, provide one constant for each column; if a column has no constant, like l_extendedprice above, provide zero.
Group keys are passed in a string like "group_key0, group_key1", using a comma as the separator.
Output strings look like "c0=tab_in_col1, c1=tab_in_col2". When several columns are output, use a comma as the separator.
strategyImp is a pointer to an object of a class derived from AggrStrategyBase.
Parameters:
tab_in | input table |
evals_info | evaluation information |
filter_str | filter condition |
group_keys_str | group keys |
output_str | output list, e.g. output1 = tab_a_col1 |
tab_out | result table |
strategyImp | pointer to an object of AggrStrategyBase or its derived type. |
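The first evaluation in the usage example, "l_extendedprice * (-l_discount+c2) / 100" with constants {0, 100}, can be modelled on the host as a sanity reference. The sketch below is an interpretation, not the library's code: it assumes c2 takes the constant 100 paired with l_discount, that the division is an integer division, and that the result is summed per (l_returnflag, l_linestatus) group. `LineItem`, `eval0`, and `sumEval0ByGroup` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical row layout holding only the columns used in this example.
struct LineItem {
    std::int32_t returnflag;
    std::int32_t linestatus;
    std::int64_t extendedprice;
    std::int64_t discount;
};

// eval0 = l_extendedprice * (-l_discount + c2) / 100, with c2 = 100.
// Assumption: integer arithmetic with a truncating final division.
inline std::int64_t eval0(const LineItem& r) {
    const std::int64_t c2 = 100;
    return r.extendedprice * (-r.discount + c2) / 100;
}

// Group by (l_returnflag, l_linestatus) and compute sum(eval0) per group.
inline std::map<std::pair<std::int32_t, std::int32_t>, std::int64_t>
sumEval0ByGroup(const std::vector<LineItem>& rows) {
    std::map<std::pair<std::int32_t, std::int32_t>, std::int64_t> sums;
    for (const LineItem& r : rows) {
        sums[std::make_pair(r.returnflag, r.linestatus)] += eval0(r);
    }
    return sums;
}
```

For a row with l_extendedprice = 1000 and l_discount = 5, eval0 is 1000 * 95 / 100 = 950.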