GQE Kernel Acceleration Demo
Introduction
The GQE kernel has a demo suite based on TPC-H benchmark queries and data. It shows the performance of 21 of the 22 TPC-H queries (query 19 is not accelerated) at scale factors 1 and 30.
Two xclbin files, containing the gqeJoin, gqeAggr and gqePart kernels, are used across all the tested queries and accelerate most of the time-significant parts of the queries. One xclbin file contains gqeJoin and gqePart, and can execute most of the execution steps in the queries. The exceptions are queries with non-trivial group aggregation; for these, the second xclbin file, which provides both gqeAggr and gqePart, must be used.
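The selection rule above can be sketched as a small function. This is a minimal illustration under stated assumptions, not code from the demo: the StepKind enumeration, the selectXclbin function and both xclbin file names are hypothetical. The sketch only encodes what the text says: join steps need the image with gqeJoin, aggregation steps need the image with gqeAggr, and gqePart is present in both.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical classification of a query execution step by the kernel it needs.
enum class StepKind { Join, Aggregate, Partition };

// Hypothetical file names; the real names are set by the demo build.
const std::string kJoinXclbin = "gqe_join_part.xclbin";  // gqeJoin + gqePart
const std::string kAggrXclbin = "gqe_aggr_part.xclbin";  // gqeAggr + gqePart

// Choose the xclbin for one step. Both images contain gqePart, so a pure
// partitioning step reuses whichever image is already loaded (if any).
std::string selectXclbin(StepKind kind, const std::string& loaded) {
    switch (kind) {
        case StepKind::Join:
            return kJoinXclbin;
        case StepKind::Aggregate:
            return kAggrXclbin;
        case StepKind::Partition:
            return loaded.empty() ? kJoinXclbin : loaded;
    }
    throw std::logic_error("unhandled StepKind");
}
```

Reusing the currently loaded image for partition-only steps is a design choice of this sketch, made on the assumption that avoiding an unnecessary device reprogramming is worthwhile.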
Performance Data
The performance of FPGA-accelerated query execution is compared against a C++ implementation of each query and against PostgreSQL. The results are summarized in the tables below.
For both FPGA and C++, time is measured assuming the data is already loaded into CPU main memory. Each test was repeated multiple times, and the tables show the average time.
In the tables below, PostgreSQL numbers were collected with version 9.6 on an Intel(R) Xeon(R) CPU E5-2690 v4 clocked at 2.60 GHz.
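The measurement method described above (input data already in host memory, several repetitions, mean time reported) can be sketched with std::chrono. This is an assumption-based illustration, not the demo's actual timing code; averageMs is a hypothetical helper, and the query under test is passed in as a callable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Run `query` `reps` times and return the mean wall-clock time in milliseconds.
// The caller is expected to load all input tables into host memory beforehand,
// so only query execution (not data loading) falls inside the timed region.
double averageMs(const std::function<void()>& query, std::size_t reps) {
    assert(reps > 0 && "need at least one repetition");
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double totalMs = 0.0;
    for (std::size_t i = 0; i < reps; ++i) {
        const auto start = Clock::now();
        query();
        const auto stop = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return totalMs / static_cast<double>(reps);
}
```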
Scale Factor 1
Query | Kernel Used | End-to-End (ms) | Speedup vs PG | Speedup vs C++ | PostgreSQL (ms) | C++ (ms) |
---|---|---|---|---|---|---|
geo mean | | 88 | 26 | 8 | 2251 | 728 |
1 | aggr | 85 | 260 | 26 | 22061 | 2206 |
2 | join | 12 | 73 | 11 | 876 | 135 |
3 | join, aggr | 73 | 26 | 11 | 1863 | 824 |
4 | join | 82 | 11 | 18 | 931* | 1486 |
5 | join | 67 | 29 | 15 | 1926 | 979 |
6 | join | 50 | 47 | 3 | 2355 | 128 |
7 | join | 81 | 45 | 16 | 3666 | 1280 |
8 | join | 59 | 76 | 18 | 4460 | 1036 |
9 | join | 236 | 23 | 9 | 5476 | 2222 |
10 | join, aggr | 118 | 29 | 5 | 3478 | 536 |
11 | join, aggr | 28 | 8 | 4 | 220 | 130 |
12 | join | 304 | 11 | 4 | 3460 | 1169 |
13 | join, aggr | 424 | 12 | 4 | 5048 | 1582 |
14 | join | 98 | 23 | 2 | 2225 | 234 |
15 | aggr | 72 | 62 | 2 | 4447 | 110 |
16 | join | 204 | 5 | 3 | 1098 | 634 |
17 | join | 45 | 16 | 14 | 720* | 649 |
18 | join, aggr | 193 | 51 | 14 | 9830 | 2795 |
19 | none | | | | 983* | 596 |
20 | join, aggr | 108 | 11 | 10 | 1220 | 1032 |
21 | join | 147 | 28 | 33 | 4168* | 4790 |
22 | join | 28 | 13 | 18 | 357* | 508 |
*PostgreSQL time measured using an index.
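The speedup columns correspond to the baseline time (PostgreSQL or C++) divided by the FPGA end-to-end time, and the geo mean row summarizes each column with a geometric mean. A minimal sketch of both computations follows; speedup and geoMean are illustrative helper names, not part of the demo code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Speedup of the FPGA run over a baseline: baseline time / FPGA end-to-end time.
double speedup(double baselineMs, double fpgaMs) {
    assert(fpgaMs > 0.0);
    return baselineMs / fpgaMs;
}

// Geometric mean, computed in log space to avoid overflow on long products.
double geoMean(const std::vector<double>& values) {
    assert(!values.empty());
    double logSum = 0.0;
    for (double v : values) {
        assert(v > 0.0);  // the logarithm is only defined for positive values
        logSum += std::log(v);
    }
    return std::exp(logSum / static_cast<double>(values.size()));
}
```

For query 1 at scale factor 1, speedup(22061, 85) is about 259.5 and speedup(2206, 85) about 25.95, which the table rounds to 260 and 26. Because the geometric mean of ratios equals the ratio of geometric means over the same set of queries, 2251 / 88 ≈ 25.6 is consistent with the listed geo-mean speedup of 26.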
Scale Factor 30
Query | Kernel Used | End-to-End (ms) | Speedup vs PG | Speedup vs C++ | PostgreSQL (ms) | C++ (ms) |
---|---|---|---|---|---|---|
geo mean | | 2875 | 32 | 9 | 92563 | 25012 |
1 | aggr | 2494 | 267 | 26 | 667061 | 65985 |
2 | join | 286 | 236 | 19 | 67406 | 5292 |
3 | part, join | 3930 | 43 | 8 | 167664 | 30194 |
4 | part, join | 3078 | 11 | 15 | 33415* | 45923 |
5 | join | 2540 | 19 | 13 | 47470 | 32778 |
6 | join | 1450 | 48 | 3 | 70277 | 3835 |
7 | join | 3017 | 39 | 13 | 118037 | 40071 |
8 | join | 4465 | 20 | 8 | 87355 | 34204 |
9 | part, join | 7651 | 70 | 12 | 537117 | 92879 |
10 | join, aggr | 3365 | 48 | 6 | 161976 | 19308 |
11 | join, aggr | 538 | 33 | 7 | 17578 | 3768 |
12 | part, join | 8245 | 14 | 4 | 116094 | 34133 |
13 | part, join, aggr | 14699 | 10 | 5 | 153483 | 70607 |
14 | join | 2841 | 28 | 3 | 78913 | 7972 |
15 | aggr | 2040 | 74 | 2 | 150533 | 3643 |
16 | join | 6774 | 7 | 3 | 47168 | 19473 |
17 | join | 1250 | 6 | 20 | 7272* | 24456 |
18 | part, join, aggr | 7718 | 36 | 10 | 275186 | 79747 |
19 | none | | | | 9843* | 17428 |
20 | part, join, aggr | 4468 | 16 | 10 | 71483 | 42501 |
21 | part, join | 4642 | 161 | 34 | 746008* | 155607 |
22 | part, join | 750 | 19 | 31 | 13977* | 23173 |
*PostgreSQL time measured using an index.
Running the Demos
The demo code is located in the L2/demos folder.
A top-level makefile serves as the unified entry point for building all host code. Please refer to the README file for detailed instructions.