GQE Kernel Acceleration Demo

Introduction

The GQE kernel has a demo suite built on TPC-H benchmark queries and data. It reports the performance of 21 of the 22 TPC-H queries at scale factors 1 and 30. Query 19 uses no kernel and therefore has no FPGA timing.

Two xclbin files, which together contain the gqeJoin, gqeAggr and gqePart kernels, are used across all tested queries. They accelerate most of the time-significant parts of each query.

One xclbin file contains gqeJoin and gqePart, and it can run most of the execution steps in the queries. The exceptions are queries with non-trivial group aggregation. These need the second xclbin file, which provides both gqeAggr and gqePart.
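The split between the two xclbin files can be sketched as a small lookup. This is only an illustration, not the demo's host code: the `Kernel` enum, the `Xclbin` struct, the labels `join+part` and `aggr+part`, and `firstXclbinFor` are all hypothetical names. Only the grouping of kernels into xclbin files comes from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the two xclbin files described above.
enum class Kernel { Join, Aggr, Part };

struct Xclbin {
    const char* label;  // illustrative label, not a real file name
    bool hasJoin;
    bool hasAggr;
    bool hasPart;

    bool provides(Kernel k) const {
        switch (k) {
            case Kernel::Join: return hasJoin;
            case Kernel::Aggr: return hasAggr;
            case Kernel::Part: return hasPart;
        }
        return false;
    }
};

// Index 0: gqeJoin + gqePart. Index 1: gqeAggr + gqePart.
static const Xclbin kXclbins[2] = {
    {"join+part", true, false, true},
    {"aggr+part", false, true, true},
};

// Returns the index of the first xclbin that provides kernel k.
inline std::size_t firstXclbinFor(Kernel k) {
    for (std::size_t i = 0; i < 2; ++i) {
        if (kXclbins[i].provides(k)) return i;
    }
    assert(false && "every kernel is provided by at least one xclbin");
    return 0;
}
```

Under this model, a query listed as `join, aggr` in the tables would presumably run its join steps on the first xclbin and its group-aggregation step on the second.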

Performance Data

The performance of FPGA-accelerated query execution is compared against a C++ implementation of the same query and against PostgreSQL. The results are summarized in the tables below.

For both FPGA and C++, time is measured assuming the data is already loaded into CPU main memory. Each test was repeated multiple times, and the tables show the average time.
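The measurement approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the demo's timing code: `averageMillis` is a hypothetical helper, the number of repeats is chosen by the caller, and the callable is assumed to work on data that is already resident in host memory so that loading is excluded from the timing.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `query` `repeats` times and returns the average wall-clock
// time per run in milliseconds. Data loading is assumed to have
// happened before this function is called.
inline double averageMillis(const std::function<void()>& query, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;
    double totalMs = 0.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = Clock::now();
        query();
        const auto stop = Clock::now();
        totalMs += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return totalMs / repeats;
}
```

A monotonic clock such as `std::chrono::steady_clock` is used here so that system clock adjustments during a run do not distort the average.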

In the tables below, the PostgreSQL numbers were collected with version 9.6 on an Intel(R) Xeon(R) CPU E5-2690 v4 clocked at 2.60GHz.

Scale Factor 1

Query     Kernel Used        End-to-End (ms)  Speedup vs PG  Speedup vs C++  PostgreSQL (ms)  C++ (ms)
geo mean  -                  88               26             8               2251             728
1         aggr               85               260            26              22061            2206
2         join               12               73             11              876              135
3         join, aggr         73               26             11              1863             824
4         join               82               11             18              931*             1486
5         join               67               29             15              1926             979
6         join               50               47             3               2355             128
7         join               81               45             16              3666             1280
8         join               59               76             18              4460             1036
9         join               236              23             9               5476             2222
10        join, aggr         118              29             5               3478             536
11        join, aggr         28               8              4               220              130
12        join               304              11             4               3460             1169
13        join, aggr         424              12             4               5048             1582
14        join               98               23             2               2225             234
15        aggr               72               62             2               4447             110
16        join               204              5              3               1098             634
17        join               45               16             14              720*             649
18        join, aggr         193              51             14              9830             2795
19        none               -                -              -               983*             596
20        join, aggr         108              11             10              1220             1032
21        join               147              28             33              4168*            4790
22        join               28               13             18              357*             508

* PostgreSQL time measured using an index.

Scale Factor 30

Query     Kernel Used        End-to-End (ms)  Speedup vs PG  Speedup vs C++  PostgreSQL (ms)  C++ (ms)
geo mean  -                  2875             32             9               92563            25012
1         aggr               2494             267            26              667061           65985
2         join               286              236            19              67406            5292
3         part, join         3930             43             8               167664           30194
4         part, join         3078             11             15              33415*           45923
5         join               2540             19             13              47470            32778
6         join               1450             48             3               70277            3835
7         join               3017             39             13              118037           40071
8         join               4465             20             8               87355            34204
9         part, join         7651             70             12              537117           92879
10        join, aggr         3365             48             6               161976           19308
11        join, aggr         538              33             7               17578            3768
12        part, join         8245             14             4               116094           34133
13        part, join, aggr   14699            10             5               153483           70607
14        join               2841             28             3               78913            7972
15        aggr               2040             74             2               150533           3643
16        join               6774             7              3               47168            19473
17        join               1250             6              20              7272*            24456
18        part, join, aggr   7718             36             10              275186           79747
19        none               -                -              -               9843*            17428
20        part, join, aggr   4468             16             10              71483            42501
21        part, join         4642             161            34              746008*          155607
22        part, join         750              19             31              13977*           23173

* PostgreSQL time measured using an index.
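The document does not spell out how the speedup columns are derived. The sketch below assumes that each speedup is the baseline time (PostgreSQL or C++) divided by the FPGA end-to-end time, rounded to the nearest integer, and that the geo mean row is a geometric mean over the per-query values. `roundedSpeedup` and `geoMean` are hypothetical helper names, and the rounding rule is an assumption that appears consistent with most, though not necessarily all, rows.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Speedup of the accelerated run over a baseline, rounded to the
// nearest integer (assumed rounding rule).
inline long roundedSpeedup(double baselineMs, double acceleratedMs) {
    assert(acceleratedMs > 0.0);
    return std::lround(baselineMs / acceleratedMs);
}

// Geometric mean of strictly positive values, computed in log space
// to avoid overflow when multiplying many large timings.
inline double geoMean(const std::vector<double>& values) {
    assert(!values.empty());
    double logSum = 0.0;
    for (double v : values) {
        assert(v > 0.0);
        logSum += std::log(v);
    }
    return std::exp(logSum / static_cast<double>(values.size()));
}
```

For query 1 at scale factor 1, `roundedSpeedup(22061, 85)` gives 260 and `roundedSpeedup(2206, 85)` gives 26, which matches the table.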

Running the Demos

The demo code is located in the L2/demos folder. A top-level makefile serves as a unified entry point for building all host code. Please refer to the README file for detailed instructions.