Benchmark of European Option Pricing Engine
Overview
This is a benchmark of the MC (Monte Carlo) European Engine, built with the Xilinx Vitis environment and compared against QuantLib. It supports software emulation, hardware emulation, and running the hardware accelerator on the Alveo U250.
Highlights
The performance of the MCEuropeanEngine is shown in the table below: the cold run is 380X and the warm run is 1521X faster than the baseline. The baseline is QuantLib, a widely used open-source C++ library, running on a platform with two Intel(R) Xeon(R) CPU E5-2690 v4 @3.20GHz processors, with 8 cores per processor and 2 threads per core.
Platform | Execution time (cold run) | Execution time (warm run) |
Baseline | 20.155ms | 20.155ms |
Runtime on U250 | 0.053ms | 0.01325ms |
Acceleration ratio | 380X | 1521X |
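Each acceleration ratio is the baseline execution time divided by the U250 execution time. A quick Python check of that arithmetic, using the times from the table (the variable names are illustrative only):

```python
# Execution times from the table above, in milliseconds.
baseline_ms = 20.155
u250_cold_ms = 0.053
u250_warm_ms = 0.01325

# Acceleration ratio = baseline time / accelerated time.
cold_ratio = baseline_ms / u250_cold_ms
warm_ratio = baseline_ms / u250_warm_ms

print(f"cold run: {cold_ratio:.0f}X")  # cold run: 380X
print(f"warm run: {warm_ratio:.0f}X")  # warm run: 1521X
```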
Parameter | Value |
Option type | put |
strike | 40 |
underlying | 36 |
risk-free rate | 6% |
volatility | 20% |
dividend yield | 0 |
maturity | 1 year |
tolerance | 0.02 |
workload | 1 step, 47000 paths |
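To make the benchmark workload concrete, here is a plain Python sketch of what a Monte Carlo European engine computes for these parameters: simulate the terminal underlying price under geometric Brownian motion in a single time step, average the discounted put payoff over 47000 paths, and compare with the Black-Scholes closed form. This is neither the FPGA kernel nor QuantLib's implementation; the helpers `mc_put`, `bs_put`, `norm_cdf` and the fixed seed are hypothetical. The table's 0.02 tolerance is not enforced here; the sketch uses a fixed path count and reports the standard error alongside the estimate.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_put(s0, k, r, q, sigma, t):
    """Black-Scholes closed-form price of a European put, used as a reference."""
    d1 = (math.log(s0 / k) + (r - q + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s0 * math.exp(-q * t) * norm_cdf(-d1)


def mc_put(s0, k, r, q, sigma, t, paths, seed=42):
    """Plain Monte Carlo price of a European put under geometric Brownian motion.

    Uses a single time step to maturity and returns (price, standard_error).
    """
    rng = random.Random(seed)
    drift = (r - q - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    discount = math.exp(-r * t)
    total = total_sq = 0.0
    for _ in range(paths):
        # One step straight to maturity, matching the "1 step" workload.
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff = discount * max(k - s_t, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / paths
    variance = (total_sq - paths * mean * mean) / (paths - 1)  # sample variance
    return mean, math.sqrt(variance / paths)


# Benchmark parameters from the table above.
params = dict(s0=36.0, k=40.0, r=0.06, q=0.0, sigma=0.20, t=1.0)
price, stderr = mc_put(paths=47000, **params)
print(f"MC estimate : {price:.4f} (std. error {stderr:.4f})")
print(f"Closed form : {bs_put(**params):.4f}")
```

A single time step is enough here because a European payoff depends only on the terminal price, which geometric Brownian motion lets us sample exactly.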
Profiling
The resource utilization and performance of the MCEuropeanEngine on the U250 FPGA card are listed in the following tables (with Vivado 2018.3). There are 4 PUs on the Alveo U250 pricing options in parallel, and each PU has the same resource utilization.
Implementation | Kernels | LUT | FF | BRAM | URAM | DSP |
1 PU | kernel_mc_0 (UN config: 8) | 234072 | 376207 | 49 | 0 | 1594 |
4 PUs | kernel_mc_0, kernel_mc_1, kernel_mc_2, kernel_mc_3 | 936288 | 1504828 | 196 | 0 | 6376 |
Total board resources | - | 1728000 | 3456000 | 2688 | 1280 | 12288 |
Utilization ratio (excluding platform) | - | 54.18% | 43.54% | 7.29% | 0% | 51.89% |
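The utilization ratios follow from multiplying the 1 PU row by four and dividing by the board totals; like the table, this ignores resources consumed by the platform shell. A small Python check (the dictionary names are illustrative):

```python
# Per-PU resource usage (the 1 PU row) and U250 board totals from the table above.
per_pu = {"LUT": 234072, "FF": 376207, "BRAM": 49, "URAM": 0, "DSP": 1594}
board = {"LUT": 1728000, "FF": 3456000, "BRAM": 2688, "URAM": 1280, "DSP": 12288}
num_pus = 4

# Every PU has the same footprint, so 4 PUs use 4x the per-PU resources.
used = {res: num_pus * count for res, count in per_pu.items()}
utilization = {res: 100.0 * used[res] / board[res] for res in board}

for res in board:
    print(f"{res:5s} {used[res]:>8d} / {board[res]:>8d} = {utilization[res]:6.2f}%")
```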
Table 15 gives the resource utilization report of four MCEuropeanEngine PUs (Processing Units). Note that the resource statistics correspond to a specific UN (Unroll Number) configuration; UN configurations are template parameters of the corresponding API.
The complete Vitis demo of the European option engine is executed with a U250 card on Nimbix. Its performance is listed in Table 16, which reports both the kernel execution time and the end-to-end (E2E) execution time.
Engine | Frequency | Kernel execution time | E2E execution time |
4 PUs | 250MHz | 7.1ms (1000 loops) | 53ms (1000 loops) |
Because only the output data is transferred from device to host, data transfer adds little to the E2E time.
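Dividing the 1000-loop totals gives per-run figures, assuming each loop is one pricing run. The per-run E2E time of 53 µs equals the 0.053ms cold-run time in the Highlights table, and at the 250MHz kernel clock the 7.1 µs per-run kernel time corresponds to about 1775 cycles. A Python sketch of that arithmetic:

```python
# Timings from the table above: totals over 1000 loops, in milliseconds.
loops = 1000
kernel_total_ms = 7.1
e2e_total_ms = 53.0
clock_hz = 250e6  # kernel clock frequency after place and route

kernel_per_run_us = kernel_total_ms / loops * 1e3  # ms -> us per run
e2e_per_run_us = e2e_total_ms / loops * 1e3
kernel_cycles = kernel_per_run_us * 1e-6 * clock_hz  # clock cycles at 250 MHz

print(f"kernel per run : {kernel_per_run_us:.1f} us (~{kernel_cycles:.0f} cycles)")
print(f"E2E per run    : {e2e_per_run_us:.1f} us")
```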
Note
What are cold and warm runs?
- Cold run: the application is run on the board once.
- Warm run: the application is run on the board multiple times, and the E2E time is the average over those runs.
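The two measurements can be sketched as a small host-side timing harness. This is an illustration, not the demo's actual host code: `measure_cold_warm` and the `run` callable are hypothetical, and the warm measurement assumes, as the analysis of the MCEuropeanEngine execution time in this section describes, that runs are dispatched so that up to one per PU execute concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_cold_warm(run, num_runs, num_pus):
    """Return (cold, warm) times in seconds.

    `run` is a hypothetical zero-argument callable that launches one pricing
    application and blocks until its result is back on the host.
    """
    # Cold run: a single execution on the board.
    start = time.perf_counter()
    run()
    cold = time.perf_counter() - start

    # Warm run: many executions, at most num_pus in flight at once,
    # averaged over the number of runs.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_pus) as pool:
        futures = [pool.submit(run) for _ in range(num_runs)]
        for future in futures:
            future.result()  # re-raise any exception from a run
    warm = (time.perf_counter() - start) / num_runs
    return cold, warm


# Usage with a stand-in workload that simply sleeps for 20 ms.
cold, warm = measure_cold_warm(lambda: time.sleep(0.02), num_runs=8, num_pus=4)
print(f"cold: {cold * 1e3:.1f} ms, warm: {warm * 1e3:.1f} ms per run")
```

Averaging total wall time over the number of runs is what makes the warm figure benefit from overlap; averaging the individual run latencies instead would report roughly the cold time.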
To maximize resource utilization on the FPGA, the MCEuropeanEngine PU is replicated: four MCEuropeanEngine PUs are placed in different SLRs of the U250. Due to place and route on the FPGA, the kernel finally runs at 250MHz.
Note
Analysis of the MCEuropeanEngine execution time
There are 4 PUs, and each PU executes one application at a time. When there are multiple applications, they are distributed across different PUs and executed concurrently. As a result, the warm-run time is 1/4 of the cold-run time.
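A simple model of this: if every run takes the same time and each PU handles one application at a time, N applications need ceil(N/4) rounds, so the average per-application time approaches one quarter of the single-run time as N grows. The sketch below uses a hypothetical helper, `average_run_time`, ignores host-side scheduling overhead, and reproduces the 0.01325ms warm-run figure from the 0.053ms cold run.

```python
import math


def average_run_time(single_run_ms, num_apps, num_pus):
    """Average per-application time when num_apps identical runs are spread
    over num_pus PUs, each PU executing one application at a time."""
    rounds = math.ceil(num_apps / num_pus)  # batches of up to num_pus concurrent runs
    makespan_ms = rounds * single_run_ms
    return makespan_ms / num_apps


cold_ms = 0.053  # cold-run time from the Highlights table
print(average_run_time(cold_ms, num_apps=1, num_pus=4))     # single run: no overlap
print(average_run_time(cold_ms, num_apps=1000, num_pus=4))  # many runs: about cold_ms / 4
```

When N is not a multiple of 4, the last round leaves some PUs idle, so the average sits slightly above one quarter of the cold-run time.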