Benchmark of European Option Pricing Engine

Overview

This benchmark compares the MC (Monte-Carlo) European Engine, built in the Xilinx Vitis environment, against QuantLib. It supports software and hardware emulation as well as running the hardware accelerator on the Alveo U250.

Highlights

The performance of the MCEuropeanEngine is shown in the table below. Relative to the baseline, the cold run is 380X faster and the warm run is 1521X faster. The baseline is QuantLib, a widely used open-source C++ library, running on a platform with 2 Intel(R) Xeon(R) CPU E5-2690 v4 @3.20GHz, 8 cores per processor and 2 threads per core.

Table 16 Performance
Platform	Execution time (cold run)	Execution time (warm run)
Baseline	20.155ms	20.155ms
Runtime on U250	0.053ms	0.01325ms
Acceleration Ratio	380X	1521X
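
The ratios in the last row follow directly from the execution times in the rows above. A minimal sketch of that arithmetic, using only the table's numbers (the variable and function names are illustrative, not part of the library):

```python
# Execution times in milliseconds, copied from the performance table.
baseline_ms = 20.155
u250_cold_ms = 0.053
u250_warm_ms = 0.01325


def acceleration_ratio(baseline, accelerated):
    """Speedup of the accelerated run relative to the baseline."""
    return baseline / accelerated


cold_ratio = acceleration_ratio(baseline_ms, u250_cold_ms)
warm_ratio = acceleration_ratio(baseline_ms, u250_warm_ms)

# Rounded to whole numbers, these reproduce the 380X and 1521X entries.
print(f"cold: {cold_ratio:.0f}X, warm: {warm_ratio:.0f}X")
```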

Table 17 Application Scenario
Option Type	put
strike	40
underlying	36
risk-free rate	6%
volatility	20%
dividend yield	0
maturity	1 year
tolerance	0.02
workload	1 step, 47000 paths
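
To make the scenario concrete, the following is a plain CPU sketch of a single-step Monte Carlo estimate for this put, checked against the Black-Scholes closed form. It is not the Vitis kernel and not QuantLib code. The function names `bs_put` and `mc_put`, the seed, and the reading of "tolerance" as a target on the standard error of the estimate are all assumptions made for illustration.

```python
import math
import random


def bs_put(s0, k, r, q, sigma, t):
    """Black-Scholes price of a European put (closed form, for reference)."""
    d1 = (math.log(s0 / k) + (r - q + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return k * math.exp(-r * t) * cdf(-d2) - s0 * math.exp(-q * t) * cdf(-d1)


def mc_put(s0, k, r, q, sigma, t, paths, seed=42):
    """Single-step Monte Carlo estimate; returns (price, standard error)."""
    rng = random.Random(seed)  # hypothetical seed, for repeatability only
    drift = (r - q - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    disc = math.exp(-r * t)
    total = total_sq = 0.0
    for _ in range(paths):
        # One time step: sample the terminal price under geometric Brownian motion.
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff = disc * max(k - st, 0.0)
        total += payoff
        total_sq += payoff * payoff
    mean = total / paths
    var = max(total_sq / paths - mean * mean, 0.0)
    return mean, math.sqrt(var / paths)


# Parameters from the application scenario table.
params = dict(s0=36.0, k=40.0, r=0.06, q=0.0, sigma=0.20, t=1.0)
reference = bs_put(**params)
price, stderr = mc_put(paths=47000, **params)
print(f"closed form: {reference:.4f}, Monte Carlo: {price:.4f} +/- {stderr:.4f}")
```

Under these assumptions, the standard error of a 47000-path estimate should land close to 0.02, which is consistent with the tolerance listed in the table.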

Profiling

The resource utilization and performance of the MCEuropeanEngine on the U250 FPGA card are listed in the following tables (with Vivado 2018.3). There are 4 PUs on the Alveo U250 to price the option in parallel. Each PU has the same resource utilization.

Table 18 Resource utilization report of European Option APIs on U250
Implementation	Kernels	LUT	FF	BRAM	URAM	DSP
1 PU	kernel_mc_0 (UN config:8)	234072	376207	49	0	1594
4 PUs	kernel_mc_0 kernel_mc_1 kernel_mc_2 kernel_mc_3	936288	1504828	196	0	6376
total resource of board		1728000	3456000	2688	1280	12288
utilization ratio (excluding platform)		54.18%	43.54%	7.29%	0	51.88%

Table 18 gives the resource utilization report of four MCEuropeanEngine PUs (Processing Units). Note that the resource statistics apply to specific UN (Unroll Number) configurations. These UN configurations are template parameters of the corresponding API.
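
The 4-PU row and the utilization row of Table 18 can be derived from the 1-PU row and the board totals. The sketch below repeats that arithmetic with the table's figures; the dictionary layout and names are illustrative only.

```python
RESOURCES = ("LUT", "FF", "BRAM", "URAM", "DSP")

# Figures copied from Table 18.
one_pu = dict(zip(RESOURCES, (234072, 376207, 49, 0, 1594)))
board = dict(zip(RESOURCES, (1728000, 3456000, 2688, 1280, 12288)))
NUM_PUS = 4

# Each PU uses the same resources, so the 4-PU row is a straight multiple.
four_pus = {name: NUM_PUS * count for name, count in one_pu.items()}

# Utilization as a percentage of the board total (platform logic excluded).
utilization = {name: 100.0 * four_pus[name] / board[name] for name in RESOURCES}

for name in RESOURCES:
    print(f"{name:>4}: {four_pus[name]:>8} / {board[name]:>8} = {utilization[name]:.2f}%")
```

With rounding to two decimals, the DSP figure comes out at about 51.89%; the table's 51.88% appears to be truncated rather than rounded.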

The complete Vitis demos of the European Option Engine are executed with a U250 card on Nimbix. The performance of this demo is listed in Table 19, which reports the kernel execution time and the end-to-end (E2E) execution time.

Table 19 Performance of European Option demos on U250
Engine	Frequency	Kernel execution time	E2E execution time
4 PUs	250MHz	7.1ms (1000 loops)	53ms (1000 loops)

Because only output data is transferred from device to host, the kernel execution time does not differ much from the E2E time.
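
The times in Table 19 are totals over 1000 loops, so dividing by the loop count gives a per-loop figure. A small sketch of that conversion, using only the table's numbers (the names are illustrative):

```python
LOOPS = 1000

# Totals over 1000 loops, copied from Table 19 (milliseconds).
kernel_total_ms = 7.1
e2e_total_ms = 53.0

kernel_per_loop_ms = kernel_total_ms / LOOPS
e2e_per_loop_ms = e2e_total_ms / LOOPS

print(f"kernel: {kernel_per_loop_ms:.4f} ms/loop, E2E: {e2e_per_loop_ms:.4f} ms/loop")
```

The per-loop E2E value works out to 0.053ms, the same figure reported as the cold run time on the U250 in Table 16.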

Note

What are cold run and warm run?

A cold run means running one application on the board once.
A warm run means running the application multiple times on the board. The E2E time is calculated as the average time over multiple runs.

To maximize resource utilization on the FPGA, the MCEuropeanEngine PU is duplicated. The four MCEuropeanEngine PUs are placed on different SLRs on the U250. As a result of place and route on the FPGA, the kernel ultimately runs at 250MHz.

Note

Analysis of the execution time of MCEuropeanEngine

There are 4 PUs. Each PU can execute one application at a time. When there are multiple applications, they are distributed across different PUs and can be executed at the same time. So the warm run time is 1/4 of the cold run time.
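
This reasoning can be written as a simple throughput model. The sketch below is an idealization that assumes applications run in batches of up to 4, one per PU, with no scheduling or transfer overhead between batches; `average_time_per_app` is a hypothetical helper, not a library function.

```python
import math


def average_time_per_app(cold_ms, num_apps, num_pus=4):
    """Average wall-clock time per application when PUs run in parallel.

    Idealized model: each batch of up to `num_pus` applications takes
    one cold-run time, and batches run back to back.
    """
    batches = math.ceil(num_apps / num_pus)
    return batches * cold_ms / num_apps


cold_ms = 0.053  # cold run time on U250 from Table 16

# A single application occupies one PU, so it sees the full cold-run time.
single = average_time_per_app(cold_ms, num_apps=1)

# A multiple of 4 applications keeps every PU busy: one quarter of the cold run.
warm = average_time_per_app(cold_ms, num_apps=1000)

print(f"single: {single:.5f} ms, warm average: {warm:.5f} ms")
```

When the number of applications is not a multiple of 4, the last batch leaves some PUs idle, so the average in this model sits slightly above one quarter of the cold run time.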