Benchmark of TreeEngine¶
Overview¶
This is a series of tree-based benchmarks built in the Xilinx Vitis environment and compared against QuantLib. The engine supports multiple instruments (swaption, swap, capfloor, callablebond) and multiple rate models (Vasicek, HullWhite, BlackKarasinski, CoxIngersollRoss, ExtendedCoxIngersollRoss, and two-additive-factor Gaussian). The benchmarks support software and hardware emulation as well as running the hardware accelerator on the Alveo U250.
These examples reside in the L2/benchmarks/TreeEngine directory. Taking TreeSwaptionEngineHWModel as an example, the tutorial below provides a step-by-step guide covering the commands for building and running the kernel.
Executable Usage¶
- Work Directory(Step 1)
The steps for library download and environment setup can be found in Vitis Quantitative_Finance Library. To get to the design, change into the benchmark directory:
cd L2/benchmarks/TreeEngine/TreeSwaptionEngineHWModel
- Build kernel(Step 2)
Run the following make command to build your XCLBIN and host binary targeting a specific device. Note that this process takes a long time, possibly a couple of hours.
source /opt/xilinx/Vitis/2021.1/settings64.sh
source /opt/xilinx/xrt/setenv.sh
export DEVICE=/opt/xilinx/platforms/xilinx_u250_xdma_201830_2/xilinx_u250_xdma_201830_2.xpfm
export TARGET=hw
make run
- Run kernel(Step 3)
To get the benchmark results, please run the following command.
./build_dir.hw.xilinx_u250_xdma_201830_2/host.exe -xclbin build_dir.hw.xilinx_u250_xdma_201830_2/scanTreeKernel.xclbin
Input Arguments:
Usage: test.exe -[-xclbin]
       -xclbin     TreeEngine binary;
- Example output(Step 4)
----------------------Tree Bermudan (HW) Engine-----------------
timestep=50
Found Platform
Platform Name: Xilinx
Found Device=xilinx_u250_xdma_201830_2
INFO: Importing xclbin_xilinx_u250_xdma_201830_2_hw/scanTreeKernel.xclbin
Loading: 'xclbin_xilinx_u250_xdma_201830_2_hw/scanTreeKernel.xclbin'
kernel has been created
kernel start------
kernel end------
FPGA Execution time: 0.28ms
NPV= 13.1903146433444 ,diff/NPV= -1.30631162395178e-14
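When collecting results across several runs, it can help to pull the timing and NPV out of the host output automatically. The following is a minimal Python sketch, assuming the output keeps the `FPGA Execution time:` and `NPV= ... ,diff/NPV= ...` line formats shown in the example above; the function name `parse_benchmark_log` and the constant `SAMPLE_LOG` are hypothetical helpers and not part of the library.

```python
import re

# Result lines copied from the example output above.
SAMPLE_LOG = """\
kernel start------
kernel end------
FPGA Execution time: 0.28ms
NPV= 13.1903146433444 ,diff/NPV= -1.30631162395178e-14
"""


def parse_benchmark_log(text):
    """Return (execution_time_ms, npv, diff_over_npv) parsed from host output.

    Assumes the two result lines have the format shown in the example output.
    """
    time_match = re.search(r"FPGA Execution time:\s*([0-9.]+)\s*ms", text)
    npv_match = re.search(r"NPV=\s*(\S+)\s*,\s*diff/NPV=\s*(\S+)", text)
    if time_match is None or npv_match is None:
        raise ValueError("log does not contain the expected result lines")
    return (
        float(time_match.group(1)),
        float(npv_match.group(1)),
        float(npv_match.group(2)),
    )


if __name__ == "__main__":
    time_ms, npv, rel_diff = parse_benchmark_log(SAMPLE_LOG)
    print(f"time={time_ms} ms, NPV={npv}, diff/NPV={rel_diff}")
```

In practice the text would come from the captured standard output of the host executable rather than from an embedded string.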
Profiling¶
The application scenarios in this case are:
Instrument | Model | type | index | fixedRate | timestep | initSize | a | sigma | flatRate | x0 | nominal | spread |
Swaption | HWModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.055228873373796609 | 0.0061062754654949824 | 0.04875825 | 0.0 | 1000.0 | 0.0 |
Swaption | BKModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.043389447297063261 | 0.12074597086680797 | 0.04875825 | 0.0 | 1000.0 | 0.0 |
Swaption | CIRModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.043389447297063261 | 0.068963597413997324 | 0.04875825 | 0.18580295883843218 | 1000.0 | 0.0 |
Swaption | ECIRModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.043389447297063261 | 0.015974847434765481 | 0.04875825 | 0.50137133948380941 | 1000.0 | 0.0 |
Swaption | VModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.16046325834281869 | 0.0037370022855109613 | 0.04875825 | 0.0079988201434896891 | 1000.0 | 0.0 |
Swaption | G2Model | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.050055733653096922 | 0.0094424342056787739 | 0.04875825 | 0.0 | 1000.0 | 0.0 |
Swap | HWModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.055228873373796609 | 0.0061062754654949824 | 0.04875825 | 0.0 | 1000.0 | 0.0 |
CapFloor | HWModel | 0 | 1 | 0.049995924285639641 | 50/100/1000 | 12 | 0.055228873373796609 | 0.0061062754654949824 | 0.04875825 | 0.0 | 1000.0 | 0.0 |
Callable | HWModel | 0 | 1 | 0.0465 | 50/100/1000 | 6 | 0.03 | 0.01 | 0.055 | 0.0 | 100.0 | 0.0 |
The benchmarks include two parts: TreeSwaptionEngine with different rate models, and different instruments with the HullWhite rate model. The baseline is QuantLib, a widely used open-source C++ library, running on a platform with two Intel(R) Xeon(R) E5-2667 v3 CPUs @ 3.200GHz, with 8 cores per processor and 2 threads per core.
TreeSwaptionEngine¶
The performance of the TreeSwaptionEngine with the HullWhite and other rate models is shown in the table below.
Rate Model | Platform | 50 timesteps | 100 timesteps | 500 timesteps | 1000 timesteps |
HWModel | Baseline (ms) | 1.0 | 4.8 | 353.9 | 2493.5 |
HWModel | FinTech on U250 (ms) | 0.018 | 0.042 | 0.485 | 1.650 |
HWModel | Acceleration Ratio | 55X | 114X | 729X | 1511X |
BKModel | Baseline (ms) | 1.9 | 8.6 | 438.2 | 2813.1 |
BKModel | FinTech on U250 (ms) | 0.069 | 0.156 | 1.471 | 4.601 |
BKModel | Acceleration Ratio | 27X | 55X | 297X | 611X |
CIRModel | Baseline (ms) | 0.5 | 1.4 | 26.2 | 100.7 |
CIRModel | FinTech on U250 (ms) | 0.007 | 0.014 | 0.119 | 0.361 |
CIRModel | Acceleration Ratio | 71X | 100X | 223X | 278X |
ECIRModel | Baseline (ms) | 1.1 | 5.5 | 439.5 | 3322.5 |
ECIRModel | FinTech on U250 (ms) | 0.058 | 0.114 | 0.997 | 3.088 |
ECIRModel | Acceleration Ratio | 19X | 48X | 440X | 1093X |
VModel | Baseline (ms) | 0.5 | 1.8 | 40.1 | 161.7 |
VModel | FinTech on U250 (ms) | 0.005 | 0.010 | 0.096 | 0.322 |
VModel | Acceleration Ratio | 100X | 180X | 417X | 502X |
G2Model | Baseline (ms) | 258.0 | 2133.5 | | |
G2Model | FinTech on U250 (ms) | 0.574 | 4.496 | | |
G2Model | Acceleration Ratio | 449X | 474X | | |
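The acceleration ratios in the table above are consistent with dividing the baseline time by the FPGA time and dropping the fractional part. The source does not state the exact formula, so the Python sketch below is an assumption: it recomputes the HWModel row from the rounded timings in the table, and the function name `acceleration_ratio` is hypothetical.

```python
# Timings in milliseconds, copied from the HWModel rows of the table above.
TIMESTEPS = [50, 100, 500, 1000]
BASELINE_MS = [1.0, 4.8, 353.9, 2493.5]
FPGA_MS = [0.018, 0.042, 0.485, 1.650]


def acceleration_ratio(baseline_ms, fpga_ms):
    """Speedup of the FPGA run over the CPU baseline (assumed definition)."""
    return baseline_ms / fpga_ms


ratios = [acceleration_ratio(b, f) for b, f in zip(BASELINE_MS, FPGA_MS)]

if __name__ == "__main__":
    for steps, ratio in zip(TIMESTEPS, ratios):
        # Truncate to an integer to match how the table reports the ratio.
        print(f"timesteps={steps}: {int(ratio)}X")
```

Recomputing other rows from the rounded table values can differ from the published ratio by a few percent (for example, 26.2 / 0.119 is about 220 while the CIRModel row lists 223X), which suggests the published ratios were derived from unrounded measurements.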
TreeEngine Based on HullWhite Rate Model¶
The performance comparison of the different TreeEngine instruments based on the HullWhite rate model is shown in the table below.
Instrument | Platform | 50 timesteps | 100 timesteps | 500 timesteps | 1000 timesteps |
Swaption | Baseline (ms) | 1.0 | 4.8 | 353.9 | 2493.5 |
Swaption | FinTech on U250 (ms) | 0.018 | 0.042 | 0.485 | 1.650 |
Swaption | Acceleration Ratio | 55X | 114X | 729X | 1511X |
Swap | Baseline (ms) | 1.0 | 4.3 | 291.2 | 2056.5 |
Swap | FinTech on U250 (ms) | 0.014 | 0.032 | 0.361 | 1.226 |
Swap | Acceleration Ratio | 71X | 134X | 806X | 1677X |
CapFloor | Baseline (ms) | 0.7 | 3.4 | 217.6 | 1581.3 |
CapFloor | FinTech on U250 (ms) | 0.014 | 0.031 | 0.344 | 1.160 |
CapFloor | Acceleration Ratio | 50X | 109X | 632X | 1363X |
Callable | Baseline (ms) | 1.4 | 3.5 | 155.2 | 1142.1 |
Callable | FinTech on U250 (ms) | 0.015 | 0.033 | 0.374 | 1.260 |
Callable | Acceleration Ratio | 93X | 106X | 414X | 906X |
The resource utilization and FPGA clock frequency of each TreeEngine on the U250 FPGA card are listed in the following table.
Engine | PUs | BRAM | URAM | DSP | REG | LUT | FPGA Clock |
treeSwaptionEngineHWModel | 16 | 1136 | 0 | 6128 | 1080922 | 1051628 | 198.3MHz |
treeSwaptionEngineBKModel | 12 | 936 | 0 | 5128 | 951861 | 970889 | 223.6MHz |
treeSwaptionEngineCIRModel | 20 | 1064 | 0 | 4896 | 977294 | 870446 | 266.5MHz |
treeSwaptionEngineECIRModel | 12 | 932 | 0 | 4456 | 963209 | 962656 | 229.8MHz |
treeSwaptionEngineVModel | 20 | 1276 | 0 | 6636 | 1166999 | 1076988 | 249.4MHz |
treeSwaptionEngineG2Model | 8 | 308 | 1088 | 4112 | 736199 | 699702 | 204.9MHz |
treeSwapEngineHWModel | 16 | 1056 | 0 | 5552 | 1045252 | 1024854 | 252.9MHz |
treeCapFloorEngineHWModel | 16 | 1036 | 0 | 5040 | 981075 | 1009950 | 267.2MHz |
treeCallableEngineHWModel | 16 | 1056 | 0 | 4528 | 961311 | 983068 | 242.0MHz |