Benchmark of TreeEngine

Overview

This is a series of tree-based benchmarks built in the Xilinx Vitis environment and compared against QuantLib. The supported instruments are swaption, swap, capfloor, and callablebond. The supported rate models are Vasicek, HullWhite, BlackKarasinski, CoxIngersollRoss, ExtendedCoxIngersollRoss, and the two-additive-factor Gaussian model. The benchmarks support software and hardware emulation as well as running the hardware accelerator on the Alveo U250.

These examples reside in the L2/benchmarks/TreeEngine directory. Taking TreeSwaptionEngineHWModel as an example, the tutorial below provides a step-by-step guide covering the commands for building and running the kernel.

Executable Usage

  • Work Directory(Step 1)

The steps for library download and environment setup can be found in Vitis Quantitative_Finance Library. To get the design, change into the benchmark directory:

cd L2/benchmarks/TreeEngine/TreeSwaptionEngineHWModel
  • Build kernel(Step 2)

Run the following make command to build the XCLBIN and host binary for a specific device. Note that this process takes a long time, possibly a couple of hours.

source /opt/xilinx/Vitis/2021.1/settings64.sh
source /opt/xilinx/xrt/setenv.sh
export DEVICE=/opt/xilinx/platforms/xilinx_u250_xdma_201830_2/xilinx_u250_xdma_201830_2.xpfm
export TARGET=hw
make run
  • Run kernel(Step 3)

To get the benchmark results, please run the following command.

./build_dir.hw.xilinx_u250_xdma_201830_2/host.exe -xclbin build_dir.hw.xilinx_u250_xdma_201830_2/scanTreeKernel.xclbin

Input Arguments:

Usage: test.exe    -[-xclbin ]
       -xclbin     TreeEngine binary;
  • Example output(Step 4)
----------------------Tree Bermudan (HW) Engine-----------------
timestep=50
Found Platform
Platform Name: Xilinx
Found Device=xilinx_u250_xdma_201830_2
INFO: Importing xclbin_xilinx_u250_xdma_201830_2_hw/scanTreeKernel.xclbin
Loading: 'xclbin_xilinx_u250_xdma_201830_2_hw/scanTreeKernel.xclbin'
kernel has been created
kernel start------
kernel end------
FPGA Execution time: 0.28ms
NPV= 13.1903146433444 ,diff/NPV= -1.30631162395178e-14
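
When collecting results across several runs, it can help to extract the timing and NPV lines from the host output automatically. The following is a minimal sketch in Python, not part of the library: the function name `parse_tree_engine_log` and the regular expressions are assumptions based only on the two result lines shown in the example output above, and they may need adjusting if the host program prints a different format.

```python
import re

# Sample lines copied from the example output above.
SAMPLE_LOG = """\
kernel start------
kernel end------
FPGA Execution time: 0.28ms
NPV= 13.1903146433444 ,diff/NPV= -1.30631162395178e-14
"""

# Assumed patterns, derived only from the sample output lines.
TIME_RE = re.compile(r"FPGA Execution time:\s*([0-9.eE+-]+)\s*ms")
NPV_RE = re.compile(r"NPV=\s*([0-9.eE+-]+)\s*,\s*diff/NPV=\s*([0-9.eE+-]+)")


def parse_tree_engine_log(text):
    """Return (time_ms, npv, rel_diff) extracted from host output text."""
    time_match = TIME_RE.search(text)
    npv_match = NPV_RE.search(text)
    if time_match is None or npv_match is None:
        raise ValueError("expected result lines not found in log")
    return (
        float(time_match.group(1)),
        float(npv_match.group(1)),
        float(npv_match.group(2)),
    )


time_ms, npv, rel_diff = parse_tree_engine_log(SAMPLE_LOG)
print(time_ms, npv, rel_diff)
```

In this sample, diff/NPV is on the order of 1e-14, so the printed result is effectively identical to the reference value it is checked against.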

Profiling

The application scenarios in this case are:

Table 26 Application Scenarios

Instrument  Model      type  index  fixedRate             timestep     initSize  a                     sigma                  flatRate    x0                     nominal  spread
Swaption    HWModel    0     1      0.049995924285639641  50/100/1000  12        0.055228873373796609  0.0061062754654949824  0.04875825  0.0                    1000.0   0.0
            BKModel    0     1      0.049995924285639641  50/100/1000  12        0.043389447297063261  0.12074597086680797    0.04875825  0.0                    1000.0   0.0
            CIRModel   0     1      0.049995924285639641  50/100/1000  12        0.043389447297063261  0.068963597413997324   0.04875825  0.18580295883843218    1000.0   0.0
            ECIRModel  0     1      0.049995924285639641  50/100/1000  12        0.043389447297063261  0.015974847434765481   0.04875825  0.50137133948380941    1000.0   0.0
            VModel     0     1      0.049995924285639641  50/100/1000  12        0.16046325834281869   0.0037370022855109613  0.04875825  0.0079988201434896891  1000.0   0.0
            G2Model    0     1      0.049995924285639641  50/100/1000  12        0.050055733653096922  0.0094424342056787739  0.04875825  0.0                    1000.0   0.0
Swap        HWModel    0     1      0.049995924285639641  50/100/1000  12        0.055228873373796609  0.0061062754654949824  0.04875825  0.0                    1000.0   0.0
CapFloor    HWModel    0     1      0.049995924285639641  50/100/1000  12        0.055228873373796609  0.0061062754654949824  0.04875825  0.0                    1000.0   0.0
Callable    HWModel    0     1      0.0465                50/100/1000  6         0.03                  0.01                   0.055       0.0                    100.0    0.0

The benchmarks include two parts: TreeSwaptionEngine with different rate models, and different instruments with the HullWhite rate model. The baseline is QuantLib, a widely used open-source C++ library, running on a platform with 2 Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.200GHz processors, 8 cores per processor and 2 threads per core.

TreeSwaptionEngine

The performance of the TreeSwaptionEngine with the HullWhite and other rate models is shown in the table below.

Table 27 Performance Comparison: TreeSwaptionEngine with different Rate Models

Rate Model  Platform              Timesteps
                                  50      100     500     1000
HWModel     Baseline (ms)         1.0     4.8     353.9   2493.5
            FinTech on U250 (ms)  0.018   0.042   0.485   1.650
            Acceleration Ratio    55X     114X    729X    1511X
BKModel     Baseline (ms)         1.9     8.6     438.2   2813.1
            FinTech on U250 (ms)  0.069   0.156   1.471   4.601
            Acceleration Ratio    27X     55X     297X    611X
CIRModel    Baseline (ms)         0.5     1.4     26.2    100.7
            FinTech on U250 (ms)  0.007   0.014   0.119   0.361
            Acceleration Ratio    71X     100X    223X    278X
ECIRModel   Baseline (ms)         1.1     5.5     439.5   3322.5
            FinTech on U250 (ms)  0.058   0.114   0.997   3.088
            Acceleration Ratio    19X     48X     440X    1093X
VModel      Baseline (ms)         0.5     1.8     40.1    161.7
            FinTech on U250 (ms)  0.005   0.010   0.096   0.322
            Acceleration Ratio    100X    180X    417X    502X
G2Model     Baseline (ms)         258.0   2133.5
            FinTech on U250 (ms)  0.574   4.496
            Acceleration Ratio    449X    474X
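
The acceleration ratios in the table above appear to be the baseline time divided by the FPGA time for the same number of timesteps. The sketch below recomputes them for the HWModel rows as a worked example. The helper name `acceleration_ratio` is hypothetical, and the timings are the rounded values from the table, so recomputed ratios for other rows may differ slightly from the published ones.

```python
# Timings in milliseconds for HWModel, copied from the table above.
timesteps = [50, 100, 500, 1000]
baseline_ms = [1.0, 4.8, 353.9, 2493.5]
fpga_ms = [0.018, 0.042, 0.485, 1.650]


def acceleration_ratio(cpu_ms, accel_ms):
    """Speedup of the accelerator over the CPU baseline (assumed definition)."""
    return cpu_ms / accel_ms


ratios = [acceleration_ratio(c, f) for c, f in zip(baseline_ms, fpga_ms)]

for n, r in zip(timesteps, ratios):
    print(f"timesteps={n}: {r:.1f}X")
```

For HWModel, the integer parts of the recomputed ratios agree with the 55X, 114X, 729X, and 1511X entries in the table.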

TreeEngine Based on HullWhite Rate Model

The performance comparison of the different TreeEngine instruments based on the HullWhite rate model is shown in the table below.

Table 28 Performance Comparison: Different TreeEngine Instruments with HullWhite Model

Instrument  Platform              Timesteps
                                  50      100     500     1000
Swaption    Baseline (ms)         1.0     4.8     353.9   2493.5
            FinTech on U250 (ms)  0.018   0.042   0.485   1.650
            Acceleration Ratio    55X     114X    729X    1511X
Swap        Baseline (ms)         1.0     4.3     291.2   2056.5
            FinTech on U250 (ms)  0.014   0.032   0.361   1.226
            Acceleration Ratio    71X     134X    806X    1677X
CapFloor    Baseline (ms)         0.7     3.4     217.6   1581.3
            FinTech on U250 (ms)  0.014   0.031   0.344   1.160
            Acceleration Ratio    50X     109X    632X    1363X
Callable    Baseline (ms)         1.4     3.5     155.2   1142.1
            FinTech on U250 (ms)  0.015   0.033   0.374   1.260
            Acceleration Ratio    93X     106X    414X    906X

The resource utilization and performance of TreeEngine on U250 FPGA card are listed in the following tables.

Table 29 Resource utilization report of TreeEngine APIs on U250

Engine                       PUs  BRAM  URAM  DSP   REG      LUT      FPGA Clock
treeSwaptionEngineHWModel    16   1136  0     6128  1080922  1051628  198.3MHz
treeSwaptionEngineBKModel    12   936   0     5128  951861   970889   223.6MHz
treeSwaptionEngineCIRModel   20   1064  0     4896  977294   870446   266.5MHz
treeSwaptionEngineECIRModel  12   932   0     4456  963209   962656   229.8MHz
treeSwaptionEngineVModel     20   1276  0     6636  1166999  1076988  249.4MHz
treeSwaptionEngineG2Model    8    308   1088  4112  736199   699702   204.9MHz
treeSwapEngineHWModel        16   1056  0     5552  1045252  1024854  252.9MHz
treeCapFloorEngineHWModel    16   1036  0     5040  981075   1009950  267.2MHz
treeCallableEngineHWModel    16   1056  0     4528  961311   983068   242.0MHz
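
Because the engines are built with different numbers of processing units (PUs), a rough per-PU figure can make the rows easier to compare. The sketch below divides the DSP and LUT totals by the PU count for three of the engines. It assumes the reported totals are spread evenly over the PUs, which the table does not state; any fixed overhead outside the PUs would make the real per-PU cost lower. The helper name `per_pu` is hypothetical.

```python
# (PUs, DSP, LUT) per engine, copied from the table above.
utilization = {
    "treeSwaptionEngineHWModel": (16, 6128, 1051628),
    "treeSwaptionEngineCIRModel": (20, 4896, 870446),
    "treeSwaptionEngineG2Model": (8, 4112, 699702),
}


def per_pu(total, pus):
    """Average resource count per PU, assuming an even split (assumption)."""
    return total / pus


for name, (pus, dsp, lut) in utilization.items():
    print(f"{name}: {per_pu(dsp, pus):.1f} DSP/PU, {per_pu(lut, pus):.0f} LUT/PU")
```

Under this assumption, the G2Model engine uses more DSPs per PU than the HWModel engine (514 versus 383), which is consistent with it fitting fewer PUs on the card.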