Maximal Independent set¶

Maximal Independent Set example resides in L2/benchmarks/maximal_independent_set directory. The tutorial provides a step-by-step guide that covers commands for building and running kernel.

Executable Usage¶

Work Directory(Step 1)

The steps for library download and environment setup can be found in Vitis Graph Library. For getting the design,

cd L2/benchmarks/maximal_independent_set

Build kernel(Step 2)

Run the following make command to build your XCLBIN and host binary targeting a specific device. Please be noticed that this process will take a long time, maybe couple of hours.

make run TARGET=hw DEVICE=xilinx_u50_gen3x16_xdma_5_202210_1

Run kernel(Step 3)

To get the benchmark results, please run the following command.

./build_dir.hw.xilinx_u50_gen3x16_xdma_5_202210_1/host.exe -xclbin build_dir.hw.xilinx_u50_gen3x16_xdma_5_202210_1/mis_kernel.xclbin -o data/data-csr-offset.mtx -i data/data-csr-indicesweights.mtx  -mis data/mis.txt

Renumber Input Arguments:

Usage: host.exe -[-xclbin -o -i -mis]
      -xclbin:        the kernel name
      -o              offset file of input graph in CSR format
      -i              edge file of input graph in CSR format
      -mis            golden reference file for validatation

Note: Default arguments are set in Makefile, you can use other Datasets listed in the table.

Profiling¶

The hardware resource utilizations are listed in the following table. Different tool versions may result slightly different resource.

Table 1 : Hardware resources for Renumber¶
Kernel	BRAM	URAM	FF	LUT	Frequency(MHz)
mis_kernel	75	0	15751	11255	300.0

Benchmark¶

The performance is shown in the table below.

Table 2 Comparison between CPU and FPGA (iteration=30)¶
Datasets	Vertex	Edges	CPU time	FPGA time	Speedup
coPapersCiteseer	434102	16036720	0.26	0.309	84.14%
coPapersDBLP	540486	15245729	0.31	0.317	97.79%
hollywood	1139905	57515616	1.78	1.21	147.11%
as-Skitter	1694616	11094209	0.74	0.344	215.12%
cit-Patents	3774768	16518948	2.07	0.584	354.45%

Note

Maximal independent set CPU time benchmarking is running on Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, cache(2048 KB), cores(31)
time unit: ms.
This mis implementation focus on single-kernel-level design and focusing on mid-scale dataset processing. As showed in table, with the increasing of the graph vertex number, the FPGA show increasingly advantage over CPU offloading.