Stream Chain Matrix Multiplication

This kernel performs cascaded matrix multiplication using dataflow. ap_ctrl_chain is enabled for this kernel to showcase how multiple enqueues of kernel calls can be overlapped to give higher performance. ap_ctrl_chain allows the kernel to start processing the next kernel operation before the current one has completed.
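The benefit of overlapping enqueues can be estimated with a back-of-the-envelope model. The sketch below is an idealized illustration only: it assumes every dataflow stage takes the same number of cycles and ignores memory and host overheads. The function names and parameters are hypothetical and do not come from this example's sources.

```cpp
#include <cassert>
#include <cstdint>

// Cycles needed when each kernel call must finish completely
// before the next one may start (no overlap between calls).
std::uint64_t serial_cycles(std::uint64_t calls, std::uint64_t stages,
                            std::uint64_t stage_cycles) {
    return calls * stages * stage_cycles;
}

// Cycles needed when a new call may enter the first stage as soon as
// that stage is free (idealized overlap, as in a filled pipeline).
std::uint64_t overlapped_cycles(std::uint64_t calls, std::uint64_t stages,
                                std::uint64_t stage_cycles) {
    assert(stages > 0);
    if (calls == 0) return 0;
    return (stages + calls - 1) * stage_cycles;
}
```

Under these assumptions, 4 calls through 3 stages of 100 cycles each take 1200 cycles without overlap and 600 cycles with overlap. Real gains depend on how well the stages are balanced.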

KEY CONCEPTS: ap_ctrl_chain, PLRAM

This example focuses on the ap_ctrl_chain protocol, which implements a set of block-level control ports to start the design operation, continue operation, and indicate when the design is idle, done, and ready for new input data. The ap_ctrl_chain interface mode is similar to ap_ctrl_hs but provides an additional input signal, ap_continue, to apply back pressure. Xilinx recommends using the ap_ctrl_chain block-level I/O protocol when chaining Vivado HLS blocks together.
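To make the back-pressure behaviour concrete, here is a toy software model of a single block. It is a simplified sketch, not the RTL that the tools generate: it assumes a fixed latency per transaction and that a finished result is held (ap_done stays asserted) until ap_continue is seen high. The struct ChainBlock and the helper run_cycles are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Toy model of one block using an ap_ctrl_chain-style handshake.
struct ChainBlock {
    int latency;        // cycles of work per transaction (assumed fixed)
    int remaining = 0;  // cycles left in the transaction in flight
    bool done = false;  // models ap_done held high until acknowledged
    int completed = 0;  // transactions acknowledged by the consumer

    // Advance one clock cycle. Returns true when a new transaction
    // is accepted in this cycle (models ap_ready).
    bool step(bool ap_start, bool ap_continue) {
        // The consumer acknowledges a pending result.
        if (done && ap_continue) {
            done = false;
            ++completed;
        }
        // Work on the transaction in flight.
        if (remaining > 0) {
            if (--remaining == 0) done = true;
            return false;
        }
        // Accept new input only when no result is waiting (back pressure).
        if (!done && ap_start) {
            remaining = latency;
            return true;
        }
        return false;
    }
};

// Drive the block with ap_start held high for a number of cycles.
int run_cycles(int latency, int cycles, bool consumer_ready) {
    assert(latency > 0);
    ChainBlock block{latency};
    for (int c = 0; c < cycles; ++c) block.step(true, consumer_ready);
    return block.completed;
}
```

In this model, a consumer that never raises ap_continue stalls the block after its first result, while a consumer that is always ready lets transactions flow back to back.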

In this example, a series of Vivado HLS blocks is chained together to perform the operation. To declare this protocol, the kernel interface should contain these pragmas:

#pragma HLS INTERFACE s_axilite port = return bundle = control
#pragma HLS INTERFACE ap_ctrl_chain port = return bundle = control
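As an illustration of where these pragmas sit, the following is a minimal sketch of a top-level function that chains two multiplications, computing (A x B) x B. It is not the code in src/krnl_stream_mmult.cpp: the function name chain_mmult_sketch, the matrix size N, and the use of plain local arrays between stages are assumptions made so that the sketch also compiles with an ordinary C++ compiler, which ignores the HLS pragmas.

```cpp
constexpr int N = 4;  // hypothetical matrix dimension, chosen for the sketch

// Copy a row-major N x N matrix from global memory into a local buffer.
static void load(const int* src, int dst[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) dst[i][j] = src[i * N + j];
}

// Load B into two buffers so that each multiply stage has its own copy
// (each buffer then has a single producer and a single consumer).
static void load_twice(const int* src, int d0[N][N], int d1[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            d0[i][j] = src[i * N + j];
            d1[i][j] = src[i * N + j];
        }
}

// One stage of the cascade: c = a * b.
static void mmult(int a[N][N], int b[N][N], int c[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            int acc = 0;
            for (int k = 0; k < N; ++k) acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
}

// Write the local result back to global memory in row-major order.
static void store(int src[N][N], int* dst) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) dst[i * N + j] = src[i][j];
}

extern "C" void chain_mmult_sketch(const int* a, const int* b, int* out) {
#pragma HLS INTERFACE s_axilite port = return bundle = control
#pragma HLS INTERFACE ap_ctrl_chain port = return bundle = control
#pragma HLS DATAFLOW
    int la[N][N], lb0[N][N], lb1[N][N], ab[N][N], abb[N][N];
    load(a, la);
    load_twice(b, lb0, lb1);
    mmult(la, lb0, ab);    // first stage:  A x B
    mmult(ab, lb1, abb);   // second stage: (A x B) x B
    store(abb, out);
}
```

Splitting the work into separate load, compute, and store functions is what lets the dataflow region run the stages concurrently; the pragmas on the return port select the control protocol for the whole kernel.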

SUPPORTED SHELLS

SHELL               Board                Software Version
----------------    -----------------    ----------------
xilinx_u200_qdma    Xilinx Alveo U200    Vitis 2020.1
xilinx_u200_xdma    Xilinx Alveo U200    Vitis 2020.1
xilinx_u250_qdma    Xilinx Alveo U250    Vitis 2020.1
xilinx_u250_xdma    Xilinx Alveo U250    Vitis 2020.1
xilinx_u280_xdma    Xilinx Alveo U280    Vitis 2020.1

DESIGN FILES

Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile, and its contents will be filled during compilation. The files in this example are listed below:

src/host.cpp
src/krnl_stream_mmult.cpp

COMMAND LINE ARGUMENTS

Once the environment has been configured, the application can be executed with:

./vadd_mmult <krnl_stream_mmult XCLBIN>