Debug Profile¶
This is a simple example of vector addition that prints profile data (wall-clock time between start and stop). It also dumps a waveform file which can be loaded into Vivado to view the waveform. Run the command ‘vivado -source ./scripts/open_waveform.tcl -tclargs <device_name>-<kernel_name>.<target>.<device_name>.wdb’ to launch the waveform viewer. Users can also change batch to gui in the xrt.ini file to see the live waveform while running the application. The example also demonstrates the use of hls::print to print a format string with an int or double argument to standard output, and to the simulation log in cosim and HW_EMU.
KEY CONCEPTS: Use of Profile API, Waveform Dumping and loading
KEYWORDS: debug_mode=gui/batch, user_range, user_event, hls::print
The Vitis development environment can generate a waveform view and
launch a live waveform viewer when running hardware emulation. It
displays in-depth details on the emulation results at the system level,
compute unit level, and function level. The details include data
transfers between the kernel and global memory, as well as data flow via
inter-kernel and intra-kernel pipes. They provide many insights into
performance bottlenecks, from the system level down to individual
function calls, to help developers optimize their applications.
This example uses a simple vector addition kernel to demonstrate the debugging information that can be viewed in the waveform.
The xrt.ini file is used to launch the waveform. The waveform can be viewed
at runtime by launching the GUI with the following setting in this file:
[Emulation]
debug_mode=GUI
A waveform can also be generated from the .wdb file produced during
hardware emulation, which can be opened in Vivado with the commands
in the script provided under scripts/open_waveform.tcl. For
this case, we need to add the following flags in the xrt.ini
file:
[Emulation]
debug_mode=batch
Waveforms are helpful for viewing data transfers to memory from the host as well
as data transfers from each AXI Master port. Another feature the
waveform viewer provides is CU Stalls. The stall bus compiles
all of the lowest-level stall signals and reports the percentage that
are stalling at any point in time. This indicates how much of
the kernel is stalling at any point in the simulation, and users can
optimize the design to improve hardware utilization based on these
stall signals.
If the user wants to record profiling information for arbitrary sections of their code, the following two features can be used (see the sketch after this list) -
user_range - Profiles and captures the data in the specified range
user_event - Marks the event in the timeline trace
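A minimal host-code sketch of these two APIs, assuming XRT’s experimental profiling header (experimental/xrt_profile.h); the phase labels and placement here are hypothetical and depend on the application:
#include "experimental/xrt_profile.h"
...
// user_range: profiles and captures data for the enclosed section.
// The constructor starts the range; the label and tooltip appear in the trace.
xrt::profile::user_range range("Phase 1", "Initialize the input vectors");
// ... fill input vectors here ...
range.end();

// The same range object can be reused: start() opens a new section.
range.start("Phase 2", "Set up buffers and kernel arguments");
// ... buffer allocation and kernel setup here ...
range.end();

// user_event: marks a single point of interest in the timeline trace.
xrt::profile::user_event mark;
mark.mark("Setup complete");
...
The captured ranges and events then appear in the timeline trace alongside the standard profiling data when trace collection is enabled in the xrt.ini file.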
The user can also use the hls::print function to print a format string with an int or double argument to standard output, and to the simulation log in cosim and HW_EMU. It can be used to trace the order in which code blocks are executed across complex control flow and concurrent execution (e.g., in dataflow), or to trace the values of selected variables.
When used in this simple example:
#include "hls_print.h"
...
hls::print("Number of elements : %d\n", length_r);
hls::print("Buffer size : %d\n", BUFFER_SIZE);
...
it prints the “Number of elements” and “Buffer size” for C simulation, SW emulation, RTL cosimulation, and HW emulation. It is ignored in the HW flow and does not impact the kernel functionality.
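As a hypothetical sketch of the execution-order use case (not part of this example’s vadd.cpp), hls::print can be placed in two dataflow processes to show how their iterations interleave in the simulation log:
#include "hls_print.h"
#include "hls_stream.h"

// Producer process: writes n values into the stream.
static void produce(hls::stream<int>& out, int n) {
    for (int i = 0; i < n; i++) {
        hls::print("produce iteration : %d\n", i);
        out.write(i);
    }
}

// Consumer process: reads n values from the stream.
static void consume(hls::stream<int>& in, int* dst, int n) {
    for (int i = 0; i < n; i++) {
        hls::print("consume iteration : %d\n", i);
        dst[i] = in.read();
    }
}

extern "C" void trace_example(int* dst, int n) {
#pragma HLS dataflow
    hls::stream<int> s;
    produce(s, n);
    consume(s, dst, n);
}
In cosimulation, the interleaved print output makes the concurrent execution of the two processes visible.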
EXCLUDED PLATFORMS:
All NoDMA Platforms, i.e., u50 nodma, etc.
DESIGN FILES¶
Application code is located in the src directory. Accelerator binary files will be compiled to the xclbin directory. The xclbin directory is required by the Makefile, and its contents will be filled during compilation. A listing of all the files in this example is shown below:
src/host.cpp
src/host.h
src/vadd.cpp
COMMAND LINE ARGUMENTS¶
Once the environment has been configured, the application can be executed with:
./debug_profile <vadd XCLBIN>