XRT Native APIs¶
Starting with the 2020.2 release, XRT provides a new set of native APIs in C, C++, and Python. This document introduces the usage of the C and C++ APIs.
To use the native XRT APIs, the host application must link with the xrt_coreutil library.
Example g++ command
g++ -g -std=c++14 -I$XILINX_XRT/include -L$XILINX_XRT/lib -o host.exe host.cpp -lxrt_coreutil -pthread
The core data structures in C and C++ are listed below
Object | C++ Class | C Type (Handle) |
---|---|---|
Device | xrt::device | xrtDeviceHandle |
XCLBIN | xrt::xclbin | xrtXclbinHandle |
Buffer | xrt::bo | xrtBufferHandle |
Kernel | xrt::kernel | xrtKernelHandle |
Run | xrt::run | xrtRunHandle |
Graph | TBD | xrtGraphHandle |
All the core data structures are defined in the header files in the $XILINX_XRT/include/experimental/ directory. In the user host code, it is sufficient to include "experimental/xrt_kernel.h" and, when using the Graph APIs, "experimental/xrt_aie.h" to access all the APIs related to these data structures.

    #include "experimental/xrt_kernel.h"
    #include "experimental/xrt_aie.h"

The common host code flow using the above data structures is as below
- Open Xilinx Device and Load the XCLBIN
- Set up the Buffers that are used to transfer the data between the host and the device
- Use the Buffer APIs for the data transfer between host and device (before and after the kernel execution).
- Use Kernel and Run handle/objects to offload and manage the compute-intensive tasks running on FPGA.
Below we will walk through the common API usage to accomplish the above tasks.
Device and XCLBIN¶
The Device and XCLBIN classes provide fundamental infrastructure-related interfaces. The primary objectives of the device and XCLBIN related APIs are
- Open a Device
- Load compiled kernel binary (or XCLBIN) onto the device
Example C API based code

    10 xrtDeviceHandle device = xrtDeviceOpen(0);
    11
    12 xrtXclbinHandle xclbin = xrtXclbinAllocFilename("kernel.xclbin");
    13
    14 xrtDeviceLoadXclbinHandle(device,xclbin);
    15 ..............
    16 ..............
    17 xrtDeviceClose(device);

The above code block shows
- Opening the device (enumerated as 0) and getting the device handle xrtDeviceHandle (line 10)
  - Device indices are enumerated as 0, 1, 2, and so on, and can be observed with xbutil scan

        >>xbutil scan
        INFO: Found total 2 card(s), 2 are usable
        .............
        [0] 0000:b3:00.1 xilinx_u250_gen3x16_base_1 user(inst=129)
        [1] 0000:65:00.1 xilinx_u50_gen3x16_base_1 user(inst=128)

- Opening the XCLBIN from the filename and getting an XCLBIN handle xrtXclbinHandle (line 12)
- Loading the XCLBIN onto the device with the API xrtDeviceLoadXclbinHandle, using the XCLBIN handle (line 14)
- Closing the device handle at the end of the application (line 17)
C++: The equivalent C++ API based code

    unsigned int dev_index = 0;
    auto device = xrt::device(dev_index);
    auto xclbin_uuid = device.load_xclbin("kernel.xclbin");

The above code block shows
- The xrt::device class’s constructor is used to open the device.
- The member function xrt::device::load_xclbin is used to load the XCLBIN from the filename.
- The member function xrt::device::load_xclbin returns the XCLBIN UUID, which is required to open the kernel (refer to the Kernel section).
Buffers¶
Buffers are primarily used to transfer the data between the host and the device. The Buffer related APIs are discussed in the following three subsections
- Buffer allocation and deallocation
- Data transfer using Buffers
- Miscellaneous other Buffer APIs
1. Buffer allocation and deallocation¶
XRT provides the following APIs for buffer allocation and deallocation
- xrtBOAlloc: Allocates a 4K-aligned buffer object. The API must be called with the appropriate flags.
- xrtBOAllocUserPtr: Allocates a buffer object using a pointer provided by the user. The user pointer must be aligned to a 4K boundary.
- xrtBOFree: Deallocates the allocated buffer.

    15 xrtMemoryGroup bank_grp_idx_0 = xrtKernelArgGroupId(kernel, 0);
    16 xrtMemoryGroup bank_grp_idx_1 = xrtKernelArgGroupId(kernel, 1);
    17
    18 xrtBufferHandle input_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_0);
    19 xrtBufferHandle output_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_1);
    20
    21 ....
    22 ....
    23 xrtBOFree(input_buffer);
    24 xrtBOFree(output_buffer);

The above code block shows
- Buffer allocation API xrtBOAlloc at lines 18,19
- Buffer deallocation API xrtBOFree at lines 23,24
The arguments of the API xrtBOAlloc are
- Argument 1: The device on which the buffer should be allocated
- Argument 2: The size (in bytes) of the buffer
- Argument 3: xrtBufferFlags: Used to specify the buffer type. The most commonly used types are
  - XRT_BO_FLAGS_NONE: Regular buffer
  - XRT_BO_FLAGS_DEV_ONLY: Device-only buffer (meant to be used only by the kernel).
  - XRT_BO_FLAGS_HOST_ONLY: Host-only buffer (the buffer resides in the host memory and is directly transferred to/from the kernel)
  - XRT_BO_FLAGS_P2P: P2P buffer, a buffer for NVMe transfer
  - XRT_BO_FLAGS_CACHEABLE: Cacheable buffer, which can be used when the host CPU frequently accesses the buffer (applicable for embedded platforms).
- Argument 4: xrtMemoryGroup: Enumerated memory bank that specifies the location on the device where the buffer should be allocated. The xrtMemoryGroup is obtained by the API xrtKernelArgGroupId as shown in line 15 (for more details of this API refer to the Kernel section).
C++: The equivalent C++ API based code

    auto bank_grp_idx_0 = kernel.group_id(0);
    auto bank_grp_idx_1 = kernel.group_id(1);

    auto input_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_0);
    auto output_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_1);

In the above code, the xrt::bo buffer objects are created using the class’s constructor. Note that no buffer flag is passed, because the constructor creates a regular buffer by default. The available buffer flags for xrt::bo are expressed as an enum class argument with the following enumerator values
- xrt::bo::flags::normal: Default, regular buffer
- xrt::bo::flags::device_only: Device-only buffer (meant to be used only by the kernel).
- xrt::bo::flags::host_only: Host-only buffer (the buffer resides in the host memory and is directly transferred to/from the kernel)
- xrt::bo::flags::p2p: P2P buffer, a buffer for NVMe transfer
- xrt::bo::flags::cacheable: Cacheable buffer, which can be used when the host CPU frequently accesses the buffer (applicable for embedded platforms).
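The 4K alignment requirement on the user pointer passed to xrtBOAllocUserPtr can be met on the host side before the XRT call is made. The following is a minimal sketch using only POSIX and the C++ standard library; the helper names (kAlign4K, is_4k_aligned, alloc_4k_aligned, AlignedBuffer) are hypothetical and not part of XRT.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // posix_memalign (POSIX), std::free
#include <memory>

// Alignment required for user pointers, as stated in the text above.
constexpr std::size_t kAlign4K = 4096;

// True when the pointer sits on a 4K boundary.
inline bool is_4k_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlign4K == 0;
}

// Releases memory obtained from posix_memalign.
struct FreeDeleter {
    void operator()(void* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<void, FreeDeleter>;

// Returns a 4K-aligned host allocation, or an empty pointer on failure.
inline AlignedBuffer alloc_4k_aligned(std::size_t bytes) {
    void* p = nullptr;
    if (posix_memalign(&p, kAlign4K, bytes) != 0)
        return AlignedBuffer{};
    return AlignedBuffer{p};
}
```

A pointer obtained this way (buf.get()) is a candidate for the user-pointer argument; the host allocation has to stay alive for as long as the buffer object that wraps it is in use.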
2. Data transfer using Buffers¶
The XRT Buffer API library provides a rich set of APIs for data transfers between the host and the device, between buffers, etc. We will discuss the following data transfer styles
- Data transfer between host and device by Buffer read/write API
- Data transfer between host and device by Buffer map API
- Data transfer between buffers by copy API
I. Data transfer between host and device by Buffer read/write API¶
To transfer data from the host to the device, the user first updates the host-side buffer backing pointer and then performs a DMA transfer to the device.
The following C APIs are used for these tasks
- xrtBOWrite
- xrtBOSync with flag XCL_BO_SYNC_BO_TO_DEVICE
In C++, the xrt::bo class has the following member functions for the same functionality
- xrt::bo::write
- xrt::bo::sync with flag XCL_BO_SYNC_BO_TO_DEVICE
To transfer data from the device to the host, the steps are reversed: the user first performs a DMA transfer from the device and then reads the data from the host-side buffer backing pointer.
The following C APIs are used for these tasks
- xrtBOSync with flag XCL_BO_SYNC_BO_FROM_DEVICE
- xrtBORead
In C++ the corresponding member functions of the xrt::bo class are
- xrt::bo::sync with flag XCL_BO_SYNC_BO_FROM_DEVICE
- xrt::bo::read
Code example of transferring data from the host to the device

    xrtBufferHandle input_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_0);

    // Prepare the input data
    int buff_data[data_size];
    for (int i=0; i<data_size; ++i) {
        buff_data[i] = i;
    }

    // Copy into the host-side backing store, then DMA to the device
    xrtBOWrite(input_buffer,buff_data,data_size*sizeof(int),0);
    xrtBOSync(input_buffer,XCL_BO_SYNC_BO_TO_DEVICE, data_size*sizeof(int),0);

C++: The equivalent C++ API based code

    auto input_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_0);
    // Prepare the input data
    int buff_data[data_size];
    for (auto i=0; i<data_size; ++i) {
        buff_data[i] = i;
    }

    input_buffer.write(buff_data);
    input_buffer.sync(XCL_BO_SYNC_BO_TO_DEVICE);

Note that the C++ member functions xrt::bo::sync, xrt::bo::write, xrt::bo::read, etc. have overloaded versions that can be used for partial buffer sync/read/write by specifying the size and the offset. In the above code example, the full buffer size and offset 0 are used as default arguments.
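For a partial sync/read/write, the size and the offset are both given in bytes, so an element-based slice has to be converted first. The sketch below shows one way to do that conversion with a bounds check; ByteRange and element_range are hypothetical helper names and not part of XRT.

```cpp
#include <cstddef>
#include <stdexcept>

// Size and offset, both in bytes, of a region inside a buffer.
struct ByteRange {
    std::size_t size;
    std::size_t offset;
};

// Converts "count elements starting at element index first" into a byte
// range and rejects ranges that would run past the end of the buffer.
inline ByteRange element_range(std::size_t first, std::size_t count,
                               std::size_t elem_size, std::size_t buffer_bytes) {
    const ByteRange r{count * elem_size, first * elem_size};
    if (r.offset > buffer_bytes || r.size > buffer_bytes - r.offset)
        throw std::out_of_range("range exceeds buffer size");
    return r;
}
```

For a buffer of 1024 int values, element_range(256, 128, sizeof(int), 1024 * sizeof(int)) yields the size and offset of elements 256 through 383. These two values are what the size/offset overloads mentioned above expect; the exact argument order should be checked against the xrt_bo.h header.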
II. Data transfer between host and device by Buffer map API¶
The API xrtBOMap (C++: xrt::bo::map) maps the host-side buffer backing pointer to a user pointer. The host code can subsequently use the user pointer for data reads and writes. However, after writing to the mapped pointer (or before reading from the mapped pointer), the API xrtBOSync (C++: xrt::bo::sync) should be used with the direction flag for the DMA operation.
Code example of transferring data from the host to the device by this approach

    xrtBufferHandle input_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_0);
    int* input_buffer_mapped = (int*)xrtBOMap(input_buffer);

    for (int i=0;i<data_size;++i) {
        input_buffer_mapped[i] = i;
    }

    xrtBOSync(input_buffer, XCL_BO_SYNC_BO_TO_DEVICE, buffer_size_in_bytes, 0);

C++: The equivalent C++ API based code

    auto input_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_0);
    auto input_buffer_mapped = input_buffer.map<int*>();

    for (auto i=0;i<data_size;++i) {
        input_buffer_mapped[i] = i;
    }

    input_buffer.sync(XCL_BO_SYNC_BO_TO_DEVICE);

III. Data transfer between the buffers by copy API¶
XRT provides the xrtBOCopy (C++: xrt::bo::copy) API for a deep copy between two buffer objects if the platform supports deep copy (for details refer to the M2M feature described in Memory-to-Memory (M2M)). If deep copy is not supported by the platform, the data transfer happens by shallow copy (the data is transferred via the host).
API example in C; all arguments are self-explanatory

    size_t dst_buffer_offset = 0;
    size_t src_buffer_offset = 0;
    xrtBOCopy(dst_buffer, src_buffer, size_of_copy, dst_buffer_offset, src_buffer_offset);

C++: The equivalent C++ API based code

    dst_buffer.copy(src_buffer, copy_size_in_bytes);

The API xrt::bo::copy also has an overloaded version that accepts an offset other than 0 for both the source and the destination buffer.
3. Miscellaneous other Buffer APIs¶
This section describes a few other specific use-cases using buffers.
DMA-BUF API¶
XRT provides buffer export and import APIs, primarily used for sharing a buffer across devices (P2P applications) and processes.
- xrtBOExport (C++: xrt::bo::export_buffer): Exports the buffer to an exported buffer handle
- xrtBOImport (C++: xrt::bo constructor): Allocates a BO imported from an exported buffer handle
Consider the situation of exporting a buffer from device 1 to device 2.

    xclBufferExportHandle buffer_exported = xrtBOExport(buffer_device_1);
    xrtBufferHandle buffer_device_2 = xrtBOImport(device_2, buffer_exported);

In the above example
- The buffer buffer_device_1 is a buffer allocated on device 1
- buffer_device_1 is exported to an xclBufferExportHandle by the API xrtBOExport
- The exported buffer of type xclBufferExportHandle is imported to device 2 by the API xrtBOImport
C++: The equivalent C++ API based code

    auto buffer_exported = buffer_device_1.export_buffer();
    auto buffer_device_2 = xrt::bo(device_2, buffer_exported);

In the above example
- The buffer buffer_device_1 is a buffer allocated on device 1
- buffer_device_1 is exported by the member function xrt::bo::export_buffer
- The new buffer buffer_device_2 is imported for device_2 by the xrt::bo constructor
Sub-buffer support¶
The API xrtBOSubAlloc (C++: supported by an xrt::bo class constructor) allocates a sub-buffer from a parent buffer by specifying a start offset and the size.
In the example below, a sub-buffer of size 4 bytes is created from a parent buffer, starting from offset 0 of the parent

    xrtBufferHandle parent_buffer;
    xrtBufferHandle sub_buffer;

    size_t sub_buffer_size = 4;
    size_t sub_buffer_offset = 0;

    sub_buffer = xrtBOSubAlloc(parent_buffer, sub_buffer_size, sub_buffer_offset);

C++: The equivalent C++ API based code
In C++ a sub-buffer is created by using the xrt::bo class’s constructor with the parent buffer, size, and offset as parameters.

    size_t sub_buffer_size = 4;
    size_t sub_buffer_offset = 0;

    auto sub_buffer = xrt::bo(parent_buffer, sub_buffer_size, sub_buffer_offset);

Buffer information¶
XRT provides a few other APIs to obtain information related to the buffer.
- xrtBOSize (C++: member function xrt::bo::size): Size of the buffer
- xrtBOAddr (C++: member function xrt::bo::address): Physical address of the buffer
Kernel and Run¶
The XRT kernel APIs support creating a kernel handle (or object in C++) from the currently loaded xclbin. The kernel handle is used to execute the kernel function on the hardware instance (Compute Unit or CU) of the kernel.
A Run handle/object represents an execution of the kernel. Upon finishing the kernel execution, the Run handle/object can be reused to invoke the same kernel function if desired.
The following topics are discussed below
- Obtaining kernel handle/object from XCLBIN
- Getting the bank group index of a kernel argument
- Reading and writing CU mapped registers
- Execution of kernel and dealing with the associated run
- Other kernel execution related API
Obtaining kernel handle/object from XCLBIN¶
The kernel handle (or object) is created from the device, XCLBIN UUID and the kernel name.

    xuid_t xclbin_uuid;
    xrtXclbinGetUUID(xclbin,xclbin_uuid);

    xrtKernelHandle kernel = xrtPLKernelOpen(device, xclbin_uuid, "kernel_name");
    ....
    ....
    xrtKernelClose(kernel);

In the above code example
- The UUID of the XCLBIN is retrieved by the API xrtXclbinGetUUID
- The kernel is created by the API xrtPLKernelOpen
- The kernel is closed by the API xrtKernelClose
Note: For a kernel with more than one CU, a kernel handle (or object) should represent all the CUs having identical interface connectivity. If the CUs of the kernel do not all have identical connectivity, the specific CU name(s) should be used to obtain a kernel handle (or object) that represents the subset of CUs with identical connectivity. Otherwise, XRT makes this selection internally: it selects a group of CUs and discards the rest (discarded CUs are not used during the execution of the kernel).
As an example, assume a kernel named foo has 3 CUs: foo_1, foo_2, and foo_3. The CUs foo_1 and foo_2 are connected to DDR bank 0, but the CU foo_3 is connected to DDR bank 1.
Opening a kernel handle for foo_1 and foo_2 (as they have identical interface connections)

    cu_group_1 = xrtPLKernelOpen(device, xclbin_uuid, "foo:{foo_1,foo_2}");

Opening a kernel handle for foo_3

    cu_group_2 = xrtPLKernelOpen(device, xclbin_uuid, "foo:{foo_3}");

C++: In C++, an xrt::kernel object can be created with the constructor of the xrt::kernel class.

    auto xclbin_uuid = device.load_xclbin("kernel.xclbin");
    auto krnl = xrt::kernel(device, xclbin_uuid, name);

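The name syntax used above for selecting CUs ("kernel_name:{cu_1,cu_2}") is a plain string, so it can be assembled programmatically when the CU list is only known at run time. A minimal sketch follows; cu_selector is a hypothetical helper and not an XRT function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds "kernel:{cu_a,cu_b,...}" from a kernel name and a list of CU names.
// With an empty CU list the bare kernel name is returned.
inline std::string cu_selector(const std::string& kernel,
                               const std::vector<std::string>& cus) {
    if (cus.empty())
        return kernel;
    std::string name = kernel + ":{";
    for (std::size_t i = 0; i < cus.size(); ++i) {
        if (i != 0)
            name += ',';
        name += cus[i];
    }
    name += '}';
    return name;
}
```

For the foo example above, cu_selector("foo", {"foo_1", "foo_2"}) produces the string "foo:{foo_1,foo_2}", which is the name passed when opening the kernel.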
Exclusive access of the kernel’s CU¶
The API xrtPLKernelOpen opens a kernel’s CU in shared mode so that the CU can be shared with other processes. In some cases, it is required to open the CU in exclusive mode (for example, when it is required to read/write CU mapped registers). Opening a CU exclusively fails if the CU is already opened in either shared or exclusive mode.

    xrtKernelHandle kernel = xrtPLKernelOpenExclusive(device, xclbin_uuid, "name");

C++: In C++, the xrt::kernel constructor can be called with an additional enum class argument to access the kernel in exclusive mode. The enumerator values are:
- xrt::kernel::cu_access_mode::shared (default xrt::kernel constructor argument)
- xrt::kernel::cu_access_mode::exclusive

    auto krnl = xrt::kernel(device, xclbin_uuid, name, xrt::kernel::cu_access_mode::exclusive);

Getting bank group index of the kernel argument¶
We have seen in the Buffer creation section that the buffer location must be provided when a buffer is created. XRT provides an API xrtKernelArgGroupId (C++: xrt::kernel::group_id) that returns the bank index (ID) of a specific argument of the kernel. This ID is used as the last argument of the xrtBOAlloc API (in C++, of the xrt::bo constructor) to create the buffer on the same memory bank.
Let us review the example below, where the buffer is allocated for the kernel’s first argument (argument index 0) by using this API

    xrtMemoryGroup idx_0 = xrtKernelArgGroupId(kernel, 0); // bank index of 0th argument
    xrtBufferHandle a = xrtBOAlloc(device, data_size*sizeof(int), XRT_BO_FLAGS_NONE, idx_0);

C++: The equivalent C++ API based code

    auto input_buffer = xrt::bo(device, buffer_size_in_bytes, kernel.group_id(0));

The API fails if the kernel bank index is ambiguous, for example when the kernel has multiple CUs with different connectivity for that argument. In those cases, it is required to create a kernel object/handle with a specific CU (or a group of CUs with identical connectivity).
Reading and writing CU mapped registers¶
To read and write the AXI-Lite register space corresponding to a CU, the CU must be opened in exclusive mode (in shared mode, multiple processes can access the CU’s address space, so it is unsafe if they access or change registers at the same time, which can lead to race conditions). The required APIs for kernel register read and write are
- xrtKernelReadRegister (C++: member function xrt::kernel::read_register)
- xrtKernelWriteRegister (C++: member function xrt::kernel::write_register)

    uint32_t read_data;
    uint32_t write_data = 7;

    xrtKernelHandle kernel = xrtPLKernelOpenExclusive(device, xclbin_uuid, "foo:{foo_1}");

    xrtKernelReadRegister(kernel,READ_OFFSET,&read_data);
    xrtKernelWriteRegister(kernel,WRITE_OFFSET,write_data);

    xrtKernelClose(kernel);

In the above code block
- The CU named “foo_1” (name syntax: “kernel_name:{cu_name}”) is opened exclusively.
- The register read/write operations are performed.
- The kernel is closed.
C++: The equivalent C++ API example

    int read_data;
    int write_data = 7;

    auto krnl = xrt::kernel(device, xclbin_uuid, "foo:{foo_1}", xrt::kernel::cu_access_mode::exclusive);

    read_data = krnl.read_register(READ_OFFSET);
    krnl.write_register(WRITE_OFFSET,write_data);

Obtaining the argument offset¶
The register read/write access APIs use the register offset as shown in the above examples. The user can get the register offset of a kernel argument from the .xclbin.info file generated by v++ and use it with the register read/write APIs.

    --------------------------
    Instance: foo_1
    Base Address: 0x1800000
    Argument: a
    Register Offset: 0x10

However, XRT also provides APIs to obtain the register offset for CU arguments. In the example below, the C API xrtKernelArgOffset is used to obtain the offset of the third argument of the CU foo:foo_1.

    // Assume foo has 3 arguments, a,b,c (arg 0, arg 1 and arg 2 respectively)

    xrtKernelHandle kernel = xrtPLKernelOpenExclusive(device, xclbin_uuid, "foo:{foo_1}");
    uint32_t arg_c_offset = xrtKernelArgOffset(kernel, 2);

C++: The equivalent C++ API example

    // Assume foo has 3 arguments, a,b,c (arg 0, arg 1 and arg 2 respectively)

    auto krnl = xrt::kernel(device, xclbin_uuid, "foo:{foo_1}", xrt::kernel::cu_access_mode::exclusive);
    auto offset = krnl.offset(2);

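When the offset is taken from the .xclbin.info file rather than from the offset API, the hexadecimal text has to be converted to an integer before it is used with the register read/write APIs. The sketch below parses a single line in the "Register Offset: 0x10" form shown in the excerpt above; parse_register_offset is a hypothetical helper, and the assumption is that the field appears on one line exactly in that form.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Extracts the numeric value from a line such as "Register Offset: 0x10".
// Throws std::invalid_argument when the field name or the number is missing.
inline std::uint32_t parse_register_offset(const std::string& line) {
    const std::string key = "Register Offset:";
    const auto pos = line.find(key);
    if (pos == std::string::npos)
        throw std::invalid_argument("no 'Register Offset:' field in line");
    // Base 0 lets std::stoul accept the 0x prefix; leading spaces are skipped.
    const unsigned long value = std::stoul(line.substr(pos + key.size()), nullptr, 0);
    return static_cast<std::uint32_t>(value);
}
```

The returned value plays the role of READ_OFFSET or WRITE_OFFSET in the register access examples.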
Executing the kernel¶
Execution of the kernel is associated with a Run handle (or object). The kernel can be executed by the API xrtKernelRun (in C++, the overloaded operator xrt::kernel::operator()), which takes all the kernel arguments in order. The kernel execution API returns a run handle (or object) corresponding to the execution.

    // 1st kernel execution
    xrtRunHandle run = xrtKernelRun(kernel, buf_a, buf_b, scalar_1);
    xrtRunWait(run);

    // 2nd kernel execution with just changing 3rd argument
    xrtRunSetArg(run,2,scalar_2); // Arguments are specified starting from 0
    xrtRunStart(run);
    xrtRunWait(run);

    // Close the run handle
    xrtRunClose(run);

Note the following APIs regarding the above example
- The kernel is executed by the xrtKernelRun API by specifying all its arguments to obtain a Run handle
- The API xrtKernelRun is non-blocking. It returns as soon as it submits the job, without waiting for the kernel’s actual execution to start.
- The host code uses the xrtRunWait API to block the current thread and wait until the kernel execution is finished.
- After a run is finished, the same run handle can be reused to execute the kernel multiple times if desired.
  - API xrtRunSetArg is used to set one or more arguments; in the example above only the last (3rd) argument is changed before the second execution
  - API xrtRunStart is used to execute the kernel using the run handle.
- API xrtRunClose is used to close the Run handle.
C++: The equivalent C++ code
In C++ the xrt::kernel class provides an overloaded operator () to execute the kernel with a comma-separated list of arguments.

    // 1st kernel execution
    auto run = kernel(buf_a, buf_b, scalar_1);
    run.wait();

    // 2nd kernel execution with just changing 3rd argument
    run.set_arg(2,scalar_2); // Arguments are specified starting from 0
    run.start();
    run.wait();

The above C++ code block demonstrates
- The kernel execution using the xrt::kernel() operator with the list of arguments, which returns an xrt::run object. This is an asynchronous API and returns after submitting the task.
- The member function xrt::run::wait is used to block the current thread until the current execution is finished.
- The member function xrt::run::set_arg is used to set one or more kernel argument(s) before the next execution. In the example above, only the last (3rd) argument is changed.
- The member function xrt::run::start is used to start the next kernel execution with new argument(s).
Graph¶
In Versal ACAPs with AI Engines, the XRT Graph APIs can be used to dynamically load, monitor, and control the graphs executing on the AI Engine array. As of the 2020.2 release, XRT provides a set of C APIs for graph control. The C++ APIs are planned for a future release. Also, as of the 2020.2 release Graph APIs are only supported on the Edge platform.
A graph handle is of type xrtGraphHandle.
Graph Opening and Closing¶
The XRT graph APIs support obtaining a graph handle from the currently loaded xclbin. The required APIs for graph open and close are
- xrtGraphOpen: API that provides the handle of the graph from the device, XCLBIN UUID, and the graph name.
- xrtGraphClose: API to close the graph handle.

    xuid_t xclbin_uuid;
    xrtXclbinGetUUID(xclbin,xclbin_uuid);

    xrtGraphHandle graph = xrtGraphOpen(device, xclbin_uuid, "graph_name");
    ....
    ....
    xrtGraphClose(graph);

The graph handle obtained from xrtGraphOpen is used to execute the graph function on the AIE tiles.
Reset Functions¶
Two reset functions are available:
- API xrtAIEResetArray is used to reset the whole AIE array.
- API xrtGraphReset is used to reset a specified graph by disabling tiles and enabling tile reset.

    xrtDeviceHandle device_handle = xrtDeviceOpen(0);
    ...
    // AIE Array Reset
    xrtAIEResetArray(device_handle);

    xrtGraphHandle graphHandle = xrtGraphOpen(device_handle, xclbin_uuid, "graph_name");
    // Graph Reset
    xrtGraphReset(graphHandle);

Graph execution¶
XRT provides basic graph execution control APIs to initialize, run, wait, and terminate graphs for a specific number of iterations. Below we will review some of the common graph execution styles.
Graph execution for a fixed number of iterations¶
A graph can be executed for a fixed number of iterations followed by a “busy-wait” or a “time-out wait”.
Busy Wait scheme
The graph can be executed for a fixed number of iterations by the xrtGraphRun API using an iteration argument. Subsequently, the xrtGraphWait or xrtGraphEnd API should be used (with argument 0) to wait until the graph execution is completed.
Let’s review the below example
- The graph is executed for 3 iterations by the API xrtGraphRun with the number of iterations as an argument.
- The API xrtGraphWait(graphHandle,0) is used to wait until the iterations are done.
  - The API xrtGraphWait is used because the host code needs to execute the graph again.
- The graph is executed again for 5 iterations.
- The API xrtGraphEnd(graphHandle,0) is used to wait until the iterations are done.
  - After xrtGraphEnd the same graph should not be executed.

    // start from reset state
    xrtGraphReset(graphHandle);

    // run the graph for 3 iterations
    xrtGraphRun(graphHandle, 3);

    // Wait till the graph is done
    xrtGraphWait(graphHandle,0); // Use xrtGraphWait if you want to execute the graph again

    xrtGraphRun(graphHandle,5);
    xrtGraphEnd(graphHandle,0); // Use xrtGraphEnd if you are done with the graph execution

Timeout wait scheme
As shown in the above example, xrtGraphWait(graphHandle,0) performs a busy-wait and suspends execution until the graph is done. If desired, a timeout version of the wait can be achieved with xrtGraphWaitDone, which waits for a specified number of milliseconds; if the graph is not done by then, the host code can do something else in the meantime. An example is shown below

    // start from reset state
    xrtGraphReset(graphHandle);

    // run the graph for 100 iterations
    xrtGraphRun(graphHandle, 100);

    while (1) {
        auto rval = xrtGraphWaitDone(graphHandle, 5);
        std::cout << "Wait for graph done returns: " << rval << std::endl;
        if (rval == -ETIME) {
            std::cout << "Timeout, reenter......" << std::endl;
            // Do something
        }
        else // Graph is done, quit the loop
            break;
    }

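The loop above retries for as long as the wait keeps timing out. A bounded variant of the same pattern is sketched below. To keep the sketch independent of XRT, the wait call is passed in as a callable with the same contract as described above (returns -ETIME on timeout, any other value otherwise); wait_with_retries and its parameters are hypothetical names.

```cpp
#include <cerrno>      // ETIME (POSIX)
#include <functional>

// Calls wait_done(timeout_ms) up to max_attempts times.
// Returns the first result that is not -ETIME, or -ETIME when every
// attempt timed out. on_timeout runs after each timed-out attempt.
inline int wait_with_retries(const std::function<int(int)>& wait_done,
                             int timeout_ms, int max_attempts,
                             const std::function<void()>& on_timeout) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const int rval = wait_done(timeout_ms);
        if (rval != -ETIME)
            return rval;      // graph done, or a different error
        on_timeout();         // do other host work between attempts
    }
    return -ETIME;            // still not done after max_attempts
}
```

In host code the first argument would be a lambda that forwards to the timeout wait API on the graph handle, and the final -ETIME result is the point at which the asynchronous error APIs described later can be consulted.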
Infinite Graph Execution¶
The graph runs infinitely if xrtGraphRun is called with iteration argument -1. While a graph is running infinitely, the APIs xrtGraphWait, xrtGraphSuspend, and xrtGraphEnd can be used to suspend/end the graph operation after some number of AIE cycles. The API xrtGraphResume is used to resume execution of the infinitely running graph.

    // start from reset state
    xrtGraphReset(graphHandle);

    // run the graph infinitely
    xrtGraphRun(graphHandle, -1);

    xrtGraphWait(graphHandle,3000); // Suspends the graph after 3000 AIE cycles from the previous start

    xrtGraphResume(graphHandle); // Restart the suspended graph again to run forever

    xrtGraphSuspend(graphHandle); // Suspend the graph immediately

    xrtGraphResume(graphHandle); // Restart the suspended graph again to run forever

    xrtGraphEnd(graphHandle,5000); // End the graph operation after 5000 AIE cycles from the previous start

In the example above
- The API xrtGraphRun(graphHandle, -1) is used to execute the graph infinitely
- The API xrtGraphWait(graphHandle,3000) suspends the graph 3000 AIE cycles after the graph start.
  - If the graph has already run for more than 3000 AIE cycles, the graph is suspended immediately.
- The API xrtGraphResume is used to restart the suspended graph
- The API xrtGraphSuspend is used to suspend the graph immediately
- The API xrtGraphEnd(graphHandle,5000) ends the graph 5000 AIE cycles after the previous graph start.
  - If the graph has already run for more than 5000 AIE cycles, the graph ends immediately.
  - Using xrtGraphEnd eliminates the capability of rerunning the graph (without loading the PDI and resetting the graph again).
Measuring AIE cycle consumed by the Graph¶
The API xrtGraphTimeStamp can be used to determine the AIE cycles consumed between a graph start and stop.
In this example, the AIE cycles consumed by 3 iterations are calculated

    // start from reset state
    xrtGraphReset(graphHandle);

    uint64_t begin_t = xrtGraphTimeStamp(graphHandle);

    // run the graph for 3 iterations
    xrtGraphRun(graphHandle, 3);

    xrtGraphWait(graphHandle, 0);

    uint64_t end_t = xrtGraphTimeStamp(graphHandle);

    std::cout<<"Number of AIE cycles consumed in the 3 iterations is: "<< end_t-begin_t;

RTP (Runtime Parameter) control¶
XRT provides APIs to update and read the runtime parameters of the graph.
- The API xrtGraphUpdateRTP is used to update the RTP
- The API xrtGraphReadRTP is used to read the RTP.

    int ret = xrtGraphReset(graphHandle);
    if (ret) throw std::runtime_error("Unable to reset graph");

    ret = xrtGraphRun(graphHandle, 2);
    if (ret) throw std::runtime_error("Unable to run graph");

    float increment[1] = {1};
    const char *inVect = reinterpret_cast<const char *>(increment);
    xrtGraphUpdateRTP(graphHandle, "mm.mm0.in[2]", inVect, sizeof (float));

    // Do more things
    xrtGraphRun(graphHandle,16);
    xrtGraphWait(graphHandle,0);

    // Read RTP
    float increment_out[1] = {1};
    char *outVect = reinterpret_cast<char *>(increment_out);
    xrtGraphReadRTP(graphHandle, "mm.mm0.inout[0]", outVect, sizeof(float));
    std::cout<<"\n RTP value read: "<<increment_out[0];

In the above example, the APIs xrtGraphUpdateRTP and xrtGraphReadRTP are used to update and read the RTP values respectively. Note the API arguments
- The hierarchical name of the RTP port
- Pointer to write or read the RTP variable
- The size of the RTP value.
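Because the RTP value is passed as a character pointer plus a size in bytes, the host code moves typed values into and out of a raw byte buffer. The sketch below shows that conversion with std::memcpy for any trivially copyable type; to_bytes and from_bytes are hypothetical helpers and do not call XRT.

```cpp
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Copies a typed value into a byte buffer of exactly sizeof(T) bytes.
template <typename T>
std::vector<char> to_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "value must be trivially copyable");
    std::vector<char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &value, sizeof(T));
    return bytes;
}

// Rebuilds a typed value from a byte buffer of exactly sizeof(T) bytes.
template <typename T>
T from_bytes(const std::vector<char>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "value must be trivially copyable");
    if (bytes.size() != sizeof(T))
        throw std::invalid_argument("byte count does not match the type size");
    T value{};
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}
```

The data() pointer and size() of the byte buffer correspond to the pointer and size arguments listed above, so the size passed always matches the type being transferred.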
DMA operation to and from Global Memory IO¶
XRT provides the API xrtAIESyncBO to synchronize the buffer contents between GMIO and AIE. The following code shows an example

    xrtDeviceHandle device_handle = xrtDeviceOpen(0);

    // Buffer from GM to AIE
    xrtBufferHandle in_bo_handle = xrtBOAlloc(device_handle, SIZE * sizeof (float), 0, 0);

    // Buffer from AIE to GM
    xrtBufferHandle out_bo_handle = xrtBOAlloc(device_handle, SIZE * sizeof (float), 0, 0);

    float *inp_bo_map = (float *)xrtBOMap(in_bo_handle);
    float *out_bo_map = (float *)xrtBOMap(out_bo_handle);

    // Prepare input data
    std::copy(my_float_array,my_float_array+SIZE,inp_bo_map);

    xrtAIESyncBO(device_handle, in_bo_handle, "in_sink", XCL_BO_SYNC_BO_GMIO_TO_AIE, SIZE * sizeof(float),0);

    xrtAIESyncBO(device_handle, out_bo_handle, "out_sink", XCL_BO_SYNC_BO_AIE_TO_GMIO, SIZE * sizeof(float), 0);

The above code shows
- Input and output buffers (in_bo_handle and out_bo_handle) for the graph are created and mapped to the user space
- The API xrtAIESyncBO is used for data transfer using the following arguments
  - Device and buffer handle
  - The name of the GMIO port associated with the DMA transfer
  - The direction of the buffer transfer
    - GMIO to Graph: XCL_BO_SYNC_BO_GMIO_TO_AIE
    - Graph to GMIO: XCL_BO_SYNC_BO_AIE_TO_GMIO
  - The size and the offset of the buffer
XRT Error API¶
In general, XRT APIs can encounter two types of errors:
- Synchronous error: Error can be thrown by the API itself. These types of errors should be checked against all APIs (strongly recommended).
- Asynchronous error: Errors from the underneath driver, system, hardware, etc.
XRT provides a couple of APIs to retrieve asynchronous errors into the userspace host code. This helps with debugging when something goes wrong.
- xrtErrorGetLast: Gets the last error code and its timestamp for a given error class
- xrtErrorGetString: Gets the description string of a given error code.
NOTE: The asynchronous error retrieving APIs are at an early stage of development and only support AIE related asynchronous errors. Full support for all other asynchronous errors is planned for a future release.
Example code

    41 rval = xrtGraphRun(graphHandle, runIteration);
    42 if (rval != 0) {
    43     /* code to handle synchronous xrtGraphRun error */
    44     goto fail;
    45 }
    46
    47 rval = xrtGraphWaitDone(graphHandle, timeout);
    48 if (rval == -ETIME) {
    49     /* wait Graph done timeout without further information */
    50     xrtErrorCode errCode;
    51     uint64_t timestamp;
    52
    53     rval = xrtErrorGetLast(devHandle, XRT_ERROR_CLASS_AIE, &errCode, &timestamp);
    54     if (rval == 0) {
    55         size_t len = 0;
    56         if (xrtErrorGetString(devHandle, errCode, nullptr, 0, &len))
    57             goto fail;
    58         std::vector<char> buf(len); // or C equivalent
    59         if (xrtErrorGetString(devHandle, errCode, buf.data(), buf.size(), nullptr)) // length already known
    60             goto fail;
    61         /* code to deal with this specific error */
    62         std::cout << buf.data() << std::endl;
    63     }
    64 }
    65 /* more code can be added here to check other error class */

The above code shows
- As good practice, synchronous error checking is done directly against all APIs (lines 41,47,53,56,59)
- After a timeout occurs in xrtGraphWaitDone, the API xrtErrorGetLast is called to retrieve the asynchronous error code (line 53)
- Using the error code, the API xrtErrorGetString is called to get the length of the error string (line 56)
- The API xrtErrorGetString is called a second time to get the full error string (line 59)