.. _xrt_native_apis.rst:

XRT Native APIs
===============

Starting from the 2020.2 release, XRT provides a new native API set in C, C++, and Python flavors. This document introduces the usage of the C and C++ APIs.

To use the native XRT APIs, the host application must link with the **xrt_coreutil** library. Example g++ command:

.. code-block:: shell

    g++ -g -std=c++14 -I$XILINX_XRT/include -L$XILINX_XRT/lib -o host.exe host.cpp -lxrt_coreutil -pthread

The core data structures in C and C++ are listed below

+---------------+---------------+-------------------+
|               | C++ Class     | C Type (Handle)   |
+===============+===============+===================+
| Device        | xrt::device   | xrtDeviceHandle   |
+---------------+---------------+-------------------+
| XCLBIN        | xrt::xclbin   | xrtXclbinHandle   |
+---------------+---------------+-------------------+
| Buffer        | xrt::bo       | xrtBufferHandle   |
+---------------+---------------+-------------------+
| Kernel        | xrt::kernel   | xrtKernelHandle   |
+---------------+---------------+-------------------+
| Run           | xrt::run      | xrtRunHandle      |
+---------------+---------------+-------------------+
| Graph         | TBD           | xrtGraphHandle    |
+---------------+---------------+-------------------+

All the core data structures are defined in the header files in the ``$XILINX_XRT/include/experimental/`` directory. In the user host code, it is sufficient to include ``"experimental/xrt_kernel.h"`` and ``"experimental/xrt_aie.h"`` (when using Graph APIs) to access all the APIs related to these data structures.

.. code:: c
    :number-lines: 5

        #include "experimental/xrt_kernel.h"
        #include "experimental/xrt_aie.h"

The common host code flow using the above data structures is as below

- Open the Xilinx **Device** and load the **XCLBIN**
- Set up the **Buffers** that are used to transfer data between the host and the device
- Use the Buffer APIs for the data transfer between host and device (before and after the kernel execution)
- Use the **Kernel** and **Run** handles/objects to offload and manage the compute-intensive tasks running on the FPGA

Below we walk through the common API usage to accomplish the above tasks.

Device and XCLBIN
-----------------

The Device and XCLBIN classes provide the fundamental infrastructure-related interfaces. The primary objectives of the device- and XCLBIN-related APIs are

- Open a device
- Load the compiled kernel binary (or XCLBIN) onto the device

Example C API based code

.. code:: c
    :number-lines: 10

        xrtDeviceHandle device = xrtDeviceOpen(0);

        xrtXclbinHandle xclbin = xrtXclbinAllocFilename("kernel.xclbin");

        xrtDeviceLoadXclbinHandle(device, xclbin);

        ..............
        ..............

        xrtDeviceClose(device);

The above code block shows

- Opening the device (enumerated as 0) and getting the device handle ``xrtDeviceHandle`` (line 10). Device indices are enumerated as 0, 1, 2, ... and can be observed by ``xbutil scan``

  .. code::

        >>xbutil scan
        INFO: Found total 2 card(s), 2 are usable
        .............
        [0] 0000:b3:00.1 xilinx_u250_gen3x16_base_1 user(inst=129)
        [1] 0000:65:00.1 xilinx_u50_gen3x16_base_1 user(inst=128)

- Opening the XCLBIN from the filename and getting the XCLBIN handle ``xrtXclbinHandle`` (line 12)
- Loading the XCLBIN onto the device through the XCLBIN handle using the API ``xrtDeviceLoadXclbinHandle`` (line 14)
- Closing the device handle at the end of the application (line 19)

**C++**: The equivalent C++ API based code

.. code:: c++
    :number-lines: 10

        unsigned int dev_index = 0;
        auto device = xrt::device(dev_index);
        auto xclbin_uuid = device.load_xclbin("kernel.xclbin");

The above code block shows

- The ``xrt::device`` class's constructor is used to open the device
- The member function ``xrt::device::load_xclbin`` is used to load the XCLBIN from the filename
- The member function ``xrt::device::load_xclbin`` returns the XCLBIN UUID, which is required to open the kernel (refer to the Kernel section); a combined sketch follows below
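Putting the above pieces together, below is a minimal sketch of a complete C++ host program skeleton. The XCLBIN path and the kernel name ``"vadd"`` are placeholders for your own design; the ``xrt::kernel`` constructor is discussed in the Kernel section.

.. code:: c++

        #include "experimental/xrt_kernel.h"

        int main()
        {
            // Open the device enumerated as index 0
            unsigned int dev_index = 0;
            auto device = xrt::device(dev_index);

            // Load the XCLBIN; the returned UUID identifies the loaded image
            auto xclbin_uuid = device.load_xclbin("kernel.xclbin");

            // The UUID is required to construct a kernel object from the
            // loaded XCLBIN ("vadd" is a placeholder kernel name)
            auto krnl = xrt::kernel(device, xclbin_uuid, "vadd");

            return 0;
        }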
Buffers
-------

Buffers are primarily used to transfer data between the host and the device. The buffer-related APIs are discussed in the following three subsections

1. Buffer allocation and deallocation
2. Data transfer using Buffers
3. Miscellaneous other Buffer APIs

1. Buffer allocation and deallocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

XRT provides the following C APIs

- ``xrtBOAlloc``: Allocates a 4K-aligned buffer object; the API must be called with the appropriate flags.
- ``xrtBOAllocUserPtr``: Allocates a buffer object using a pointer provided by the user. The user pointer must be aligned to a 4K boundary.
- ``xrtBOFree``: Deallocates the allocated buffer.

.. code:: c
    :number-lines: 15

        xrtMemoryGroup bank_grp_idx_0 = xrtKernelArgGroupId(kernel, 0);
        xrtMemoryGroup bank_grp_idx_1 = xrtKernelArgGroupId(kernel, 1);

        xrtBufferHandle input_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_0);
        xrtBufferHandle output_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_1);
        ....
        ....

        xrtBOFree(input_buffer);
        xrtBOFree(output_buffer);

The above code block shows

- Buffer allocation by the API ``xrtBOAlloc`` (lines 18, 19)
- Buffer deallocation by the API ``xrtBOFree`` (lines 23, 24)

The arguments of the API ``xrtBOAlloc`` are

- Argument 1: The device on which the buffer should be allocated
- Argument 2: The size (in bytes) of the buffer
- Argument 3: ``xrtBufferFlags``: Used to specify the buffer type. The most commonly used types are

  - ``XRT_BO_FLAGS_NONE``: Regular buffer
  - ``XRT_BO_FLAGS_DEV_ONLY``: Device-only buffer (meant to be used only by the kernel)
  - ``XRT_BO_FLAGS_HOST_ONLY``: Host-only buffer (the buffer resides in host memory and is directly transferred to/from the kernel)
  - ``XRT_BO_FLAGS_P2P``: P2P buffer, a buffer for NVMe transfer
  - ``XRT_BO_FLAGS_CACHEABLE``: Cacheable buffer, which can be used when the host CPU frequently accesses the buffer (applicable to embedded platforms)

- Argument 4: ``xrtMemoryGroup``: Enumerated memory bank specifying the location on the device where the buffer should be allocated. The ``xrtMemoryGroup`` is obtained by the API ``xrtKernelArgGroupId`` as shown in line 15 (for more details on this API refer to the Kernel section).

**C++**: The equivalent C++ API based code

.. code:: c++
    :number-lines: 15

        auto bank_grp_idx_0 = kernel.group_id(0);
        auto bank_grp_idx_1 = kernel.group_id(1);

        auto input_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_0);
        auto output_buffer = xrt::bo(device, buffer_size_in_bytes, bank_grp_idx_1);

In the above code, the ``xrt::bo`` buffer objects are created using the class's constructor. Note that no buffer flag is passed: by default, the constructor creates a regular buffer. Nonetheless, the available buffer flags for ``xrt::bo`` are described by an ``enum class`` argument with the following enumerator values (an allocation sketch using one of these flags follows the list)

- ``xrt::bo::flags::normal``: Default, regular buffer
- ``xrt::bo::flags::device_only``: Device-only buffer (meant to be used only by the kernel)
- ``xrt::bo::flags::host_only``: Host-only buffer (the buffer resides in host memory and is directly transferred to/from the kernel)
- ``xrt::bo::flags::p2p``: P2P buffer, a buffer for NVMe transfer
- ``xrt::bo::flags::cacheable``: Cacheable buffer, which can be used when the host CPU frequently accesses the buffer (applicable to embedded platforms)
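As an illustration of these flags, the sketch below allocates a host-only buffer by passing one of the enumerators explicitly to the four-argument ``xrt::bo`` constructor. The variables ``device``, ``buffer_size_in_bytes``, and ``bank_grp_idx_0`` are assumed to be defined as in the preceding examples.

.. code:: c++

        // Allocate a host-only buffer: the buffer resides in host memory
        // and is directly transferred to/from the kernel
        auto host_only_buffer = xrt::bo(device, buffer_size_in_bytes,
                                        xrt::bo::flags::host_only, bank_grp_idx_0);

The same constructor form can be used with any of the enumerators listed above.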
2. Data transfer using Buffers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The XRT buffer API library provides a rich set of APIs to help with data transfers between the host and the device, between buffers, etc. We will discuss the following data transfer styles

I. Data transfer between host and device by Buffer read/write API
II. Data transfer between host and device by Buffer map API
III. Data transfer between buffers by copy API

I. Data transfer between host and device by Buffer read/write API
*******************************************************************

To transfer data from the host to the device, the user first needs to update the host-side buffer backing pointer, followed by a DMA transfer to the device. The following C APIs are used for these tasks

1. ``xrtBOWrite``
2. ``xrtBOSync`` with flag ``XCL_BO_SYNC_BO_TO_DEVICE``

In C++, the ``xrt::bo`` class has the following member functions for the same functionality

1. ``xrt::bo::write``
2. ``xrt::bo::sync`` with flag ``XCL_BO_SYNC_BO_TO_DEVICE``

To transfer data from the device to the host, the steps are reversed: the user first needs to do a DMA transfer from the device, followed by reading the data from the host-side buffer backing pointer. The following C APIs are used for these tasks

1. ``xrtBOSync`` with flag ``XCL_BO_SYNC_BO_FROM_DEVICE``
2. ``xrtBORead``

In C++, the corresponding ``xrt::bo`` class member functions are

1. ``xrt::bo::sync`` with flag ``XCL_BO_SYNC_BO_FROM_DEVICE``
2. ``xrt::bo::read``

Code example of transferring data from the host to the device

.. code:: c
    :number-lines: 20

        xrtBufferHandle input_buffer = xrtBOAlloc(device, buffer_size_in_bytes, XRT_BO_FLAGS_NONE, bank_grp_idx_0);

        // Prepare the input data
        int buff_data[data_size];
        for (int i=0; i<data_size; ++i)
            buff_data[i] = i;

        xrtBOWrite(input_buffer, buff_data, buffer_size_in_bytes, 0);
        xrtBOSync(input_buffer, XCL_BO_SYNC_BO_TO_DEVICE, buffer_size_in_bytes, 0);

II. Data transfer between host and device by Buffer map API
************************************************************

The member function ``xrt::bo::map`` (C API: ``xrtBOMap``) maps the host-side buffer backing pointer to a user pointer, which the host code can subsequently use for reads and writes. After writing through the mapped pointer (or before reading through it), the ``sync`` API must be called with the appropriate direction flag for the DMA transfer to take place.

.. code:: c++

        auto input_buffer_mapped = input_buffer.map<int*>();

        for (auto i=0;i<data_size;++i)
            input_buffer_mapped[i] = i;

        input_buffer.sync(XCL_BO_SYNC_BO_TO_DEVICE);

III. Data transfer between buffers by copy API
***********************************************

The member function ``xrt::bo::copy`` (C API: ``xrtBOCopy``) copies data from a source buffer object into the destination buffer object.

Graph
-----

The Graph APIs, declared in ``"experimental/xrt_aie.h"`` and exercised through the C handle ``xrtGraphHandle``, control the AI Engine graph running on the device. The example below updates a Run-Time Parameter (RTP) of the graph, runs the graph, and then reads an RTP value back.

.. code:: c++

        // Update RTP
        float increment[1] = {1};
        char *inVect = reinterpret_cast<char *>(increment);
        xrtGraphUpdateRTP(graphHandle, "mm.mm0.in[2]", inVect, sizeof (float));

        // Do more things

        xrtGraphRun(graphHandle, 16);
        xrtGraphWait(graphHandle, 0);

        // Read RTP
        float increment_out[1] = {1};
        char *outVect = reinterpret_cast<char *>(increment_out);
        xrtGraphReadRTP(graphHandle, "mm.mm0.inout[0]", outVect, sizeof(float));

        std::cout << "\n RTP value read " << increment_out[0] << std::endl;

Error Reporting
---------------

In addition to the synchronous errors reported through the return value of each API call, errors can occur asynchronously, for example during graph execution. The API ``xrtErrorGetLast`` retrieves the last such error code for a given error class, and the API ``xrtErrorGetString`` translates an error code into a human-readable string.

.. code:: c++
    :number-lines: 41

        if (xrtGraphRun(graphHandle, iterations))
            goto fail;

        /* Do other host work while the graph is running */
        ....

        if (xrtGraphWaitDone(graphHandle, timeout) == -ETIME) {
            xrtErrorCode errCode;
            uint64_t timestamp;

            /* Retrieve the last asynchronous error of the AIE error class */

            if (xrtErrorGetLast(devHandle, XRT_ERROR_CLASS_AIE, &errCode, &timestamp))
                goto fail;
            size_t len = 0;
            if (xrtErrorGetString(devHandle, errCode, nullptr, 0, &len))
                goto fail;
            std::vector<char> buf(len); // or C equivalent
            if (xrtErrorGetString(devHandle, errCode, buf.data(), buf.size()))
                goto fail;
            /* code to deal with this specific error */
            std::cout << buf.data() << std::endl;
        }
        /* more code can be added here to check other error class */

The above code shows

- As good practice, synchronous error checking is done directly against the return value of every API call (lines 41, 47, 53, 56, 59)
- After a timeout occurs in ``xrtGraphWaitDone``, the API ``xrtErrorGetLast`` is called to retrieve the asynchronous error code (line 53)
- Using the error code, the API ``xrtErrorGetString`` is called to get the length of the error string (line 56)
- The API ``xrtErrorGetString`` is called a second time to get the full error string (line 59)
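In newer XRT releases, a similar query is also exposed to C++ host code through the experimental ``xrt::error`` class declared in ``experimental/xrt_error.h``. The following is a minimal sketch, assuming that class is available in your XRT installation and that ``device`` is an opened ``xrt::device`` object.

.. code:: c++

        #include "experimental/xrt_error.h"

        // Query the last asynchronous error of the AIE error class and,
        // if one is latched, print its human-readable description
        // (assumes xrt::error is available in the installed XRT release)
        xrt::error err(device, XRT_ERROR_CLASS_AIE);
        if (err.get_error_code())
            std::cout << err.to_string() << std::endl;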