Xilinx Media Accelerator (XMA) Core Library

Introduction

The Xilinx Media Accelerator (XMA) library (libxmaapi) is a host interface meant to simplify the development of applications managing and controlling video accelerators such as decoders, scalers, filters, and encoders. The libxmaapi is comprised of two API interfaces: an application interface and a plugin interface. The application API is a higher-level, generalized interface intended for application developers responsible for integrating control of Xilinx accelerators into software frameworks such as FFmpeg, GStreamer, or proprietary frameworks. The plugin API is a lower level interface intended for developers responsible for implementing hardware control of specific Xilinx acceleration kernels. In general, plugins are developed by kernel providers as these plugins are specialized user space drivers that are aware of the low-level hardware interface.

From a high-level perspective, the XMA sits between a media framework (e.g. FFmpeg) and the Xilinx runtime (XRT). In addition, the XMA acts as a peer to the host side implementation of OpenCL. The diagram below illustrates the entire stack including an example of common accelerator kernels that are possible in a specific design:

_images/XMA-Stack.png

The remaining sections will describe the key architectural aspects of the libxmaapi and describe the high-level API along with the low-level plugin API.

XMA Application Interface Overview

The API for the libxmaapi can be categorized into three areas:

  1. Initialization
  2. Video frame processing
  3. Termination

From an interface perspective, the high-level or upper edge interface and the low-level or plugin interface are organized as follows:

_images/XMA-Internal-Stack.png

The diagram above illustrates a number of distinct API layers. The XMA upper edge initialization API provides two types of initialization: global and session level initialization. The XMA upper edge API also provides functions for sending and receiving frames as well as a method for gracefully terminating a video stream when the end of the stream is found. Also depicted in the diagram is the XMA Framework and resource manager. The XMA Framework and resource manager are responsible for managing the state of the system, delegating requests to the appropriate plugin, and selecting available resources based on session creation requests.

See the Application Development Guide for more information about using the XMA application interface to develop your own standalone or integrated applications.

XMA Plugin Interface Overview

The XMA lower edge API parallels the upper edge API; however, the lower edge API is comprised of function callbacks similar to those used in a driver or as defined in the FFmpeg plugin interface.

There are five classes of XMA plugin interfaces: decoders, encoders, filters, scalers, and a generic ‘kernel’ class. Because each class performs a different kind of processing, the APIs differ slightly; however, all classes share a common pattern. Specifically, a plugin must provide registration information and must implement all required callback functions. In general, an XMA plugin implements at least four required callback functions: initialize, send frame or send data, receive frame or receive data, and close. In addition to these required functions, the encoder plugin API offers two optional callbacks: one for channel allocation and one for retrieving the physical address of an available device input buffer. The channel allocation callback is needed for encoders that support multiple channels within a single kernel by time-division multiplexing of the underlying accelerator resources. The physical address of a device input buffer is needed when an encoder offers zero-copy support, because the kernel preceding the encoder uses this address as its output buffer.

By way of example, the following represents the interface of the XMA Encoder class:

typedef struct XmaEncoderPlugin
{
    XmaEncoderType  hwencoder_type;
    const char     *hwvendor_string;
    XmaFormatType   format;
    int32_t         bits_per_pixel;
    size_t          kernel_data_size;
    size_t          plugin_data_size;
    int32_t         (*init)(XmaEncoderSession *enc_session);
    int32_t         (*send_frame)(XmaEncoderSession *enc_session,
                                  XmaFrame          *frame);
    int32_t         (*recv_data)(XmaEncoderSession *enc_session,
                                 XmaDataBuffer     *data,
                                 int32_t           *data_size);
    int32_t         (*close)(XmaEncoderSession *session);
    int32_t         (*alloc_chan)(XmaSession  *pending_sess,
                                  XmaSession **curr_sess,
                                  uint32_t     sess_cnt);
    uint64_t        (*get_dev_input_paddr)(XmaEncoderSession *enc_session);
} XmaEncoderPlugin;
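
As a rough illustration of the pattern described above (registration data plus required and optional callbacks), the sketch below uses simplified stand-in types rather than the real XMA headers. The names DemoEncSession, DemoEncoderPlugin, demo_plugin, and demo_plugin_is_valid() are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XMA session and plugin types. */
typedef struct DemoEncSession { int frames_sent; } DemoEncSession;

typedef struct DemoEncoderPlugin {
    const char *hwvendor_string;                          /* registration info */
    int32_t   (*init)(DemoEncSession *s);                 /* required */
    int32_t   (*send_frame)(DemoEncSession *s);           /* required */
    int32_t   (*recv_data)(DemoEncSession *s);            /* required */
    int32_t   (*close)(DemoEncSession *s);                /* required */
    uint64_t  (*get_dev_input_paddr)(DemoEncSession *s);  /* optional */
} DemoEncoderPlugin;

static int32_t demo_init(DemoEncSession *s)  { s->frames_sent = 0; return 0; }
static int32_t demo_send(DemoEncSession *s)  { s->frames_sent++;   return 0; }
static int32_t demo_recv(DemoEncSession *s)  { return s->frames_sent > 0 ? 0 : -1; }
static int32_t demo_close(DemoEncSession *s) { s->frames_sent = 0; return 0; }

/* A plugin without zero-copy support leaves the optional callback NULL. */
const DemoEncoderPlugin demo_plugin = {
    .hwvendor_string     = "ACME",
    .init                = demo_init,
    .send_frame          = demo_send,
    .recv_data           = demo_recv,
    .close               = demo_close,
    .get_dev_input_paddr = NULL,
};

/* Returns 1 when the registration string and every required callback
 * are present, 0 otherwise. Optional callbacks are not checked. */
int demo_plugin_is_valid(const DemoEncoderPlugin *p)
{
    return p != NULL && p->hwvendor_string != NULL &&
           p->init != NULL && p->send_frame != NULL &&
           p->recv_data != NULL && p->close != NULL;
}
```

A framework could run a check of this kind when it loads a plugin library, so that a missing required callback is reported at load time rather than at the first frame.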

Finally, the XMA offers a set of buffer management utilities that includes the creation of frame buffers and encoded data buffers along with a set of miscellaneous utility functions. By providing XMA buffer management functions, it is possible for an XMA plugin to easily integrate with virtually any higher-level media framework without requiring any changes. Instead, it is up to the upper level media framework functions to convert buffers into the appropriate XMA buffer. The sections that follow will describe the layers of the API in more detail and provide examples of how these functions are called from both the perspective of an application and from the perspective of an XMA plugin. For the low-level details of the APIs, please consult the doxygen documentation.

Sequence of Operations

In order to better understand how XMA integrates with a standard multi-media framework such as FFmpeg, the sequence diagram that follows identifies the critical operations and functions called as part of a hypothetical encoder. The diagram only calls out the initialization and processing stages:

_images/XMA-Sequence-Diagram.png

As shown in the diagram above, the system is comprised of five blocks:

  • The FFmpeg Command Line application that is used to create a processing graph
  • The FFmpeg encoder plugin that interfaces with the XMA Upper Edge Interface to manage a video session
  • The XMA Upper Edge library interface responsible for initialization, resource allocation, and dispatching of the XMA plugin
  • The XMA Lower Edge plugin responsible for interfacing with the SDAccel Video Kernel
  • The XMA Video Kernel responsible for accelerating the encoding function

While this sequence diagram only shows five components, more complex systems can be developed that include multiple accelerators with the associated XMA plugin and FFmpeg plugin. In fact, adding new processing blocks is controlled entirely by the FFmpeg command line and the presence of the requested accelerator kernels. No additional development is required if all of the SDAccel kernels are available along with the associated plugins.

In this example, an FFmpeg command is invoked that ingests an MP4 file encoded as H.264 and re-encodes the file as H.264 at a lower bit rate. As a result, the main() function of the FFmpeg command is invoked and this calls the xma_initialize() function. The xma_initialize() function is called prior to executing any other XMA functions and performs a number of initialization steps that are detailed in a subsequent section.

Once the xma_initialize() successfully completes, the FFmpeg main() function performs initialization of all requested processing plugins. In this case, the hypothetical encoder plugin has been registered with FFmpeg and the initialization callback of the plugin is invoked. The FFmpeg encoder plugin begins by creating an XMA session using the xma_enc_session_create() function. The xma_enc_session_create() function finds an available resource based on the properties supplied and, assuming resources are available, invokes the XMA plugin initialization function. The XMA plugin initialization function allocates any required input and output buffers on the device and performs initialization of the SDAccel kernel if needed.

After initialization has completed, the FFmpeg main() function reads encoded data from the specified file, decodes the data in software, and sends the raw video frame to the FFmpeg plugin for encoding by calling the encode2() plugin callback. The encode2() callback function converts the AVFrame into an XmaFrame and forwards the request to the XMA Upper Edge interface via the xma_enc_session_send_frame() function. The xma_enc_session_send_frame() function locates the corresponding XMA plugin and invokes the send frame callback function of the plugin. The XMA send frame callback function writes the frame buffer data to a pre-allocated DDR buffer on the device and launches the kernel.

After the FFmpeg plugin encode2() function has sent the frame for encoding, the next step is to determine if encoded data can be received or if another raw frame should be sent. In most cases, an encoder will want several raw frames before providing encoded data. Supplying multiple frames before generating encoded data improves video quality through look-ahead and improves performance by allowing new frame data to be written to the device DDR in parallel with processing previously supplied frames. Assuming a frame is ready to be received, the xma_enc_session_recv_data() function is called by the FFmpeg plugin and in turn results in the receive data function of the XMA plugin being invoked. The XMA plugin communicates with the kernel to ensure that data is ready to be received, determines the length of the encoded data, and reads the encoded data from DDR device memory to host memory.

The description above is meant as a high-level introduction to FFmpeg and XMA. The remainder of this document covers these topics in more depth and provides code examples to help illustrate usage of the XMA.

Execution model

In earlier versions of the XMA plugin interface, xma_plg_register_write and xma_plg_register_read were used for various purposes. Starting with 2018.3, however, xma_plg_register_write and xma_plg_register_read are deprecated, and new APIs are provided at a higher level of abstraction. The new APIs are purpose-based: instead of reading or writing registers directly, the plugin calls the higher-level API that matches the intended purpose to achieve the same result.

Towards that end, XMA now offers a new execution model with three brand new APIs.

The new APIs are:

  • xma_plg_register_prep_write
  • xma_plg_schedule_work_item
  • xma_plg_is_work_item_done

Let's consider the various purposes where the above APIs would be useful.

Purpose 1: The API xma_plg_register_write was used to send scalar inputs to the kernel by directly writing to the AXI-LITE registers. Now the higher level API xma_plg_register_prep_write should be used for the same purpose.

Purpose 2: The API xma_plg_register_write was also used to start the kernel by writing to the start bit of the AXI-LITE registers. For this purpose the new API xma_plg_schedule_work_item should be used instead of xma_plg_register_write.

Purpose 3: The API xma_plg_register_read was used to check kernel idle status (by reading AXI-LITE register bit) to determine if the kernel finished processing the operation. For this purpose now the new API xma_plg_is_work_item_done should be used.

The below table summarizes how to migrate to the new APIs from xma_plg_register_write/xma_plg_register_read.

Purpose                                 Earlier register read/write API   New API
Sending scalar input                    xma_plg_register_write            xma_plg_register_prep_write
Starting the kernel                     xma_plg_register_write            xma_plg_schedule_work_item
Checking if kernel finished processing  xma_plg_register_read             xma_plg_is_work_item_done
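
The three-step flow in the table (stage scalar inputs, schedule the work item, check for completion) can be sketched with a small mock. This is only a model of the call order: the demo_ functions below are hypothetical stand-ins written for this example, and their parameter lists are assumptions rather than the signatures of the real xma_plg_ functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_REGMAP_BYTES 64

/* Hypothetical stand-in for a kernel session; not an XMA type. */
typedef struct DemoHwSession {
    uint8_t regmap[DEMO_REGMAP_BYTES]; /* staged scalar arguments */
    int     pending;                   /* work items not yet completed */
    int     completed;                 /* work items finished */
} DemoHwSession;

/* Step 1: stage scalar inputs instead of writing AXI-LITE registers. */
int32_t demo_register_prep_write(DemoHwSession *s, const void *src,
                                 size_t size, size_t offset)
{
    if (offset > DEMO_REGMAP_BYTES || size > DEMO_REGMAP_BYTES - offset)
        return -1;                     /* would overflow the register map */
    memcpy(s->regmap + offset, src, size);
    return 0;
}

/* Step 2: queue the staged arguments instead of setting the start bit. */
int32_t demo_schedule_work_item(DemoHwSession *s)
{
    s->pending++;
    return 0;
}

/* Step 3: ask for completion instead of polling the idle bit.
 * This mock completes one pending item per call. */
int32_t demo_is_work_item_done(DemoHwSession *s)
{
    if (s->pending == 0)
        return -1;                     /* nothing was scheduled */
    s->pending--;
    s->completed++;
    return 0;
}

/* One processing step written against the three-call model. */
int32_t demo_process(DemoHwSession *s, uint32_t width, uint32_t height)
{
    if (demo_register_prep_write(s, &width, sizeof width, 0) != 0 ||
        demo_register_prep_write(s, &height, sizeof height, 4) != 0)
        return -1;
    if (demo_schedule_work_item(s) != 0)
        return -1;
    return demo_is_work_item_done(s);
}
```

The point of the sketch is the separation of concerns: the plugin describes what the kernel needs and when it should run, and leaves register-level access to the layer underneath.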

Application Development Guide

The XMA application interface is used to provide an API that can be used to control video accelerators. The XMA API operations fall into three categories:

  • Initialization
  • Runtime frame/data processing
  • Cleanup

Initialization

The first act an application must perform is that of initialization of the system environment. This is accomplished by calling xma_initialize() and passing in a string that represents the filepath to your system configuration file. This system configuration file, described in more detail below, serves as both information about the images you will be deploying as well as instructions to XMA with regard to which devices will be programmed with a given image.

Once the system has been configured according to the instructions in your system configuration file, the next step is to allocate and initialize the video kernels that will be required for your video processing pipeline. Each class of video kernel supported by XMA has its own initialization routine and a set of properties that must be populated and passed to this routine to allocate and initialize a video kernel. Both system wide initialization and kernel initialization are detailed in the next two sections.

XMA System Configuration File

System configuration is described by a file conforming to YAML syntax. This file contains instructions for the XMA system initialization as well as description(s) of the kernel contents of the xclbin image file(s). The configuration file consists of two logical parts:

  • System paths to required libraries and binary files (e.g. pluginpath)
  • One or more image deployment plans and descriptions (i.e. ImageCfg)

Below is a sample configuration file describing a simple system configuration for a single device utilizing an image file containing a single HEVC encoder kernel:

SystemCfg:
    - logfile:    ./output.log
    - loglevel:   2
    - dsa:        xilinx_1525_dynamic_5_1
    - pluginpath: /tmp/libxmaapi/installdir/share/libxmaapi
    - xclbinpath: /tmp/xclbins
    - ImageCfg:
        xclbin:   hevc_encoder.xclbin
        zerocopy: disable
        device_id_map: [0]
        KernelCfg: [[ instances: 1,
                      function: encoder,
                      plugin: libhevc.so,
                      vendor: ACME,
                      name: hevc_encoder_1,
                      ddr_map: [0]]]

Because this file is parsed using YAML syntax, the indentation present in this example is mandatory for showing the relationships between the data.

The system information comes first and includes the path to the directory of the XMA plugin libraries as well as the path to the directory containing the xclbin files (aka images). The system information is followed by one or more image descriptions. Each image description, denoted by the ‘ImageCfg’ key, instructs XMA as to which devices should be programmed with the given image file. In the example above, we are deploying only to device ‘0’ (devices are enumerated as non-negative integers starting from 0). In addition, a description of the kernels that are included in the image is also a part of the image description and will be used by XMA for tracking kernel resources.

The configuration file is hierarchical and must conform to YAML syntax as well as include the requisite keys; otherwise, an error will be raised indicating what is missing or mistaken.

In Backus-Naur Form, the grammar of the YAML file could be described as follows:

[SystemCfg]    ::= SystemCfg:CRLF
                   (HTAB[logfile]CRLF)*
                   (HTAB[loglevel]CRLF)*
                   HTAB[dsa]CRLF
                   HTAB[pluginpath]CRLF
                   HTAB[xclbinpath]CRLF
                   (HTAB[ImageCfg])+
[logfile]      ::= logfile:[filepath]
[loglevel]     ::= loglevel:[0 | 1 | 2| 3]
[dsa]          ::= dsa:[name_string]
[pluginpath]   ::= pluginpath:[filepath]
[xclbinpath]   ::= xclbinpath:[filepath]
[ImageCfg]     ::= ImageCfg:CRLF HTAB*2[zerocopy]CRLF
                   HTAB*2[device_id_map]CRLF
                   HTAB*2[KernelCfg]CRLF
[zerocopy]     ::= zerocopy:(enable | disable)
[device_id_map]::= device_id_map:[number_list] CRLF
[KernelCfg]    ::= KernelCfg:%5B (%5B HTAB[instances]CRLF
                   HTAB*3[function]CRLF
                   HTAB*3[plugin]CRLF
                   HTAB*3[vendor]CRLF HTAB*3[name]CRLF
                   HTAB*3[ddr_map]CRLF %5D)+ %5D
[instances]    ::= instances:digit+
[function]     ::= function:(encoder | scaler | decoder | filter | kernel)
[plugin]       ::= plugin:[name_string]
[vendor]       ::= vendor:[name_string]
[name]         ::= name:[name_string]
[ddr_map]      ::= ddr_map:[number_list]
[filepath]     ::= (%2F(vchar)*)+
[name_string]  ::= (vchar)+
[number_list]  ::= %5B digit+[,(digit)+]*%5D
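
To make the [number_list] rule concrete, the following sketch parses strings such as "[0,1]" as they appear in the device_id_map and ddr_map keys. The function name demo_parse_number_list() is hypothetical, and tolerating spaces around numbers is an assumption of this example; the grammar above does not mention whitespace. Integer overflow on very long digit runs is not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Parses text such as "[0,1,2]" following the [number_list] rule.
 * Returns the count of values stored in out, or -1 on malformed input
 * or when more than max values are present. */
int demo_parse_number_list(const char *text, int *out, int max)
{
    int count = 0;
    const char *p = text;

    while (*p == ' ') p++;
    if (*p++ != '[')
        return -1;

    for (;;) {
        int value = 0;
        while (*p == ' ') p++;
        if (!isdigit((unsigned char)*p))
            return -1;                 /* digit+ requires at least one digit */
        while (isdigit((unsigned char)*p))
            value = value * 10 + (*p++ - '0');
        if (count == max)
            return -1;                 /* caller's array is full */
        out[count++] = value;
        while (*p == ' ') p++;
        if (*p == ',') { p++; continue; }
        if (*p == ']') { p++; break; }
        return -1;                     /* unexpected character */
    }
    while (*p == ' ') p++;
    return *p == '\0' ? count : -1;    /* reject trailing garbage */
}
```

Because the rule requires at least one number, an empty list "[]" is rejected, as is a trailing comma.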

A description of each YAML key:

Parameters

SystemCfg
Mandatory header property. Takes no arguments.
logfile
Optional property of SystemCfg; specifies the filename to which log output is written. If the logfile and loglevel parameters are not specified, the log level will default to INFO and the output file will be stdout.
loglevel
Optional property of SystemCfg; specifies the level of logging, of which there are four: CRITICAL, ERROR, INFO, DEBUG. Logs of the level specified or lower will be output to the specified logfile. The level mapping is as follows: 0 = CRITICAL, 1 = ERROR, 2 = INFO, 3 = DEBUG. For more information regarding the logging capability see xmalog.
dsa
Property of SystemCfg; The name of the “Dynamic System Archive” used for all images.
pluginpath
Property of SystemCfg; The path to directory containing all plugin libraries (typically <libxmaapi install dir>/share/libxmaapi)
xclbinpath
Property of SystemCfg; The path to the directory containing the hardware binary file(s) that will be used to program the devices on the system.
ImageCfg
Property of SystemCfg; Mandatory sub-header property describing an xclbin image as well as specifying to which device(s) it shall be deployed.
xclbin
Property of ImageCfg; The xclbin filename that comprises this image to be deployed to the specified devices in device_id_map.
zerocopy
Property of ImageCfg; Either the bare word ‘enable’ or ‘disable’. If set to ‘enable’, indicates that zerocopy between kernels will be attempted if possible (requires both kernels to be connected to the same device memory).
device_id_map
Property of ImageCfg; An array of numeric device ids (0-indexed) indicating which FPGA devices will be programmed with the xclbin. Note: if a specified device id does not correspond to an actual device on the system, initialization will fail and an error message will be logged.
KernelCfg
Property denoting the start of array of kernel entries contained in the xclbin.
instances
Property of KernelCfg; identifies the number of kernels of a specific type included in this xclbin. IMPORTANT: The order of the kernel entries MUST MATCH the order of base addresses in which the kernels are assigned in a given xclbin. Lowest base address must be described first.
function
One of ‘encoder’, ‘scaler’, ‘decoder’, ‘filter’, or ‘kernel’, as appropriate for this kernel entry.
plugin
The name of the XMA plugin library that will be mapped to this kernel entry; used by XMA to route high level application calls to the appropriate XMA plugin driver.
vendor
Name of the vendor that authored this kernel. Important for session creation as the vendor string is used by application code to, in part, identify which kernel entry is being requested for a given session.
name
The name, as it appears in the xclbin, of this kernel entry. Not used at this time.
ddr_map
An array of integer values indicating a mapping of kernel instances to DDR banks. This MUST MATCH the number of kernel instances indicated for this entry.
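
The loglevel mapping and the "level specified or lower" rule described above can be expressed in a few lines of C. The enum and function names here are hypothetical and are not part of the XMA logging interface.

```c
#include <stddef.h>

/* Level numbers as listed for the loglevel key. */
typedef enum DemoLogLevel {
    DEMO_LOG_CRITICAL = 0,
    DEMO_LOG_ERROR    = 1,
    DEMO_LOG_INFO     = 2,
    DEMO_LOG_DEBUG    = 3
} DemoLogLevel;

/* Returns the level name, or NULL for a value outside 0..3. */
const char *demo_loglevel_name(int level)
{
    static const char *const names[] = { "CRITICAL", "ERROR", "INFO", "DEBUG" };
    if (level < 0 || level > 3)
        return NULL;
    return names[level];
}

/* A message is emitted when its level is at or below the configured one. */
int demo_should_log(int configured, int message_level)
{
    return demo_loglevel_name(configured) != NULL &&
           demo_loglevel_name(message_level) != NULL &&
           message_level <= configured;
}
```

With loglevel set to 2, for instance, CRITICAL, ERROR, and INFO messages pass the filter and DEBUG messages do not.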

Below is a sample of a more complex, multi-image YAML configuration file:

SystemCfg:
    - logfile:    ./output.log
    - loglevel:   2
    - dsa:        xilinx_xil-accel-rd-vu9p_4ddr-xpr_4_2
    - pluginpath: /plugin/path
    - xclbinpath: /xcl/path
    - ImageCfg:
        xclbin: filename1.xclbin
        zerocopy: enable
        device_id_map: [0,1]
        KernelCfg: [[ instances: 2,
                      function: encoder,
                      plugin:  libhevc.so,
                      vendor: ACME,
                      name:   hevc_kernel,
                      ddr_map: [0,0]],
                    [ instances: 1,
                      function: scaler,
                      plugin: libxscaler.so,
                      vendor: Xilinx,
                      name: xlnx_scaler_kernel,
                      ddr_map: [0]]]
    - ImageCfg:
        xclbin: filename2.xclbin
        zerocopy: disable
        device_id_map: [2]
        KernelCfg: [[ instances: 1,
                      function: encoder,
                      plugin:  libxlnxh264.so,
                      vendor: Xilinx,
                      name: H264_E_KERNEL,
                      ddr_map: [0]]]

In the above example, two images are described. XMA will deploy filename1.xclbin to devices 0 and 1. This first image consists of three kernels: two HEVC kernels, both mapped to DDR bank 0, and a video scaler. The second image, filename2.xclbin, is deployed to device 2 and consists of a single H.264 kernel mapped to DDR bank 0. Logging is written to a local file called output.log at the INFO level (i.e. all logging of type CRITICAL, ERROR and INFO will be output to the log).

This YAML file will be consumed by the application code as the first step in the initialization process.

XMA Initialization
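
The requirement that ddr_map contain exactly one entry per kernel instance lends itself to a simple consistency check. The sketch below assumes a hypothetical in-memory form of a KernelCfg entry (DemoKernelCfg); it does not reflect how XMA stores the parsed configuration.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one KernelCfg entry. */
typedef struct DemoKernelCfg {
    int        instances;     /* value of the instances key */
    const int *ddr_map;       /* values of the ddr_map key */
    size_t     ddr_map_len;   /* number of ddr_map values */
} DemoKernelCfg;

/* Returns 1 when the entry has one non-negative DDR bank per instance. */
int demo_kernel_cfg_is_consistent(const DemoKernelCfg *cfg)
{
    if (cfg == NULL || cfg->instances < 1 || cfg->ddr_map == NULL)
        return 0;
    if (cfg->ddr_map_len != (size_t)cfg->instances)
        return 0;
    for (size_t i = 0; i < cfg->ddr_map_len; i++)
        if (cfg->ddr_map[i] < 0)
            return 0;
    return 1;
}

/* Total kernels in an image: the sum of instances across its entries. */
int demo_total_kernels(const DemoKernelCfg *entries, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += entries[i].instances;
    return total;
}
```

Applied to the first image of the multi-image example, the two entries (instances 2 with ddr_map [0,0], and instances 1 with ddr_map [0]) are both consistent and sum to three kernels.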

The prior section described the components of a proper configuration file necessary for describing the planned initialization of the system. Herein, we describe the proper XMA API calls both to initialize the system with your properly prepared YAML system configuration file and to allocate and initialize one or more video kernels.

Initialization has two parts and must be performed in the following order:

  • system initialization wherein all devices are programmed with images as described by the XMA system configuration file
  • kernel initialization wherein a specific kernel resource is initialized for video processing

All application code must include the following header file to access the XMA application interface:

#include <xma.h>

This header will pull in all files located in [include_dir]/app/ which, collectively, define the complete application interface and data structures required for XMA development.

The first step for any XMA application is to initialize the system with the system configuration file:

//prior includes
...
#include <xma.h> // XMA application interface

int main(void) {
    int rc;
    char *my_yaml_path = "/tmp/xma_sys_cfg.yaml";
    rc = xma_initialize(my_yaml_path);
    ...
}

The above code will program all devices on the system as defined in the xma_sys_cfg.yaml. The name of the configuration file is arbitrary and you may have multiple configuration files. However, only the first invocation of xma_initialize() will result in programming of the system; subsequent invocations do not reprogram the devices. If another process attempts to initialize the system (or the same program is invoked a second time) while the original process that initialized the system is still active, the existing system configuration will be used by the second process; device programming will only ever occur once. When all processes connected to the original system configuration have terminated, the process of initialization with a new YAML file can begin anew when a later process calls xma_initialize() with a new system configuration file.
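
The "program once, reuse afterwards" behavior described above can be modeled with a small state structure. This sketch is process-local and purely illustrative: the real library coordinates initialization across processes, which is not modeled here, and DemoSystem and demo_initialize() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical process-local model of the system state. */
typedef struct DemoSystem {
    int  configured;          /* 1 after the first successful initialization */
    int  program_count;       /* how many times devices were programmed */
    char cfg_path[256];       /* configuration file used for programming */
} DemoSystem;

/* Returns 0 on success, -1 on invalid arguments. */
int demo_initialize(DemoSystem *sys, const char *cfg_path)
{
    if (sys == NULL || cfg_path == NULL ||
        strlen(cfg_path) >= sizeof sys->cfg_path)
        return -1;
    if (sys->configured)
        return 0;             /* existing configuration is reused */
    strcpy(sys->cfg_path, cfg_path);
    sys->program_count++;     /* stands in for programming the devices */
    sys->configured = 1;
    return 0;
}
```

In this model a second call with a different file succeeds but leaves the first configuration in place, which mirrors the behavior a second process would observe.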

Once the system has been initialized, then kernel sessions can be allocated.

Each kernel class (i.e. encoder, decoder, scaler, filter, kernel) requires different properties to be specified before a session can be created.

See the documentation for the corresponding module for more details on a given kernel type:

  • xmadec
  • xmaenc
  • xmafilter
  • xmascaler
  • xmakernel

The general initialization sequence that is common to all kernel classes is as follows:

  • define key type-specific properties of the kernel to be initialized
  • call the *_session_create() routine corresponding to the kernel (e.g. xma_enc_session_create())

Using the decoder kernel as an example, the following code defines a request for an H264 decoder kernel made by Xilinx:

#include <xma.h>
...
// init system via yaml file
...
// Setup decoder properties
XmaDecoderProperties dec_props;
dec_props.hwdecoder_type = XMA_H264_DECODER_TYPE;
strcpy(dec_props.hwvendor_string, "Xilinx");
// Create a decoder session based on the requested properties
XmaDecoderSession *dec_session;
dec_session = xma_dec_session_create(&dec_props);
if (!dec_session){
    // Log message indicating session could not be created
    // return from function
}
    ...

What is returned is a reference to a session object (XmaDecoderSession in the case of the above example). This will serve as an opaque object handle that you will pass to all other API routines interacting with this kernel. A session represents control of a single kernel. Note that some kernels may support ‘channels’, which are portions of a kernel resource that behave like full kernels (i.e. in essence, a ‘virtual’ kernel). The distinction is unimportant to the application developer; a session is a kernel resource and functions as a dedicated kernel resource to the requesting process or thread. Note: channels of a given kernel may only be assigned to threads from within a given process context. Multiple processes may not share a kernel; channels from a single kernel may not be assigned to multiple processes.

Runtime Frame and Data Processing

Once system and kernel initialization (i.e. session creation) are complete, video processing may commence.

Most kernel types include routines to consume data and then produce data from host memory buffers. Depending on the nature of the kernel, you may be required to send a frame and then receive data or vice versa. XMA defines buffer data structures that correspond to frames (XmaFrame) or data (XmaDataBuffer). These buffer structures are used to communicate with the kernel application APIs and include addresses to host memory which you will be required to allocate. The XMA Application Interface includes functions to allocate data from host memory and create these containers for you. See xmabuffers.h for additional information.

Continuing with our decoder example, the two runtime routines for data processing are:

  • xma_dec_session_send_data()
  • xma_dec_session_recv_frame()

Calling the send_data() routine and following with recv_frame() will form the body of your runtime processing code.

If, by contrast, we examine the XMA Encoder library, we see the following two routines:

  • xma_enc_session_send_frame()
  • xma_enc_session_recv_data()

The idea is the same as that of the decoder: send data to be processed, then receive the data.

int ret, data_size = 0;
...
// XMA init code and enc_session
...
// Create an input frame
XmaFrameProperties fprops;
fprops.format = XMA_YUV420_FMT_TYPE;
fprops.width = 1920;
fprops.height = 1080;
fprops.bits_per_pixel = 8;
XmaFrame *scl_frame = xma_frame_alloc(&fprops);

// Create data buffer for encoder
XmaDataBuffer *buffer;
buffer = xma_data_buffer_alloc(1920 * 1080);
...
// The code below runs inside the frame-processing loop
ret = XMA_SEND_MORE_DATA;
// send encoder frame
if (ret == XMA_SEND_MORE_DATA) {
    ret = xma_enc_session_send_frame(enc_session, scl_frame);
    continue; // read next frame into scl_frame buffer
}
else if (ret == XMA_SUCCESS) {
    // poll until the encoder reports a non-zero amount of data
    do {
        xma_enc_session_recv_data(enc_session, buffer, &data_size);
    } while (data_size == 0);
}

Some routines, such as that of the encoder, may require multiple frames of data before recv_data() can be called. You must consult the API to ensure you check for the correct return code to know how to proceed. In the case of the encoder, calling xma_enc_session_send_frame() may return XMA_SEND_MORE_DATA which is an indication that calling recv_data() will not yield any data as more frames must be sent before any output data can be received.
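
The interaction between the send and receive calls can be illustrated with a mock encoder that buffers a few frames before it produces output. Everything in this sketch is hypothetical: the DEMO_ return codes, the look-ahead depth of three frames, and the demo_ functions are stand-ins chosen for the example and do not reflect real XMA values or behavior.

```c
#include <stdint.h>

/* Hypothetical return codes; the real values come from the XMA headers. */
enum { DEMO_SUCCESS = 0, DEMO_SEND_MORE_DATA = 1 };
#define DEMO_LOOKAHEAD 3   /* frames buffered before any output exists */

typedef struct DemoEncoder {
    int queued;            /* frames sent but not yet returned as data */
} DemoEncoder;

/* Accepts one frame and reports whether output may now be requested. */
int32_t demo_send_frame(DemoEncoder *enc)
{
    enc->queued++;
    return enc->queued < DEMO_LOOKAHEAD ? DEMO_SEND_MORE_DATA : DEMO_SUCCESS;
}

/* Writes the produced size to data_size; 0 means nothing is ready. */
int32_t demo_recv_data(DemoEncoder *enc, int32_t *data_size)
{
    if (enc->queued < DEMO_LOOKAHEAD) {
        *data_size = 0;
        return DEMO_SUCCESS;
    }
    enc->queued--;
    *data_size = 1024;     /* arbitrary size for the mock */
    return DEMO_SUCCESS;
}

/* Sends total_frames frames and returns how many encoded buffers came back. */
int demo_encode_stream(DemoEncoder *enc, int total_frames)
{
    int received = 0;
    for (int i = 0; i < total_frames; i++) {
        int32_t data_size = 0;
        int32_t ret = demo_send_frame(enc);
        if (ret == DEMO_SEND_MORE_DATA)
            continue;                  /* output not possible yet */
        if (ret == DEMO_SUCCESS &&
            demo_recv_data(enc, &data_size) == DEMO_SUCCESS && data_size > 0)
            received++;
    }
    return received;
}
```

In this model, frames still queued when the input ends are left in the encoder; draining them at end of stream is a separate step that the sketch does not cover.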

Of special note is the XmaKernel plugin type. This kernel type is a generic type and not necessarily video-specific. It is used to represent kernels that perform control functions and/or other functions not easily represented by any of the other kernel classes.

As such, the application API is more flexible:

  • xma_kernel_session_write
  • xma_kernel_session_read

These routines take a list of XmaParameter objects, which are type-length-value objects. A kernel implementing this interface must document which parameters are legal so that the application developer can instantiate the right types of parameters and pass them to the write/read routines. If using a kernel of this type, consult the kernel developer’s documentation to learn what XmaParameter types are expected to be passed in for write() and what will be returned upon calling read().
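
For readers unfamiliar with the type-length-value idea, the sketch below shows one way such a parameter could be laid out and checked. DemoParameter and its helpers are hypothetical; the actual fields of XmaParameter should be taken from the XMA headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags and parameter layout; not the XMA definitions. */
typedef enum DemoParamType {
    DEMO_PARAM_INT32  = 1,
    DEMO_PARAM_STRING = 2
} DemoParamType;

typedef struct DemoParameter {
    DemoParamType type;     /* how to interpret value */
    size_t        length;   /* size of value in bytes */
    const void   *value;    /* caller-owned payload */
} DemoParameter;

/* Wraps a 32-bit integer; the pointed-to value must outlive the parameter. */
DemoParameter demo_param_int32(const int32_t *v)
{
    DemoParameter p = { DEMO_PARAM_INT32, sizeof *v, v };
    return p;
}

/* Wraps a NUL-terminated string; length includes the terminator. */
DemoParameter demo_param_string(const char *s)
{
    DemoParameter p = { DEMO_PARAM_STRING, strlen(s) + 1, s };
    return p;
}

/* Reader side: checks type and length before touching the payload. */
int demo_param_get_int32(const DemoParameter *p, int32_t *out)
{
    if (p->type != DEMO_PARAM_INT32 || p->length != sizeof *out)
        return -1;
    memcpy(out, p->value, sizeof *out);
    return 0;
}
```

Checking both the type tag and the length on the receiving side is what allows a generic kernel interface to reject parameters it does not understand.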

Cleanup

When runtime video processing has concluded, the application should destroy each session. Doing so will free the session to be used by another thread or process and ensure that the kernel plugin has the opportunity to perform proper cleanup/closing procedures.

  • xma_enc_session_destroy()
  • xma_dec_session_destroy()
  • xma_scaler_session_destroy()
  • xma_filter_session_destroy()
  • xma_kernel_session_destroy()

Plugin Development Guide

Overview

The XMA Plugin Interface is used to write software capable of managing a specific video kernel hardware resource. The plugin interface consists of a library for moving data between device memory and host memory and accessing hardware registers. Additionally, standard interfaces are defined to represent various video kernel archetypes such as encoders, decoders, and filters.

The plugin developer, by implementing a given plugin interface, permits XMA to translate requests from XMA applications into hardware-specific actions (i.e. register programming, buffer processing). The XMA plugin is akin to a software ‘driver’ in this regard.

The first step in developing an XMA plugin requires you to decide which XMA kernel interface accurately represents the type of hardware kernel for which you seek to provide support:

Kernel Type                              XMA Plugin Interface
Encoders (VP9, H.264, H.265)             xmaplgenc
Decoders (VP9, H.264, H.265)             xmaplgdec
Filters (colorspace converter, scalers)  xmaplgfilter or xmaplgscaler
Scalers                                  xmaplgscaler
Other (embedded cpu)                     xmaplgkernel

Once selected, the job of the plugin author is to implement the interface for the given kernel thus providing a mapping between the xma_app_intf and the kernel. Most callbacks specified are implicitly mandatory with some exceptions which will be noted below.

Your plugin will be compiled into a shared object library and linked to the kernel via the XMA configuration file ‘pluginpath’ property:

SystemCfg:
    - dsa:        xilinx_1525_dynamic_5_1
    - pluginpath: /tmp/libxmaapi/installdir/share/libxmaapi
    - xclbinpath: /tmp/xclbins
    - ImageCfg:
        xclbin:   hevc_encoder.xclbin
        zerocopy: disable
        device_id_map: [0]
        KernelCfg: [[ instances: 1,
                      function: encoder,
                      plugin: libhevc.so,
                      vendor: ACME,
                      name: hevc_encoder_1,
                      ddr_map: [0]]]

In the above example, the libhevc.so is an XMA plugin that is linked to the encoder instance produced by the “ACME” company. When an application requests a resource through the XMA Application API, it will specify a specific type, from the list of XmaEncoderType as well as a vendor name string. Your plugin will be linked to the vendor string as part of the YAML configuration file (as indicated in the example above) and will specify the precise type (i.e. XmaEncoderType) it is designed to control in its XMA kernel-specific plugin data structure (e.g. see XmaEncoderPlugin::hwencoder_type). If there is a match, then your plugin will be called into service to implement control of the kernel in response to the application interface.

See xma_app_init_yaml for more details about the system configuration file.
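
The matching step described above, in which a session request is paired with a kernel by type and vendor string, might look roughly like the following. The table layout and names (DemoKernelEntry, demo_find_kernel) are hypothetical; they illustrate the idea and are not the XMA resource manager.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoder type tags and table entries. */
typedef enum DemoEncoderType { DEMO_HEVC = 1, DEMO_H264 = 2 } DemoEncoderType;

typedef struct DemoKernelEntry {
    DemoEncoderType type;    /* from the plugin's hwencoder_type field */
    const char     *vendor;  /* from the vendor key in the YAML file */
    int             in_use;  /* 1 when already bound to a session */
} DemoKernelEntry;

/* Returns the index of the first free entry matching type and vendor,
 * or -1 when no such entry exists. */
int demo_find_kernel(const DemoKernelEntry *tbl, size_t n,
                     DemoEncoderType type, const char *vendor)
{
    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use && tbl[i].type == type &&
            strcmp(tbl[i].vendor, vendor) == 0)
            return (int)i;
    }
    return -1;
}
```

A request for an HEVC encoder from "ACME" would skip entries that are busy or belong to another vendor and bind to the first free match.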

XMA Plugin Code Layout

Each XMA kernel type specifies a slightly different interface so these guidelines are intended to cover what is generally common.

All plugin code must include xmaplugin.h

#include <xmaplugin.h>

This will provide the plugin code access to all data structures necessary to author XMA plugin code. This includes access to the structures used by the xma_app_intf as xmaplugin.h includes xma.h.

What follows is a general description of what is expected of a plugin in response to the xma_app_intf.

From the application perspective, the following operations will be performed:

  1. Create session
  2. Send data/frame or write**
  3. Receive data/frame or read**
  4. Destroy

** in the case of a non-video kernel

Steps 2 and 3 form the runtime processing of frames/data and are likely to be repeated for as long as there is data to be processed.

A general mapping between the application interface and plugin interface:

Application Call      Plugin Callbacks Invoked
session_create()      alloc_chan()**
                      init()
send_(data|frame)()   get_dev_input_paddr()**
                      send_(data|frame)()
recv_(data|frame)()   recv_(data|frame)()
destroy()             close()

** optional callback if specified in kernel interface
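The call mapping above can be sketched as a small dispatcher. This is an illustration only: DemoSession, DemoPlugin, and the demo_* functions are hypothetical stand-ins (the real XMA callbacks take kernel-specific session types), and skipping init() when alloc_chan() rejects the request is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for XmaSession and a kernel plugin struct. */
typedef struct {
    int32_t chan_id;
    char    trace[8];   /* records which callbacks ran, for illustration */
    size_t  n;
} DemoSession;

typedef struct {
    int32_t (*alloc_chan)(DemoSession *pending); /* optional: may be NULL */
    int32_t (*init)(DemoSession *s);
    int32_t (*close)(DemoSession *s);
} DemoPlugin;

static void demo_trace(DemoSession *s, char c)
{
    if (s->n < sizeof s->trace - 1)
        s->trace[s->n++] = c;
}

static int32_t demo_alloc_chan(DemoSession *s) { demo_trace(s, 'A'); s->chan_id = 0; return 0; }
static int32_t demo_init(DemoSession *s)       { demo_trace(s, 'I'); return 0; }
static int32_t demo_close(DemoSession *s)      { demo_trace(s, 'C'); return 0; }

/* session_create(): alloc_chan() first if the plugin provides it, then init(). */
static int32_t demo_session_create(const DemoPlugin *p, DemoSession *s)
{
    if (p->alloc_chan != NULL) {
        int32_t rc = p->alloc_chan(s);
        if (rc != 0)
            return rc;  /* assumed behaviour: a rejected request skips init() */
    }
    return p->init(s);
}

/* destroy(): maps onto the plugin's close() callback. */
static int32_t demo_session_destroy(const DemoPlugin *p, DemoSession *s)
{
    return p->close(s);
}
```

A plugin that leaves alloc_chan as NULL goes straight to init() in this sketch, which mirrors the "optional callback" footnote.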

Using the XMA encoder plugin kernel type as an example (specified by XmaEncoderPlugin) the following is a rough sketch of a simple plugin implementation with most implementation details omitted for brevity:

#include <stdio.h>
#include <xmaplugin.h>


static int32_t xlnx_encoder_init(XmaEncoderSession *enc_session)
{
    //Gather plugin-specific data and properties
    EncoderContext *ctx = enc_session->base.plugin_data;
    XmaHwSession hw_handle = enc_session->base.hw_session;
    XmaEncoderProperties *enc_props = &enc_session->encoder_props;
    HostKernelCtx *pKernelCtx = ((XmaSession*)enc_session)->kernel_data;
    ...
    //allocate device buffers for incoming and outgoing encoded data
    //(one set per ring-buffer slot, mirroring the loop in close())
    for (int i = 0; i < NUM_BUFFERS; i++)
    {
        ctx->encoder.input_y_buffer[i].b_handle = xma_plg_buffer_alloc(hw_handle,
                                                      ctx->encoder.input_y_buffer[i].b_size);

        ctx->encoder.input_u_buffer[i].b_handle = xma_plg_buffer_alloc(hw_handle,
                                                      ctx->encoder.input_u_buffer[i].b_size);

        ctx->encoder.input_v_buffer[i].b_handle = xma_plg_buffer_alloc(hw_handle,
                                                      ctx->encoder.input_v_buffer[i].b_size);
    }
    //alloc add'l buffers for outgoing data
    ...
    //initialize state of encoder based on enc_props via register_write
    ...
    //update private context data structures *ctx and *pKernelCtx
    ...
    return 0;
}

static int32_t xlnx_encoder_alloc_chan(XmaSession *pending, XmaSession **sessions, uint32_t sess_cnt)
{
    // evaluate pending session load on kernel vs existing sessions and reject/approve
    ...
    //approve new channel request and assign channel id
    pending->chan_id = sess_cnt;
    return 0;
}

static int32_t xlnx_encoder_send_frame(XmaEncoderSession *enc_session, XmaFrame *frame)
{
    EncoderContext *ctx = enc_session->base.plugin_data;
    XmaHwSession hw_handle = enc_session->base.hw_session;
    HostKernelCtx *pKernelCtx = ((XmaSession*)enc_session)->kernel_data;
    uint32_t nb = 0;
    nb = ctx->n_frame % NUM_BUFFERS;

    //write frame properties to registers
    xma_plg_register_write(hw_handle, &(ctx->width), sizeof(uint32_t), ADDR_FRAME_WIDTH_DATA);
    xma_plg_register_write(hw_handle, &(ctx->height), sizeof(uint32_t), ADDR_FRAME_HEIGHT_DATA);
    xma_plg_register_write(hw_handle, &(ctx->fixed_qp), sizeof(uint32_t), ADDR_QP_DATA);
    xma_plg_register_write(hw_handle, &(ctx->bitrate), sizeof(uint32_t), ADDR_BITRATE_DATA);
    ...
    //additional register writes for frame processing...
    ...
    //copy host frame data to device memory for YUV buffer
    xma_plg_buffer_write(hw_handle,
            ctx->encoder.input_y_buffer[nb].b_handle,
            frame->data[0].buffer,
            ctx->encoder.input_y_buffer[nb].b_size, 0);

    xma_plg_buffer_write(hw_handle,
            ctx->encoder.input_u_buffer[nb].b_handle,
            frame->data[1].buffer,
            ctx->encoder.input_u_buffer[nb].b_size, 0);

    xma_plg_buffer_write(hw_handle,
            ctx->encoder.input_v_buffer[nb].b_handle,
            frame->data[2].buffer,
            ctx->encoder.input_v_buffer[nb].b_size, 0);
    //additional register read to ensure data is processed
    ...
    return 0;
}

static int32_t xlnx_encoder_recv_data(XmaEncoderSession *enc_session, XmaDataBuffer *data, int32_t *data_size)
{
    EncoderContext *ctx = enc_session->base.plugin_data;
    XmaHwSession hw_handle = enc_session->base.hw_session;
    HostKernelCtx *pKernelCtx = ((XmaSession*)enc_session)->kernel_data;
    int64_t out_size = 0;
    uint64_t d_cnt = 0;
    uint32_t nb = (ctx->n_frame) % NUM_BUFFERS;

    // Read the length of output data into out_size
    ...
    // Copy data to host buffer data->data.buffer
    xma_plg_buffer_read(hw_handle,
                        ctx->encoder.output_buffer[nb].b_handle,
                        data->data.buffer, out_size, 0);
    ...
    return 0;
}

static int32_t xlnx_encoder_close(XmaEncoderSession *enc_session)
{
    EncoderContext *ctx = enc_session->base.plugin_data;
    XmaHwSession hw_handle = enc_session->base.hw_session;

    for (int i = 0; i < NUM_BUFFERS; i++)
    {
        xma_plg_buffer_free(hw_handle, ctx->encoder.input_y_buffer[i].b_handle);
        xma_plg_buffer_free(hw_handle, ctx->encoder.input_u_buffer[i].b_handle);
        xma_plg_buffer_free(hw_handle, ctx->encoder.input_v_buffer[i].b_handle);
        xma_plg_buffer_free(hw_handle, ctx->encoder.output_buffer[i].b_handle);
    }
    return 0;
}

XmaEncoderPlugin encoder_plugin = {
    .hwencoder_type = XMA_H264_ENCODER_TYPE,
    .hwvendor_string = "Xilinx",
    .format = XMA_YUV420_FMT_TYPE,
    .bits_per_pixel = 8,
    .plugin_data_size = sizeof(EncoderContext),
    .kernel_data_size = sizeof(HostKernelCtx),
    .init = xlnx_encoder_init,
    .send_frame = xlnx_encoder_send_frame,
    .recv_data = xlnx_encoder_recv_data,
    .close = xlnx_encoder_close,
    .alloc_chan = xlnx_encoder_alloc_chan,
    .get_dev_input_paddr = NULL
};

Note that each plugin implementation must statically allocate a data structure with a specific name (as with encoder_plugin at the end of the above example):

Plugin Type        Required Global Variable Name
XmaDecoderPlugin   decoder_plugin
XmaEncoderPlugin   encoder_plugin
XmaFilterPlugin    filter_plugin
XmaScalerPlugin    scaler_plugin
XmaKernelPlugin    kernel_plugin

Initialization

Initialization is the time for a plugin to perform one or more of the following:

  * evaluate an application request for a kernel channel (optional)
  * allocate device buffers to handle input data as well as output data
  * initialize the state of the kernel

When an application creates a session (e.g. xma_enc_session_create()), the plugin code will have the following callbacks invoked:

  1. alloc_chan (optional)
  2. init

What is returned to the application code is a session object corresponding to the type of session requested (e.g. XmaEncoderSession). All session objects derive from a base class: XmaSession. These session data structures contain all of the instance data pertaining to a kernel and are used by the XMA library as well as the plugin for storage and retrieval of state information.

From the perspective of the application, a session object represents control of a kernel instance. This may, in fact, be an entire video kernel or, in the case of a kernel that supports channels, a ‘virtual’ kernel that is shared amongst more than one thread of execution. If your kernel supports channels (i.e. a type of ‘virtual’ kernel), then the alloc_chan() callback must be implemented. The signature for alloc_chan includes an array of existing XmaSession objects that have been previously allocated to this kernel as well as the currently pending request. It is your responsibility, as the plugin developer, to decide whether the pending request is approved or rejected. Approval should include updating the XmaSession::chan_id member with a non-negative channel id and returning XMA_SUCCESS.
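One possible approval policy for alloc_chan() is sketched below, assuming a kernel with a fixed number of channels. Everything here is hypothetical: DemoSession stands in for XmaSession, DEMO_MAX_CHANNELS is an invented limit, and DEMO_SUCCESS/DEMO_ERROR are placeholders for the XMA return codes. A real plugin would more likely weigh the load of the pending session against the existing ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_CHANNELS 4   /* hypothetical per-kernel channel limit */
#define DEMO_SUCCESS      0   /* placeholder for XMA_SUCCESS */
#define DEMO_ERROR      (-1)  /* placeholder for XMA_ERROR */

typedef struct {
    int32_t chan_id;          /* stands in for XmaSession::chan_id */
} DemoSession;

/* Approve the pending request by assigning the lowest free channel id,
 * or reject it when every channel is already taken. */
static int32_t demo_alloc_chan(DemoSession *pending,
                               DemoSession **sessions,
                               uint32_t sess_cnt)
{
    bool used[DEMO_MAX_CHANNELS] = { false };

    /* Mark the channel ids held by sessions already on this kernel. */
    for (uint32_t i = 0; i < sess_cnt; i++) {
        int32_t id = sessions[i]->chan_id;
        if (id >= 0 && id < DEMO_MAX_CHANNELS)
            used[id] = true;
    }

    for (int32_t id = 0; id < DEMO_MAX_CHANNELS; id++) {
        if (!used[id]) {
            pending->chan_id = id;   /* non-negative id signals approval */
            return DEMO_SUCCESS;
        }
    }
    return DEMO_ERROR;               /* kernel fully subscribed: reject */
}
```

Searching for the lowest free id, rather than using the session count as the id, is a design choice of this sketch: it keeps ids unique even if sessions are closed out of order.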

Your init function will then be called after alloc_chan (assuming it was implemented). Within your init() implementation, you will be expected to initialize any private session-specific and kernel-specific data structures, allocate device memory for holding incoming and outgoing data, and program the registers of the kernel to place it into an initial state ready for processing data.

When your plugin is first loaded, XMA will allocate memory for kernel-wide data based on the size you specify in your plugin. This data is considered global for all sessions sharing a given kernel (if the kernel supports this via channels) and should be protected from simultaneous access.

When a session has been created in response to an application request, XMA will allocate plugin data that is session-specific.

The XmaSession::kernel_data and XmaSession::plugin_data members are available to you to store kernel-wide and session-specific state, respectively. There is no need to free these data structures during termination; XMA frees this data for you.

The XMA Plugin Library provides a set of functions for allocating device memory and performing register reads and writes. To allocate buffers necessary to handle both incoming and outgoing data, please see xma_plg_buffer_alloc().

See xmaplugin for more details.

Handling Incoming Application Data

For each kernel type, there is an application interface to send data to be processed (i.e. encoded, decoded, or otherwise transformed). Data being sent by an application to the kernel will result in the invocation of your send()/write() callback.

The most common operation within the plugin is to copy data from host memory to device memory so that it may be operated on by the kernel. Subsequently, the kernel must be programmed so that it knows which device buffer contains the data to be processed.

The XMA Plugin library call xma_plg_buffer_write() can be used to copy host data to device data.

xma_plg_register_write() and xma_plg_register_read() can be used to program the kernel registers and start kernel processing.
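The send path can be sketched with a mock that treats a host array as device memory. This is an illustration under assumptions, not XMA code: DemoCtx, demo_buffer_write(), and demo_send_frame() are hypothetical, the mock takes the device buffer directly instead of a session and buffer handle, and advancing n_frame inside the send callback is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_BUFFERS 2
#define DEMO_BUF_SIZE    16

typedef struct {
    uint8_t  dev_mem[DEMO_NUM_BUFFERS][DEMO_BUF_SIZE]; /* pretend device DDR */
    uint32_t n_frame;                                  /* frames sent so far */
} DemoCtx;

/* Mock of a host-to-device copy; rejects writes that would overrun the buffer. */
static int32_t demo_buffer_write(uint8_t *dev_buf, size_t dev_size,
                                 const void *src, size_t size, size_t offset)
{
    if (offset > dev_size || size > dev_size - offset)
        return -1;
    memcpy(dev_buf + offset, src, size);
    return 0;
}

/* Copy one plane into the next ring-buffer slot, then advance the counter. */
static int32_t demo_send_frame(DemoCtx *ctx, const void *plane, size_t plane_size)
{
    uint32_t nb = ctx->n_frame % DEMO_NUM_BUFFERS;  /* slot for this frame */

    int32_t rc = demo_buffer_write(ctx->dev_mem[nb], DEMO_BUF_SIZE,
                                   plane, plane_size, 0);
    if (rc != 0)
        return rc;

    /* A real plugin would now program the kernel registers with the
     * buffer location and start processing. */
    ctx->n_frame++;
    return 0;
}
```

With two slots, the third frame lands in slot 0 again, so a real plugin must be sure the kernel has finished with a slot before reusing it.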

Sending Output to the Application

For each kernel type, there is an application interface to request data processed (i.e. encoded, decoded, or otherwise transformed) by the kernel. Data being requested by an application from the kernel will invoke your recv()/read() callback implementation.

The most common operation within the plugin is to copy data from device memory back to host memory so that it may be processed by the application. Subsequently, the kernel may be prepared for new data to arrive for processing.

The XMA Plugin library call xma_plg_buffer_read() can be used to copy device data to host memory.

xma_plg_register_write() and xma_plg_register_read() can be used to program the kernel registers and start kernel processing.
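The receive path can be sketched the same way. In this illustration the output length comes from a mock register and the copy is a plain memcpy; DemoKernel and demo_recv_data() are hypothetical, and the bounds check against the host buffer capacity is an assumption about what a careful plugin would do rather than documented XMA behaviour.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_OUT_SIZE 32

typedef struct {
    uint8_t  dev_out[DEMO_OUT_SIZE]; /* pretend device output buffer */
    uint32_t out_len_reg;            /* pretend "bytes produced" register */
} DemoKernel;

/* Copy the kernel's output to host_buf and report its length in *data_size. */
static int32_t demo_recv_data(const DemoKernel *k, uint8_t *host_buf,
                              size_t host_cap, int32_t *data_size)
{
    uint32_t out_size = k->out_len_reg;      /* stands in for a register read */

    /* Never copy more than either side can hold. */
    if (out_size > DEMO_OUT_SIZE || out_size > host_cap)
        return -1;

    memcpy(host_buf, k->dev_out, out_size);  /* stands in for a device-to-host copy */
    *data_size = (int32_t)out_size;
    return 0;
}
```

On failure the sketch leaves *data_size untouched, so the caller can tell that no new data arrived.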

Termination

When an XMA application has concluded data processing, it will destroy its kernel session. Your close() callback will be invoked to perform the necessary cleanup. Your close() implementation should free any buffers that were allocated in device memory during your init() via xma_plg_buffer_free(). Freeing XmaSession::kernel_data and XmaSession::plugin_data is not necessary as this will be done by the XMA library.
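The pairing of init() allocations with close() frees can be sketched with a counting mock allocator. The names below (DemoCtx, demo_buffer_alloc(), demo_buffer_free(), demo_live_buffers) are hypothetical, and malloc/free merely stand in for device memory management.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NUM_BUFFERS 3

typedef void *DemoBufferHandle;    /* hypothetical; NULL means "no buffer" */

static int demo_live_buffers = 0;  /* outstanding mock device allocations */

static DemoBufferHandle demo_buffer_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        demo_live_buffers++;
    return p;
}

static void demo_buffer_free(DemoBufferHandle h)
{
    if (h != NULL) {
        free(h);
        demo_live_buffers--;
    }
}

typedef struct {
    DemoBufferHandle in[DEMO_NUM_BUFFERS];
    DemoBufferHandle out[DEMO_NUM_BUFFERS];
} DemoCtx;

/* close(): release every buffer that init() obtained; safe to call twice. */
static int32_t demo_close(DemoCtx *ctx)
{
    for (int i = 0; i < DEMO_NUM_BUFFERS; i++) {
        demo_buffer_free(ctx->in[i]);
        demo_buffer_free(ctx->out[i]);
        ctx->in[i] = NULL;
        ctx->out[i] = NULL;
    }
    return 0;
}

/* init(): allocate input and output buffers, undoing everything on failure. */
static int32_t demo_init(DemoCtx *ctx, size_t in_size, size_t out_size)
{
    for (int i = 0; i < DEMO_NUM_BUFFERS; i++) {
        ctx->in[i] = NULL;
        ctx->out[i] = NULL;
    }
    for (int i = 0; i < DEMO_NUM_BUFFERS; i++) {
        ctx->in[i] = demo_buffer_alloc(in_size);
        ctx->out[i] = demo_buffer_alloc(out_size);
        if (ctx->in[i] == NULL || ctx->out[i] == NULL) {
            demo_close(ctx);
            return -1;
        }
    }
    return 0;
}
```

Clearing each handle after freeing it keeps a repeated close() harmless in this sketch.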

Zerocopy Special Case

Encoders are capable of receiving data directly from upstream video processing kernels such as filters or scalers. In such a case, it may improve the performance of a video processing pipeline that includes both a filter and an encoder to exchange data directly within device memory rather than have the filter copy data back to a host buffer only to be re-copied from the host to the device buffer of the downstream encoder. This double-copy can be avoided if the two kernels can share a buffer within the device memory; a buffer that serves as an ‘output’ buffer for the filter but an ‘input’ buffer for the encoder. This optimization is known as ‘zerocopy’. The encoder must implement the XmaEncoderPlugin::get_dev_input_paddr() callback. The XMA library can detect whether the two kernel sessions are capable of sharing buffers. The following conditions will be checked:

  1. Both kernel sessions are connected to the same device DDR bank
  2. The get_dev_input_paddr() callback is implemented by the encoder session
  3. The encoder has been configured to expect frame data of the same format and size as the upstream filter kernel produces as output.
  4. The system configuration file has specified that zerocopy is ‘enabled’

If all of the above conditions are true, zero-copy between the kernels will be supported. The XMA library will obtain the destination buffer address for the filter from the encoder session. This will then be provided as the destination address to the filter’s XmaFrame argument as part of its recv_frame() callback.
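The four conditions can be expressed as a single predicate. The sketch below is illustrative only: DemoSessionInfo and demo_zerocopy_possible() are hypothetical, and comparing format, width, and height is one plausible reading of "same format and size".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what is known about one kernel session. */
typedef struct {
    int32_t  ddr_bank;
    int32_t  format;
    int32_t  width;
    int32_t  height;
    uint64_t (*get_dev_input_paddr)(void); /* NULL when not implemented */
} DemoSessionInfo;

/* Example callback an encoder might provide (address is made up). */
static uint64_t demo_encoder_paddr(void) { return 0x1000; }

static bool demo_zerocopy_possible(const DemoSessionInfo *filter_out,
                                   const DemoSessionInfo *encoder_in,
                                   bool zerocopy_enabled)
{
    return filter_out->ddr_bank == encoder_in->ddr_bank &&  /* condition 1 */
           encoder_in->get_dev_input_paddr != NULL &&       /* condition 2 */
           filter_out->format == encoder_in->format &&      /* condition 3 */
           filter_out->width  == encoder_in->width &&
           filter_out->height == encoder_in->height &&
           zerocopy_enabled;                                /* condition 4 */
}
```

If any single condition fails, the predicate is false and the pipeline falls back to the host round trip described earlier.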

xma

XMA Application Interface

The interface used by stand-alone XMA applications or plugins

int32_t xma_initialize(char * cfgfile)

Initialize the system according to the layout specified in the YAML configuration file.

Parameters

char * cfgfile
a filepath to the YAML configuration file describing the layout of the xclbin(s) and the devices to which the xclbin(s) are to be deployed. If a NULL value is passed, the XMA will use a default name and location: /etc/xma/xma_def_sys_cfg.yaml. In all cases, a properly defined yaml configuration file must exist.

Description

This is the entry point routine for utilizing the XMA library and must be the first call within any application before calling any other XMA APIs. The YAML file is parsed and then verified for compatibility with the system hardware. If deemed compatible, each device specified in the YAML file will be programmed with the xclbin(s) specified in the YAML. A shared memory file will be created in /tmp which will store the contents of the YAML file and serve as a resource database tracking allocation of kernels, thus permitting multiple processes to share device resources. If the system has already been configured by a prior process, then a successful return from this routine will map the existing resource database file to the calling process; XMA will NOT attempt to reprogram any of the system devices if any device is in-use based on the prior configuration. In effect, programming and configuration of the system will only occur when this routine is first invoked. From the first invocation, so long as any running process is attached to and utilizing resources for an existing configuration, all subsequent invocations of this routine by any other process will be forced to use the existing configuration of the system; their configuration file argument will be ignored. When all currently running processes attached to a given resource file database have run to completion normally, the resource file will be deleted and a subsequent process invoking this routine will restart the parsing and programming of the system as would be true during initial invocation.

Return

XMA_SUCCESS after successfully initializing the system and/or (if not the first process to invoke) mapping in the currently active system configuration.

XMA_ERROR_INVALID if the YAML file is incompatible with the system hardware.

XMA_ERROR for all other errors.
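A caller might handle the NULL default and the three return codes along the following lines. This sketch does not call the real xma_initialize(): the DEMO_* values are placeholders (the actual numeric codes come from the XMA headers), and demo_effective_cfg() and demo_describe_init() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder values: the real codes are defined by the XMA headers. */
#define DEMO_SUCCESS         0
#define DEMO_ERROR         (-1)
#define DEMO_ERROR_INVALID (-2)

#define DEMO_DEFAULT_CFG "/etc/xma/xma_def_sys_cfg.yaml"

/* Which configuration file applies for a given cfgfile argument:
 * NULL selects the default name and location. */
static const char *demo_effective_cfg(const char *cfgfile)
{
    return cfgfile != NULL ? cfgfile : DEMO_DEFAULT_CFG;
}

/* Turn an initialization return code into a message for logging. */
static const char *demo_describe_init(int32_t rc)
{
    switch (rc) {
    case DEMO_SUCCESS:
        return "initialized or attached to existing configuration";
    case DEMO_ERROR_INVALID:
        return "YAML file incompatible with system hardware";
    default:
        return "other initialization error";
    }
}
```

Note that a successful return does not tell the caller whether its own file was used: a later process attaches to the existing configuration and its argument is ignored.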

xmaplugin


XMA Plugin Interface

The interface used by XMA kernel plugin developers

XmaBufferHandle xma_plg_buffer_alloc(XmaHwSession s_handle, size_t size)

Allocate device memory. This function allocates memory on the FPGA DDR and provides a handle to the memory that can be used for copying data from the host to device memory or from the device to the host. In addition, the handle can be passed to xma_plg_get_paddr() in order to obtain the physical address of the buffer. Obtaining the physical address is necessary for setting the AXI register map with physical pointers so that the kernel knows where key input and output buffers are located. This function knows which DDR bank is associated with this session and therefore automatically selects the correct DDR bank.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance.
size_t size
Size in bytes of the device buffer to be allocated.

Return

Non-zero buffer handle on success

void xma_plg_buffer_free(XmaHwSession s_handle, XmaBufferHandle b_handle)

Free a device buffer. This function frees a previously allocated buffer that was obtained using the xma_plg_buffer_alloc() function.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
XmaBufferHandle b_handle
The buffer handle returned from xma_plg_buffer_alloc()

uint64_t xma_plg_get_paddr(XmaHwSession s_handle, XmaBufferHandle b_handle)

Get a physical address for a buffer handle. This function returns the physical address of DDR memory on the FPGA used by a specific session.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
XmaBufferHandle b_handle
The buffer handle returned from xma_plg_buffer_alloc()

Return

Physical address of DDR on the FPGA
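How allocation, physical-address lookup, and register programming fit together can be sketched with a mock device. Nothing below is the XMA API: DemoDevice, the demo_* functions, the DDR base address, and the DEMO_REG_INPUT_ADDR offset are all hypothetical, and the mock functions take a device pointer rather than a session handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DDR_BASE       0x400000000ULL /* made-up physical base address */
#define DEMO_DDR_SIZE       4096u
#define DEMO_MAX_BUFS       8
#define DEMO_REG_INPUT_ADDR 0x10           /* made-up register offset */

typedef uint32_t DemoBufferHandle;         /* 0 means "invalid handle" */

typedef struct {
    uint64_t offset[DEMO_MAX_BUFS]; /* start of each buffer within mock DDR */
    uint32_t count;                 /* buffers handed out so far */
    uint64_t next_free;             /* bump-allocator cursor */
    uint8_t  regs[64];              /* mock register map */
} DemoDevice;

/* Returns a non-zero handle on success, 0 on failure. */
static DemoBufferHandle demo_buffer_alloc(DemoDevice *d, size_t size)
{
    if (size == 0 || d->count == DEMO_MAX_BUFS ||
        size > DEMO_DDR_SIZE - d->next_free)
        return 0;
    d->offset[d->count] = d->next_free;
    d->next_free += size;
    return ++d->count;
}

/* Returns the mock physical address of a buffer, or 0 for a bad handle. */
static uint64_t demo_get_paddr(const DemoDevice *d, DemoBufferHandle h)
{
    if (h == 0 || h > d->count)
        return 0;
    return DEMO_DDR_BASE + d->offset[h - 1];
}

/* Returns the number of bytes written, or -1 if the write is out of range. */
static int32_t demo_register_write(DemoDevice *d, const void *src,
                                   size_t size, size_t offset)
{
    if (offset > sizeof d->regs || size > sizeof d->regs - offset)
        return -1;
    memcpy(d->regs + offset, src, size);
    return (int32_t)size;
}

/* Tell the kernel where its input buffer lives. */
static int32_t demo_program_input(DemoDevice *d, DemoBufferHandle h)
{
    uint64_t paddr = demo_get_paddr(d, h);
    if (paddr == 0)
        return -1;
    return demo_register_write(d, &paddr, sizeof paddr, DEMO_REG_INPUT_ADDR);
}
```

The point of the sketch is the ordering: the handle is what the plugin keeps for copies, while the physical address is what the kernel needs in its register map.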

int32_t xma_plg_buffer_write(XmaHwSession s_handle, XmaBufferHandle b_handle, const void * src, size_t size, size_t offset)

Write data from host to device buffer. This function copies data from host memory to device memory.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
XmaBufferHandle b_handle
The buffer handle returned from xma_plg_buffer_alloc()
const void * src
Source data pointer
size_t size
Size of data to copy
size_t offset
Offset from the beginning of the allocated device memory

Return

XMA_SUCCESS on success

XMA_ERROR on failure

int32_t xma_plg_buffer_read(XmaHwSession s_handle, XmaBufferHandle b_handle, void * dst, size_t size, size_t offset)

Read data from device memory and copy to host memory. This function copies data from device memory and stores the result in the requested host memory.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
XmaBufferHandle b_handle
The buffer handle returned from xma_plg_buffer_alloc()
void * dst
Destination data pointer
size_t size
Size of data to copy
size_t offset
Offset from the beginning of the allocated device memory

Return

XMA_SUCCESS on success

XMA_ERROR on failure

int32_t xma_plg_schedule_work_item(XmaHwSession s_handle)

This function schedules a request to the XRT scheduler for execution of a kernel based on the saved state of the kernel registers supplied by the xma_plg_register_prep_write() function call. The prep_write() keeps a shadow register map so that the schedule_work_item() can gather all registers and push a new work item onto the scheduler queue. Work items are processed in FIFO order. After calling schedule_work_item() one or more times, the caller can invoke xma_plg_is_work_item_done() to wait for one item of work to complete.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance

Return

XMA_SUCCESS on success

XMA_ERROR on failure

int32_t xma_plg_is_work_item_done(XmaHwSession s_handle, int32_t timeout_in_ms)

This function checks if at least one work item previously submitted via xma_plg_schedule_work_item() has completed. If the supplied timeout expires before a work item has completed, this function returns an error.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
int32_t timeout_in_ms
A timeout value in milliseconds

Return

XMA_SUCCESS on success

XMA_ERROR on timeout
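The interplay of xma_plg_register_prep_write(), xma_plg_schedule_work_item(), and xma_plg_is_work_item_done() can be pictured as a shadow register map feeding a FIFO. The mock below only illustrates that idea: DemoHwSession and the demo_* functions are hypothetical, the queue depth is invented, and "completion" is simulated by popping the oldest item immediately instead of waiting on hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_REGMAP_SIZE 32
#define DEMO_QUEUE_DEPTH 4

typedef struct {
    uint8_t  shadow[DEMO_REGMAP_SIZE];                  /* shadow register map */
    uint8_t  queue[DEMO_QUEUE_DEPTH][DEMO_REGMAP_SIZE]; /* pending work items */
    uint32_t head, tail, count;
} DemoHwSession;

/* Stage a register value in the shadow map; nothing reaches the kernel yet. */
static int32_t demo_register_prep_write(DemoHwSession *s, const void *src,
                                        size_t size, size_t offset)
{
    if (offset > DEMO_REGMAP_SIZE || size > DEMO_REGMAP_SIZE - offset)
        return -1;
    memcpy(s->shadow + offset, src, size);
    return (int32_t)size;
}

/* Snapshot the shadow map and push it as a new work item (FIFO order). */
static int32_t demo_schedule_work_item(DemoHwSession *s)
{
    if (s->count == DEMO_QUEUE_DEPTH)
        return -1;
    memcpy(s->queue[s->tail], s->shadow, DEMO_REGMAP_SIZE);
    s->tail = (s->tail + 1) % DEMO_QUEUE_DEPTH;
    s->count++;
    return 0;
}

/* Simulated completion: retire the oldest item, or fail if none is pending.
 * A real implementation would wait up to timeout_in_ms for the hardware. */
static int32_t demo_is_work_item_done(DemoHwSession *s, int32_t timeout_in_ms)
{
    (void)timeout_in_ms;
    if (s->count == 0)
        return -1;
    s->head = (s->head + 1) % DEMO_QUEUE_DEPTH;
    s->count--;
    return 0;
}
```

Because each scheduled item is a snapshot, changing a shadow register after scheduling affects only later work items in this sketch.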

int32_t xma_plg_register_prep_write(XmaHwSession s_handle, void * dst, size_t size, size_t offset)

This function writes the data provided and sets the specified AXI_Lite register(s) exposed by a kernel. The base offset of 0 is the beginning of the kernel's AXI_Lite memory map, as this function adds the required offsets internally for the kernel and PCIe.

Parameters

XmaHwSession s_handle
The session handle associated with this plugin instance
void * dst
Destination data pointer
size_t size
Size of data to copy
size_t offset
Offset from the beginning of the kernel AXI_Lite register map

Return

>=0 number of bytes written

<0 on failure