QDMA Windows Driver UseCases

QDMA IP is released with five example designs in the Vivado® Design Suite. They are

  1. AXI4 Memory Mapped And AXI-Stream with Completion

  2. AXI Memory Mapped

  3. AXI Memory Mapped with Completion

  4. AXI Stream with Completion

  5. AXI Stream Loopback

  6. Descriptor Bypass In/Out Loopback

Note: AXI Memory Mapped with Completion is not supported.

Refer to QDMA_Product_Guide for more details on these example designs.

The driver functionality remains same for all the example designs. For Descriptor Bypass In/Out Loopback example design, application has to enable the bypass mode in driver.

All the flows described below are with respect to QDMA internal mode of operation. Also, the changes required in driver for bypass mode of operation are specified.

AXI4 Memory Mapped And AXI-Stream with Completion

This is the default example design used to test the MM and ST functionality using QDMA driver. This example design provides blocks to interface with the AXI4 Memory Mapped and AXI4-Stream interfaces. This example design covers most of the functionality provided by QDMA.

Refer to QDMA Product Guide for more details on the example design and its registers.

Below sections describes C2H and H2C data flow for ST and MM mode required in all the example designs.

MM H2C(Host-to-Card)

This Example Design provides BRAM with AXI-MM interface to achieve the MM H2C functionality. The driver comes with dma-ctl and dma-rw applications which helps to realize the MM H2C functionality. QDMA driver takes care of HW updates.

The complete flow between the Host components and HW components is depicted in below sequence.

  • Application needs to configure the queue to start in MM mode

  • dma-rw command takes a file containing the contents to be transmitted to FPGA memory as an input

  • Pass the buffer to be transferred as an input to dma-rw application which in turn passes it to the QDMA Driver

  • QDMA driver programs the descriptors with buffer base address and updates the H2C ring PIDX

  • Upon H2C ring PIDX update, DMA engine fetches the descriptors and passes them to H2C MM Engine for processing

  • H2C MM Engine reads the buffer contents from the Host and writes to the BRAM

  • Upon transfer completion, DMA Engine updates the CIDX in H2C ring completion status and generates interrupt if required. In poll mode, QDMA driver polls on CIDX update.

  • QDMA driver processes the completion status and sends the response back to the application

For MM (H2C and C2H) bypass mode, application needs to enable the bypass mode on the required queues. Application can enable C2H/H2C bypass mode while starting the queue using dma-ctl application by specifying desc_bypass_en flag .

The MM descriptor format used by the example design is defined in QDMA Driver code base at sys/libqdma/include/qdma_reg_ext.h

struct mm_descriptor {
    UINT64 addr;

    UINT64 length : 28;
    UINT64 valid : 1;
    UINT64 sop : 1;
    UINT64 eop : 1;
    UINT64 reserved_0 : 33;

    UINT64 dest_addr;

    UINT64 reserved_1;
};
static_assert(sizeof(mm_descriptor) == (32 * sizeof(UINT8)), "mm_descriptor must be 32 bytes wide!");

Update this structure if any changes required in the descriptor format for bypass mode. Accordingly, update the data flow functionality in qdma_enqueue_mm_request() functions in sys/libqdma/source/qdma.cpp.

MM C2H(Card-to-Host)

This Example Design provides BRAM with AXI-MM interface to achieve the MM C2H functionality. The current driver with dma-ctl tool and dma-rw application helps achieve the MM C2H functionality and QDMA driver takes care of HW updates.

The complete flow between the Host components and HW components is depicted in below sequence.

  • User needs to start the queue in MM mode

  • Pass the buffer to copy the data as an input to dma-rw application which inturn passes it to the QDMA Driver

  • QDMA driver programs the required descriptors based on the length of the required buffer and programs the descriptors with buffer base address and updates the C2H ring PIDX

  • Upon C2H ring PIDX update, DMA engine fetches the descriptors and passes them to C2H MM Engine for processing

  • C2H MM Engine reads the BRAM contents writes to the Host buffers

  • Upon transfer completion, DMA Engine updates the PIDX in C2H ring completion status and generates interrupt if required. In poll mode, QDMA driver polls on PIDX update.

  • QDMA driver processes the completion status and sends the response back to the application with the data received.

ST H2C(Host-to-Card)

In ST H2C, data is moved from Host to Device through H2C stream engine.The H2C stream engine moves data from the Host to the H2C Stream interface. The engine is responsible for breaking up DMA reads to MRRS size, guaranteeing the space for completions, and also makes sure completions are reordered to ensure H2C stream data is delivered to user logic in-order.The engine has sufficient buffering for up to 256 DMA reads and up to 32 KB of data. DMA fetches the data and aligns to the first byte to transfer on the AXI4 interface side. This allows every descriptor to have random offset and random length. The total length of all descriptors put to gather must be less than 64 KB.

There is no dependency on user logic for this use case.

The complete flow between the Host components and HW components is depicted in below sequence.

  • User needs to start the queue in ST mode

  • Pass the buffer to be transferred as an input to dma-rw application which in turn passes it to the QDMA Driver

  • QDMA driver programs the descriptors with buffer base address and updates the H2C ring PIDX

  • Upon H2C ring PIDX update, DMA engine fetches the descriptors and passes them to H2C Stream Engine for processing

  • H2C Stream Engine reads the buffer contents from the Host buffers the data

  • Upon transfer completion, DMA Engine updates the CIDX in H2C ring completion status and generates interrupt if required. In poll mode, QDMA driver polls on CIDX update.

  • QDMA driver processes the completion status and sends the response back to the application

For ST H2C bypass mode, application needs to enable the bypass mode on the required queues. Application can enable H2C bypass mode while starting the queue using dma-ctl application by specifying desc_bypass_en flag.

The ST H2C descriptor format used by the example design is defined in QDMA Driver code base at sys/libqdma/include/qdma_reg_ext.h

struct h2c_descriptor {
    UINT16 cdh_flags;     /**< Dont care bits */
    UINT16 pld_len;       /**< Packet length in bytes */
    UINT16 length;
    UINT16 sop : 1;
    UINT16 eop : 1;
    UINT16 reserved : 14;
    UINT64 addr;
};
static_assert(sizeof(h2c_descriptor) == (16 * sizeof(UINT8)), "h2c_descriptor must be 16 bytes wide!");

Update this structure if any changes required in the descriptor format for bypass mode. Accordingly, update the data flow functionality in qdma_enqueue_st_tx_request() functions in sys/libqdma/source/qdma.cpp.

ST C2H(Card-to-Host)

In ST C2H, data is moved from DMA Device to Host through C2H Stream Engine.

The C2H streaming engine is responsible for receiving data from the user logic and writing to the Host memory address provided by the C2H descriptor for a given Queue. The C2H Stream Engine, DMA writes the stream packets to the Host memory into the descriptors provided by the Host QDMA driver through the C2H descriptor queue.

The C2H engine has two major blocks to accomplish C2H streaming DMA,

  • Descriptor Prefetch Cache (PFCH)

  • C2H-ST DMA Write Engine

QDMA Driver needs to program the prefetch context along with the per queue context to achieve the ST C2H functionality.

The Prefetch Engine is responsible for calculating the number of descriptors needed for the DMA that is writing the packet. The buffer size is fixed per queue basis. For internal and cached bypass mode, the prefetch module can fetch up to 512 descriptors for a maximum of 64 different queues at any given time.

The Completion Engine is used to write to the Completion queues. When used with a DMA engine, the completion is used by the driver to determine how many bytes of data were transferred with every packet. This allows the driver to reclaim the descriptors.

PFCH cache has three main modes, on a per queue basis, called

  • Simple Bypass Mode

  • Internal Cache Mode

  • Cached Bypass Mode

While starting the queue in ST C2H mode using dma-ctl tool, user has the configuration options to configure the queue in any of these 3 modes.

The current ST C2H functionality implemented in QDMA driver is tightly coupled with the Example Design. Though the completion entry descriptor as per PG is fully configurable, this Example Design mandates to have the the color, error and desc_used bits in the first nibble. The completion entry format is defined in QDMA Driver code base sys/libqdma/include/qdma_reg_ext.h

struct c2h_wb_header_8B {
    /** @data_frmt : 0 indicates valid length field is present */
    UINT64 data_frmt      : 1;
    /** @color : Indicates the validity of the entry */
    UINT64 color          : 1;
    /** @desc_error : Indicates the error status */
    UINT64 desc_error     : 1;
    /** @desc_used : Indicates whether data descriptor used */
    UINT64 desc_used      : 1;
    /** @length : Length of the completion entry */
    UINT64 length         : 16;
    /** @user_rsv : Reserved */
    UINT64 user_rsv       : 4;
    /** @user_defined_0 : User Defined Data (UDD) */
    UINT64 user_defined_0 : 40;
};
static_assert(sizeof(c2h_wb_header_8B) == (8 * sizeof(UINT8)), "c2h_wb_header_8B must be 8 bytes wide!");

struct c2h_wb_header_16B : c2h_wb_header_8B {
    /** @user_defined_1 : User Defined Data (UDD) for 16B completion size */
    UINT64 user_defined_1;
};
static_assert(sizeof(c2h_wb_header_16B) == (16 * sizeof(UINT8)), "c2h_wb_header_16B must be 16 bytes wide!");

struct c2h_wb_header_32B : c2h_wb_header_16B {
    /** @user_defined_2 : User Defined Data (UDD) for 32B completion size */
    UINT64 user_defined_2[2];
};
static_assert(sizeof(c2h_wb_header_32B) == (32 * sizeof(UINT8)), "c2h_wb_header_32B must be 32 bytes wide!");

struct c2h_wb_header_64B : c2h_wb_header_32B {
    /** @user_defined_3 : User Defined Data (UDD) for 64B completion size */
    UINT64 user_defined_3[4];
};
static_assert(sizeof(c2h_wb_header_64B) == (64 * sizeof(UINT8)), "c2h_wb_header_64B must be 64 bytes wide!");

Completion entry is processed in st_service_c2h_queue() function which is part of sys/libqdma/source/qdma.cpp. If a different example design is opted, the QDMA driver code shall be updated to suit to the new example design in above mentioned files.

The ST C2H descriptor format described above shall be changed as per example design requirements.

Update this structure if any changes required in the descriptor format for bypass mode.

Accordingly, update the data flow functionality in st_service_c2h_queue() function in sys/libqdma/source/qdma.cpp.