QDMA DPDK Driver Use Cases

The QDMA IP is released with five example designs in the Vivado® Design Suite:

  1. AXI4 Memory Mapped And AXI-Stream with Completion
  2. AXI Memory Mapped
  3. AXI Stream with Completion
  4. AXI Stream Loopback
  5. Descriptor Bypass In/Out Loopback

Refer to the QDMA Product Guide for more details on these example designs.

The driver functionality remains the same for all the example designs. For the Descriptor Bypass In/Out Loopback example design, the application has to enable bypass mode in the driver.

All the flows described below apply to the QDMA internal mode of operation. The changes required in the driver for the bypass mode of operation are also specified.

AXI4 Memory Mapped And AXI-Stream with Completion

This is the default example design used to test the MM and ST functionality using QDMA driver. This example design provides blocks to interface with the AXI4 Memory Mapped and AXI4-Stream interfaces. This example design covers most of the functionality provided by QDMA.

Refer to the QDMA Product Guide for more details on the example design and its registers.

The sections below describe the C2H and H2C data flows for ST and MM mode that are required in all the example designs.

MM H2C(Host-to-Card)

The Example Design provides BRAM with AXI-MM interface to achieve the MM H2C functionality. The CLI command dma_to_device supported by the DPDK software test application helps verify the MM H2C functionality. QDMA driver takes care of HW configuration and data processing.

The complete flow between the Host SW components and HW components is depicted in the sequence diagram below.

  • Application needs to configure the queue in MM mode and H2C (Tx) direction
  • dma_to_device command takes a file containing the contents to be transmitted to FPGA memory as input. Application divides the file contents between the number of queues provided and calls rte_eth_tx_burst() API on each configured queue for data transmission.
  • Application sets the memory offset to write to in FPGA using rte_pmd_qdma_set_mm_endpoint_addr API
  • QDMA driver frees previously transmitted mbufs pending in the queue and checks for H2C ring capacity by finding the difference between PIDX and CIDX
  • QDMA driver programs the descriptors with buffer base address and length to be transmitted
  • QDMA driver updates the H2C ring PIDX and polls the status descriptor for CIDX to be same as PIDX
  • Upon H2C ring PIDX update, DMA engine fetches the descriptors and passes them to H2C MM Engine for processing
  • H2C MM Engine reads the buffer contents from the Host and writes to the BRAM at the given memory offset
  • Upon transfer completion, DMA Engine updates the CIDX in H2C ring completion status descriptor, indicating to SW that the transmission of corresponding descriptors is completed.
  • QDMA driver processes the completion status and sends the response back to the application
_images/MM_H2C_Flow.PNG
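
The ring-capacity check and the completion poll from the steps above can be sketched in plain C. This is only an illustration, not the driver's code: example_ring, example_ring_pending(), example_ring_free() and example_mm_done() are hypothetical names, and the sketch assumes that one slot is kept unused so a full ring can be told apart from an empty one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one H2C descriptor ring (not the driver's struct). */
struct example_ring {
    uint16_t size; /* total descriptor slots */
    uint16_t pidx; /* producer index, written by software */
    uint16_t cidx; /* consumer index, reported by hardware in the status descriptor */
};

/* Descriptors posted by software that hardware has not completed yet. */
static uint16_t example_ring_pending(const struct example_ring *r)
{
    assert(r != NULL && r->size > 1);
    return (uint16_t)((r->pidx + r->size - r->cidx) % r->size);
}

/* Slots still available; one slot is reserved (assumption of this sketch). */
static uint16_t example_ring_free(const struct example_ring *r)
{
    return (uint16_t)(r->size - 1 - example_ring_pending(r));
}

/* An MM transfer is treated as complete once CIDX has caught up with PIDX. */
static bool example_mm_done(const struct example_ring *r)
{
    assert(r != NULL);
    return r->cidx == r->pidx;
}
```

In this model, software would post at most example_ring_free() descriptors, advance pidx, and then poll until example_mm_done() returns true.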

For MM (H2C and C2H) bypass mode, the application needs to enable bypass mode on the required queues. It can do so either with the device arguments c2h_byp_mode and h2c_byp_mode at application launch, or by invoking the rte_pmd_qdma_configure_rx_bypass() / rte_pmd_qdma_configure_tx_bypass() APIs on the required queues.

The MM descriptor format used by the example design is defined in QDMA Driver code base at drivers/net/qdma/qdma_user.h

struct __attribute__ ((packed)) qdma_ul_mm_desc
{
        volatile uint64_t       src_addr;
        volatile uint64_t       len:28;
        volatile uint64_t       dv:1;
        volatile uint64_t       sop:1;
        volatile uint64_t       eop:1;
        volatile uint64_t       rsvd:33;
        volatile uint64_t       dst_addr;
        volatile uint64_t       rsvd2;
};

Update this structure if the descriptor format needs to change for bypass mode, and update the descriptor processing functions qdma_ul_update_mm_h2c_desc() / qdma_ul_update_mm_c2h_desc() in qdma_user.c accordingly. For the data flow, refer to the qdma_xmit_pkts_mm() / qdma_recv_pkts_mm() functions in drivers/net/qdma/qdma_rxtx.c.
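
To make the field layout concrete, the following sketch fills a local copy of the MM descriptor for a single-buffer H2C transfer. The names example_mm_desc and example_fill_mm_desc() are hypothetical. Setting dv, sop and eop to 1 for a descriptor that carries a whole buffer is an assumption of this sketch, not a statement of what the driver's update functions do. The code relies on GCC/Clang support for 64-bit bit-fields and the packed attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local, illustrative copy of the layout shown above (volatile dropped). */
struct __attribute__((packed)) example_mm_desc {
    uint64_t src_addr;
    uint64_t len:28;
    uint64_t dv:1;
    uint64_t sop:1;
    uint64_t eop:1;
    uint64_t rsvd:33;
    uint64_t dst_addr;
    uint64_t rsvd2;
};

/* Largest value that fits in the 28-bit len field. */
#define EXAMPLE_MM_MAX_LEN ((1u << 28) - 1)

/*
 * Hypothetical helper: describe one H2C transfer of len bytes from host
 * address src to card offset dst. Returns 0 on success, -1 if len does
 * not fit the descriptor.
 */
static int example_fill_mm_desc(struct example_mm_desc *d,
                                uint64_t src, uint64_t dst, uint32_t len)
{
    assert(d != NULL);
    if (len == 0 || len > EXAMPLE_MM_MAX_LEN)
        return -1;

    memset(d, 0, sizeof(*d));   /* clears rsvd and rsvd2 */
    d->src_addr = src;
    d->dst_addr = dst;
    d->len = len;
    d->dv = 1;                  /* assumption: descriptor is valid */
    d->sop = 1;                 /* assumption: single descriptor starts ... */
    d->eop = 1;                 /* ... and ends the transfer */
    return 0;
}
```

For the C2H direction the same layout would be used with the roles of src and dst swapped (card offset as source, host buffer as destination).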

MM C2H(Card-to-Host)

The Example Design provides BRAM with AXI-MM interface to achieve the MM C2H functionality. The CLI command dma_from_device supported by the DPDK software test application helps verify the MM C2H functionality. QDMA driver takes care of HW configuration and data processing.

The complete flow between the Host SW components and HW components is depicted in the sequence diagram below.

  • Application needs to configure the queue in MM mode and C2H (Rx) direction
  • Application sets the memory offset to read from in FPGA using rte_pmd_qdma_set_mm_endpoint_addr API
  • Application distributes total requested data across the given queues and calls the rte_eth_rx_burst() API on each queue
  • QDMA driver programs the required descriptors with buffer base address and length based on the number of packets requested and the length of packet buffer (mbuf size)
  • QDMA driver updates the C2H ring PIDX and polls the status descriptor for CIDX to be same as PIDX
  • Upon C2H ring PIDX update, DMA engine fetches the descriptors and passes them to C2H MM Engine for processing
  • C2H MM Engine reads the BRAM contents and writes to the Host buffers
  • Upon transfer completion, DMA Engine updates the CIDX in C2H ring completion status descriptor, indicating to SW that the corresponding descriptors are available for consumption.
  • QDMA driver processes the completion status descriptor and sends the response back to the application with the data received
_images/MM_C2H_Flow.PNG
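
The distribution of the requested data across queues, and the way a queue's share maps onto fixed-size packet buffers, can be sketched as follows. Both helpers are hypothetical. The policy of giving the remainder bytes to the lowest-numbered queues is only one possible choice and is not taken from the test application.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes assigned to queue q when total bytes are spread over nq queues.
 * Assumption: the remainder is handed out one byte each to the first queues.
 */
static uint64_t example_queue_share(uint64_t total, unsigned nq, unsigned q)
{
    assert(nq > 0 && q < nq);
    uint64_t base = total / nq;
    uint64_t rem = total % nq;
    return base + (q < rem ? 1 : 0);
}

/*
 * Length programmed into the i-th descriptor when len bytes are cut into
 * buffers of buf_size bytes (the mbuf size). Returns 0 past the last one.
 */
static uint32_t example_chunk_len(uint64_t len, uint32_t buf_size, uint64_t i)
{
    assert(buf_size > 0);
    uint64_t off = i * buf_size;
    if (off >= len)
        return 0;
    uint64_t left = len - off;
    return left < buf_size ? (uint32_t)left : buf_size;
}
```

For example, 10000 bytes over 3 queues gives shares of 3334, 3333 and 3333 bytes, and a 3334-byte share with 2048-byte buffers needs two descriptors of 2048 and 1286 bytes.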

ST H2C(Host-to-Card)

In ST H2C, data is moved from Host to Device through the H2C stream engine. The H2C stream engine moves data from the Host to the H2C Stream interface. The engine is responsible for breaking up DMA reads to MRRS size, guaranteeing the space for completions, and reordering completions so that H2C stream data is delivered to user logic in order. The engine has sufficient buffering for up to 256 DMA reads and up to 32 KB of data. The DMA fetches the data and aligns it to the first byte to transfer on the AXI4 interface side. This allows every descriptor to have a random offset and a random length. The total length of all descriptors put together must be less than 64 KB.

There is no dependency on user logic for this use case. The CLI command dma_to_device supported by the DPDK software test application helps verify the ST H2C functionality.

The complete flow between the Host SW components and HW components is depicted in the sequence diagram below.

  • User needs to configure the queue in ST mode (default) and H2C (Tx) direction
  • dma_to_device command takes a file containing the contents to be transmitted to FPGA as input. Application divides the file contents between the number of queues provided and calls rte_eth_tx_burst() API on each configured queue for data transmission.
  • QDMA driver frees previously transmitted mbufs pending in the queue and checks for H2C ring capacity by finding the difference between PIDX and CIDX.
  • QDMA driver programs the descriptors with buffer base address and length to be transmitted and updates the H2C ring PIDX
  • QDMA driver returns to the application the number of packets it was able to write in the descriptor ring, without waiting for acknowledgement from HW.
  • Upon H2C ring PIDX update, DMA engine fetches the descriptors and passes them to H2C Stream Engine for processing
  • H2C Stream Engine reads the buffer contents from the Host and delivers the data to the user logic on the H2C Stream interface
  • Upon transfer completion, DMA Engine updates the CIDX in H2C ring completion status, indicating to SW that the transmission of corresponding descriptors is completed.
_images/ST_H2C_Flow.PNG
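
Unlike the MM flow, the ST H2C transmit path returns without waiting for the hardware. The sketch below models that behaviour: reclaim what hardware has completed, accept as many packets as fit, publish PIDX and return the accepted count. The structure and function names are hypothetical, descriptor programming is omitted, and the one-reserved-slot rule is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software state of one ST H2C queue. */
struct example_st_txq {
    uint16_t ring_size; /* total descriptor slots */
    uint16_t pidx;      /* next slot software will fill */
    uint16_t hw_cidx;   /* latest CIDX reported by hardware */
    uint16_t sw_cidx;   /* buffers before this index are already freed */
    uint32_t freed;     /* buffers released so far (stands in for mbuf frees) */
};

/* Distance from index 'from' to index 'to' on a ring of 'size' slots. */
static uint16_t example_ring_dist(uint16_t from, uint16_t to, uint16_t size)
{
    return (uint16_t)((to + size - from) % size);
}

/* Returns how many of nb_pkts were placed on the ring; never blocks. */
static uint16_t example_tx_burst(struct example_st_txq *q, uint16_t nb_pkts)
{
    assert(q != NULL && q->ring_size > 1);

    /* 1. Free buffers for descriptors hardware has completed. */
    q->freed += example_ring_dist(q->sw_cidx, q->hw_cidx, q->ring_size);
    q->sw_cidx = q->hw_cidx;

    /* 2. Check ring capacity from the PIDX/CIDX difference. */
    uint16_t in_flight = example_ring_dist(q->sw_cidx, q->pidx, q->ring_size);
    uint16_t room = (uint16_t)(q->ring_size - 1 - in_flight);
    uint16_t n = nb_pkts < room ? nb_pkts : room;

    /* 3. Descriptors would be programmed here; then PIDX is published. */
    q->pidx = (uint16_t)((q->pidx + n) % q->ring_size);
    return n;
}
```

A caller that receives fewer packets than it offered would typically retry the remainder in a later burst.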

For ST H2C bypass mode, the application needs to enable bypass mode on the required queues. It can do so either with the device argument h2c_byp_mode at application launch, or by invoking the rte_pmd_qdma_configure_tx_bypass() API on the required queues.

The ST H2C descriptor format used by the example design is defined in QDMA Driver code base at drivers/net/qdma/qdma_user.h

struct __attribute__ ((packed)) qdma_ul_st_h2c_desc
{
        volatile uint16_t       cdh_flags;
        volatile uint16_t       pld_len;
        volatile uint16_t       len;
        volatile uint16_t       flags;
        volatile uint64_t       src_addr;
};

Update this structure if the descriptor format needs to change for bypass mode, and update the ST H2C descriptor processing function qdma_ul_update_st_h2c_desc() in qdma_user.c accordingly. For the ST H2C data flow, refer to the qdma_xmit_pkts_st() function in drivers/net/qdma/qdma_rxtx.c.
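
As an illustration of the 16-byte layout, the sketch below fills a local copy of the ST H2C descriptor for a packet held in a single buffer. Everything named example_* or EXAMPLE_* is hypothetical. In particular, the SOP/EOP bit values and the choice of setting pld_len equal to len are assumptions made for this sketch and should be checked against qdma_user.c and the Product Guide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local, illustrative copy of the layout shown above (volatile dropped). */
struct __attribute__((packed)) example_st_h2c_desc {
    uint16_t cdh_flags;
    uint16_t pld_len;
    uint16_t len;
    uint16_t flags;
    uint64_t src_addr;
};

/* Hypothetical flag encodings, chosen only for this sketch. */
#define EXAMPLE_H2C_F_SOP 0x1u
#define EXAMPLE_H2C_F_EOP 0x2u

/* The text above requires the total length to stay below 64 KB. */
#define EXAMPLE_H2C_LEN_LIMIT 65536u

/* Returns 0 on success, -1 if len reaches the 64 KB limit. */
static int example_fill_st_h2c_desc(struct example_st_h2c_desc *d,
                                    uint64_t src, uint32_t len)
{
    assert(d != NULL);
    if (len >= EXAMPLE_H2C_LEN_LIMIT)
        return -1;

    memset(d, 0, sizeof(*d));       /* cdh_flags stays 0 */
    d->src_addr = src;
    d->len = (uint16_t)len;         /* bytes in this buffer */
    d->pld_len = (uint16_t)len;     /* assumption: single-buffer packet */
    d->flags = EXAMPLE_H2C_F_SOP | EXAMPLE_H2C_F_EOP;
    return 0;
}
```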

ST C2H(Card-to-Host)

In ST C2H, data is moved from DMA Device to Host through C2H Stream Engine.

The C2H streaming engine is responsible for receiving data from the user logic and writing it to the Host memory address provided by the C2H descriptor for a given queue. The C2H Stream Engine DMA-writes the stream packets into the host memory buffers referenced by the descriptors that the host QDMA driver provides through the C2H descriptor queue.

The C2H engine has two major blocks to accomplish C2H streaming DMA:

  • Descriptor Prefetch Engine (PFCH)
  • C2H-ST DMA Write Engine

QDMA Driver needs to program the prefetch context along with the queue software context to achieve the ST C2H functionality.

The Prefetch Engine is responsible for calculating the number of descriptors needed for the DMA that is writing the packet. The buffer size is fixed on a per-queue basis. For internal and cached bypass mode, the prefetch module can fetch up to 512 descriptors for a maximum of 64 different queues at any given time.

The Completion Engine is used to write to the Completion queues. Though the completion queue is independent of C2H queue, the example design binds it with C2H Stream engine so that when used with a DMA engine, the completion is used by the driver to determine how many bytes of data were transferred with every packet. This allows the driver to reclaim the descriptors.

The complete flow between the Host SW components and HW components is depicted in the sequence diagram below.

  • Application needs to configure the queue in ST mode (default) and C2H (Rx) direction
  • Application programs user logic registers to generate the required packets on a given queue, before calling rte_eth_rx_burst() API on that queue
  • Application calls the rte_eth_rx_burst() API on each queue with number of packets to receive as input
  • QDMA driver processes the completion queue to determine the packet length of each received packet and updates the completion queue CIDX
  • QDMA driver retrieves the packets from the C2H ring based on the number of descriptors consumed per packet
  • QDMA driver populates the C2H ring descriptors with new packet buffer addresses
  • QDMA driver updates the C2H ring PIDX for HW to start using the new descriptors
  • QDMA driver returns the packets retrieved to the application
_images/ST_C2H_Flow.PNG
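
The completion-processing steps above can be modelled with a small consumer loop. This sketch uses a simplified, hypothetical entry type rather than the real completion format. It assumes that the ring starts zeroed and that hardware writes color 1 on its first pass, so software begins with an expected color of 1 and flips it on each wrap. It also assumes that a packet which uses descriptors consumes at least one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one completion entry. */
struct example_cmpt {
    uint8_t  color;     /* toggles each time hardware wraps the ring */
    uint8_t  desc_used; /* 1 if the packet consumed C2H descriptors */
    uint16_t length;    /* packet length in bytes */
};

/* Hypothetical software view of the completion ring. */
struct example_cmpt_ring {
    const struct example_cmpt *entries;
    uint16_t size;
    uint16_t cidx;           /* next entry software will inspect */
    uint8_t  expected_color; /* color that marks an entry as new */
};

/*
 * Consume up to max_pkts new completions. Returns the number of packets
 * found and adds the C2H descriptors they consumed to *descs_used.
 */
static uint16_t example_process_cmpt(struct example_cmpt_ring *r,
                                     uint32_t buf_size, uint16_t max_pkts,
                                     uint32_t *descs_used)
{
    assert(r != NULL && r->entries != NULL && r->size > 0);
    assert(buf_size > 0 && descs_used != NULL);

    uint16_t n = 0;
    while (n < max_pkts) {
        const struct example_cmpt *e = &r->entries[r->cidx];
        if (e->color != r->expected_color)
            break; /* hardware has not written this entry yet */

        if (e->desc_used) {
            uint32_t d = (e->length + buf_size - 1) / buf_size;
            *descs_used += d ? d : 1;
        }
        n++;

        /* Advance the completion CIDX, flipping the color on wrap. */
        r->cidx++;
        if (r->cidx == r->size) {
            r->cidx = 0;
            r->expected_color ^= 1;
        }
    }
    return n;
}
```

The descriptor total tells the caller how many C2H ring slots to refill with new buffer addresses before updating the C2H ring PIDX.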

The Streaming C2H functionality implemented in QDMA driver is tightly coupled with the Example Design because the completion entry descriptor is defined by the user logic in the example design.

The completion entry format used by the example design is defined in QDMA Driver code base at drivers/net/qdma/qdma_user.h

struct __attribute__ ((packed)) qdma_ul_st_cmpt_ring
{
        /* For 2018.2 IP, this field determines the
         * Standard or User format of completion entry
         */
        volatile uint32_t       data_frmt:1;

        /* This field inverts every time
         * PIDX wraps the completion ring
         */
        volatile uint32_t       color:1;

        /* Indicates that C2H engine
         * encountered a descriptor error
         */
        volatile uint32_t       err:1;

        /* Indicates that the completion
         * packet consumes descriptor in C2H ring
         */
        volatile uint32_t       desc_used:1;

        /* Indicates length of the data packet */
        volatile uint32_t       length:16;

        /* Reserved field */
        volatile uint32_t       user_rsv:4;

        /* User logic defined data of
         * length based on CMPT entry length
         */
        volatile uint8_t        user_def[];
};

The completion entry is processed in the qdma_ul_extract_st_cmpt_info() function in drivers/net/qdma/qdma_user.c. If a different example design is used, the QDMA driver code in drivers/net/qdma/qdma_user.h and drivers/net/qdma/qdma_user.c must be updated as per the descriptor format defined by the new example design. For the ST C2H data flow, refer to the qdma_recv_pkts_st() function in drivers/net/qdma/qdma_rxtx.c.

For ST C2H bypass mode, the application needs to enable bypass mode on the required queues. It can do so either with the device argument c2h_byp_mode at application launch, or by invoking the rte_pmd_qdma_configure_rx_bypass() API on the required queues.
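
As a reading aid for the bit-field layout, the sketch below decodes the fixed part of a completion entry from raw bytes with explicit shifts and masks. It is not the body of qdma_ul_extract_st_cmpt_info(). The names are hypothetical, and the decoder assumes that bit-fields are allocated starting from the least significant bit (as GCC and Clang do on little-endian hosts), so the fixed fields occupy the first three bytes of the entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the fixed fields of one completion entry. */
struct example_cmpt_info {
    uint8_t  data_frmt; /* bit 0 */
    uint8_t  color;     /* bit 1 */
    uint8_t  err;       /* bit 2 */
    uint8_t  desc_used; /* bit 3 */
    uint16_t length;    /* bits 4..19 */
    uint8_t  user_rsv;  /* bits 20..23 */
};

/* entry must point to at least 3 readable bytes. */
static struct example_cmpt_info example_decode_cmpt(const uint8_t *entry)
{
    assert(entry != NULL);

    /* Assemble the first 24 bits in little-endian byte order. */
    uint32_t w = (uint32_t)entry[0] |
                 ((uint32_t)entry[1] << 8) |
                 ((uint32_t)entry[2] << 16);

    struct example_cmpt_info info;
    info.data_frmt = (uint8_t)(w & 0x1);
    info.color     = (uint8_t)((w >> 1) & 0x1);
    info.err       = (uint8_t)((w >> 2) & 0x1);
    info.desc_used = (uint8_t)((w >> 3) & 0x1);
    info.length    = (uint16_t)((w >> 4) & 0xFFFF);
    info.user_rsv  = (uint8_t)((w >> 20) & 0xF);
    return info;
}
```

With this layout, the bytes 0xCA 0x5D 0x00 decode to color 1, desc_used 1 and a length of 1500 bytes. The user-defined data would follow from the fourth byte onward.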

The ST C2H descriptor format used by the example design is defined in QDMA Driver code base at drivers/net/qdma/qdma.h

struct __attribute__ ((packed)) qdma_c2h_desc
{
        volatile uint64_t       dst_addr;
};

Update this structure if the descriptor format needs to change for bypass mode, and update the data flow functionality in the qdma_recv_pkts_st() function in drivers/net/qdma/qdma_rxtx.c accordingly.