Use Cases

This document outlines the operational use cases for the MDB5-DMA client driver, describing data transfer flows and architectural patterns. The MDB5-DMA Controller supports up to 8 read channels and 8 write channels per controller, with transfer sizes ranging from 1 byte to 4GB.

Host-to-Card (H2C)

Host-to-Card (H2C) transfers move data from the host system’s memory to the PCIe device’s memory. This is the primary method for sending configuration data, commands, or data payloads to the device.

Scatter-Gather (Linked-list) DMA

Scatter-Gather (Linked-list) DMA mode uses linked-list descriptors to handle fragmented buffers efficiently. The driver builds a scatter-gather list from each user buffer and programs as many descriptors as the transfer needs.

Scatter-Gather (Linked-list) DMA Data Flow

  • Application opens the H2C channel device node (e.g., /dev/mdb5_write00)

  • Application issues a write() system call with the user-space buffer

  • MDB5-DMA client driver receives the buffer and validates parameters

  • Driver pins the user-space memory pages and builds the scatter-gather list

  • DMA descriptors are prepared and submitted to the underlying dw-edma controller via the DMA Engine API

  • H2C MM Engine reads data from host memory locations and writes to device memory through PCIe interface

  • Upon transfer completion, DMA engine updates completion status and generates interrupt if required

  • Driver processes completion status and returns the number of bytes transferred to the application
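
From the application's side, the flow above reduces to an open(), a single write(), and a check of the returned byte count. The following is a minimal sketch under that assumption; h2c_send() is a hypothetical helper name, and the node path is the example path used in this document.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open an H2C node, issue one write(), and return
 * the number of bytes the driver reports as transferred (-1 on error). */
static ssize_t h2c_send(const char *node, const void *buf, size_t len)
{
    int fd = open(node, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);   /* blocks until the transfer completes */

    int saved = errno;                 /* keep write()'s errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

A caller would typically compare the result with the requested length, for example h2c_send("/dev/mdb5_write00", buf, len) != (ssize_t)len, and treat any mismatch as a failed or partial transfer.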

Buffer Segmentation

When a user buffer is submitted for a scatter-gather transfer, the MDB5-DMA client driver breaks it down into a scatter-gather list for efficient DMA processing. The driver’s segmentation process follows these steps:

  • The driver pins user-space memory pages using get_user_pages_fast() to prevent the operating system from moving or swapping them during the transfer

  • This driver implementation uses PAGE_SIZE as the standard size for each scatter-gather entry (typically 4KB on x86 systems, though this varies by architecture)

  • While PAGE_SIZE alignment is not a hardware requirement and entry sizes can be configured differently if needed, this driver follows the PAGE_SIZE pattern

  • Entry sizes vary based on buffer alignment relative to page boundaries

  • The first entry may be partial if the buffer starts mid-page

  • Middle entries typically span complete pages when properly aligned

  • The final entry contains any remaining data that doesn’t fill a complete page

Example: A 10,000-byte buffer starting at a 100-byte offset within a 4,096-byte page would create:

  • Entry 0: 3,996 bytes (from offset 100 to end of first page)

  • Entry 1: 4,096 bytes (complete second page)

  • Entry 2: 1,908 bytes (remaining data on third page)
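
The entry sizes listed above can be reproduced with a few lines of arithmetic. The sketch below is a user-space illustration of the page-boundary split, not the driver's code; sg_split() is a hypothetical name, and the page size is passed in rather than taken from PAGE_SIZE.

```c
#include <stddef.h>

/* Split a buffer that starts page_off bytes into a page and is len bytes
 * long into page-bounded entries. Stores up to max entry sizes in sizes[]
 * and returns the total number of entries required. */
static size_t sg_split(size_t page_off, size_t len, size_t page_size,
                       size_t *sizes, size_t max)
{
    size_t count = 0;
    size_t off = page_off % page_size;

    while (len > 0) {
        size_t chunk = page_size - off;   /* room left in the current page */
        if (chunk > len)
            chunk = len;                  /* final, possibly partial entry */
        if (count < max)
            sizes[count] = chunk;
        count++;
        len -= chunk;
        off = 0;                          /* later entries start page-aligned */
    }
    return count;
}
```

With page_off = 100, len = 10000, and page_size = 4096, the function reports three entries of 3,996, 4,096, and 1,908 bytes. A buffer of 10,240 bytes at the same offset would end with a 2,148-byte entry instead.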

Configuration

Channels default to Scatter-Gather (Linked-list) Mode. The control device interface can select the mode explicitly, as in the example below, and is also used to configure the aperture size to optimize performance for specific transfer patterns.

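/* ctrl_fd: open file descriptor for the driver's control device */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_write00",   /* channel to configure */
    .mode = MDB5_MODE_SG           /* scatter-gather (linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");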

Simple (Non Linked-list) Mode

In Simple (Non Linked-list) Mode, the MDB5-DMA driver performs direct single-buffer transfers without linked-list descriptors. This mode is optimal for a single large chunk, as it reduces the overhead of setting up the Linked List Pointer (LLP).

The channel must be configured for Simple (Non Linked-list) Mode using the control device interface before performing transfers.

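/* ctrl_fd: open file descriptor for the driver's control device */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_write00",   /* channel to configure */
    .mode = MDB5_MODE_SIMPLE       /* simple (non linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");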

Simple (Non Linked-list) Mode data flow

  • Application opens the H2C channel device node

  • Channel is configured for Simple (Non Linked-list) Mode operation

  • Application issues a write() system call with a contiguous buffer

  • Driver maps the buffer directly, without building a scatter-gather list

  • Single DMA descriptor is prepared and submitted

  • H2C engine performs direct transfer from host to device memory

  • Completion notification is provided upon transfer completion

Asynchronous I/O

Asynchronous H2C operations enable non-blocking transfers through the Linux asynchronous I/O (AIO) interface.

Async Data Flow

  • AIO context is established using io_setup()

  • I/O Control Blocks (iocb) are prepared for each transfer operation using io_prep_pwrite()

  • Transfers are submitted using io_submit(), returning immediately to allow continued application execution

  • Driver and DMA hardware process transfers asynchronously in the background

  • Application uses io_getevents() to check for completed operations and retrieve results
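
The sequence above can be sketched as follows. To stay self-contained, this example calls the native Linux AIO system calls directly instead of linking against libaio, so the iocb is filled in by hand with the same fields that io_prep_pwrite() sets. The function name aio_write_once() is hypothetical, and the sketch assumes the channel node accepts AIO requests the way any other file descriptor does.

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers: glibc does not export the native AIO system calls. */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **list)
{ return syscall(SYS_io_submit, ctx, n, list); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *timeout)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }

/* Queue one write on fd, then wait for its completion event.
 * Returns the byte count from the event, or a negative value on failure. */
static long long aio_write_once(int fd, const void *buf, size_t len,
                                long long offset)
{
    aio_context_t ctx = 0;             /* must be zero before io_setup() */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long long result = -1;

    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    /* io_submit() only queues the request; other work could be done here
     * before io_getevents() collects the completion. */
    if (sys_io_submit(ctx, 1, list) == 1 &&
        sys_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
        result = ev.res;

    sys_io_destroy(ctx);
    return result;
}
```

A real application would keep one AIO context alive for the lifetime of the channel and submit several iocbs per io_submit() call; creating and destroying the context per transfer is done here only to keep the sketch short.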

Card-to-Host (C2H)

Card-to-Host (C2H) transfers move data from the PCIe device’s memory to the host system’s memory. This is essential for reading results, status information, or streaming data from the device.

Scatter-Gather (Linked-list) DMA

Scatter-Gather (Linked-list) DMA C2H mode efficiently handles large or fragmented read operations using linked-list descriptors. The driver builds the scatter-gather list and performs the DMA mapping automatically.

Buffer Segmentation

When a user provides a destination buffer for a scatter-gather read transfer, the MDB5-DMA client driver segments it into a scatter-gather list for efficient DMA processing. The driver’s segmentation process for receiving data follows these steps:

  • The driver pins the destination buffer’s memory pages using get_user_pages_fast() to prevent the operating system from moving or swapping them during the transfer

  • This driver implementation uses PAGE_SIZE as the standard size for each scatter-gather entry (typically 4KB on x86 systems, though this varies by architecture)

  • While PAGE_SIZE alignment is not a hardware requirement and entry sizes can be configured differently if needed, this driver follows the PAGE_SIZE pattern for optimal memory management

  • Entry sizes vary based on buffer alignment relative to page boundaries

  • The first entry may be partial if the destination buffer starts mid-page

  • Middle entries typically span complete pages when properly aligned

  • The final entry contains space for any remaining data that doesn’t fill a complete page

Example: A 10,000-byte destination buffer starting at a 100-byte offset within a 4,096-byte page would create:

  • Entry 0: 3,996 bytes (from offset 100 to end of first page)

  • Entry 1: 4,096 bytes (complete second page)

  • Entry 2: 1,908 bytes (remaining space on third page)
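
Before the entries can be built, the pinning step needs to know how many pages the destination buffer touches, since get_user_pages_fast() takes a page count. The sketch below shows that calculation in user-space C as an illustration only; pages_spanned() is a hypothetical name and not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len). */
static size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;

    uintptr_t first = addr / page_size;              /* index of first page */
    uintptr_t last = (addr + len - 1) / page_size;   /* index of last page  */
    return (size_t)(last - first + 1);
}
```

For the example above (offset 100, 10,000 bytes, 4,096-byte pages) the result is 3, one page per scatter-gather entry. A 2-byte buffer that straddles a page boundary still spans 2 pages.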

Scatter-Gather (Linked-list) DMA Data Flow

The complete flow between the host components and hardware components follows this sequence:

  • Application opens the C2H channel device node (e.g., /dev/mdb5_read00)

  • Application issues a read() system call with a destination buffer

  • MDB5-DMA client driver validates the read request and buffer parameters

  • Driver pins the user buffer pages and builds the scatter-gather list

  • C2H DMA descriptors are prepared and submitted to the DMA engine

  • C2H MM Engine reads data from device memory and writes to host memory locations

  • Upon transfer completion, DMA engine updates completion status and generates interrupt if required

  • Driver processes completion status, unmaps pages, and returns the number of bytes transferred
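
On the application side, the C2H sequence mirrors the H2C case with read() in place of write(). The following is a minimal sketch; c2h_receive() is a hypothetical helper name, and the node path in the usage note is the example path from this document.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a C2H node, issue one read() into buf, and
 * return the number of bytes the driver reports (-1 on error). */
static ssize_t c2h_receive(const char *node, void *buf, size_t len)
{
    int fd = open(node, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);    /* blocks until the transfer completes */

    int saved = errno;                 /* keep read()'s errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

A call such as c2h_receive("/dev/mdb5_read00", buf, len) returns the transferred byte count, which the caller should compare with len before using the buffer contents.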

Simple (Non Linked-list) Mode

Simple (Non Linked-list) Mode C2H operations perform direct buffer transfers without linked-list descriptors. This mode is optimal for a single large chunk, as it reduces the overhead of setting up the Linked List Pointer (LLP).

Similar to H2C operations, C2H channels must be explicitly configured for Simple (Non Linked-list) Mode.

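/* ctrl_fd: open file descriptor for the driver's control device */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_read00",    /* channel to configure */
    .mode = MDB5_MODE_SIMPLE       /* simple (non linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");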

Simple (Non Linked-list) Mode data flow

  • Application opens the C2H channel device node

  • Channel is configured for Simple (Non Linked-list) Mode operation

  • Application issues a read() system call with a contiguous buffer

  • Driver maps the destination buffer directly

  • Single DMA descriptor is prepared for the transfer

  • C2H engine reads from device memory and writes to host buffer

  • Completion status is returned with the number of bytes transferred

Asynchronous I/O

Asynchronous C2H operations enable non-blocking reads, particularly useful for streaming data applications where continuous data flow from the device is required without blocking the application.

Async Data Flow

  • Application opens the C2H channel with direct I/O flags

  • AIO context is created for managing asynchronous operations

  • Read operations are prepared using io_prep_pread() and submitted via io_submit()

  • C2H engine performs transfers while application continues execution

  • Application receives completion events via io_getevents() when data is available
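
Direct I/O on Linux generally expects the buffer address and length to be aligned, so an application following the flow above would normally allocate its destination buffers accordingly. The sketch below assumes that "direct I/O flags" means O_DIRECT and that page alignment is sufficient; this document does not state the driver's actual alignment requirements, and alloc_page_aligned() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate a zeroed, page-aligned buffer of at least len bytes. The size is
 * rounded up to a whole number of pages and stored in *alloc_len if that
 * pointer is non-NULL. Returns NULL on failure; release with free(). */
static void *alloc_page_aligned(size_t len, size_t *alloc_len)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0)
        return NULL;

    size_t page = (size_t)ps;
    if (len > SIZE_MAX - (page - 1))
        return NULL;                       /* rounding up would overflow */
    size_t rounded = (len + page - 1) / page * page;

    void *buf = NULL;
    if (posix_memalign(&buf, page, rounded) != 0)
        return NULL;

    memset(buf, 0, rounded);
    if (alloc_len)
        *alloc_len = rounded;
    return buf;
}
```

The returned pointer can be used as the destination of each read request queued on the C2H channel. Allocating one buffer per in-flight request lets the application process completed data while later reads are still running.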