Use Cases

This document describes the operational use cases of the MDB5-DMA client driver, including the data transfer flows and the architectural patterns behind them. Each MDB5-DMA controller supports up to 8 read channels and 8 write channels, with transfer sizes ranging from 1 byte to 4 GB.

Host-to-Card (H2C)

Host-to-Card (H2C) transfers move data from the host system’s memory to the PCIe device’s memory. This is the primary method for sending configuration data, commands, or data payloads to the device.

Scatter-Gather (Linked-list) DMA

Scatter-Gather (Linked-list) DMA mode uses linked-list descriptors to transfer buffers that are fragmented in physical memory. The driver builds a scatter-gather list from the user buffer and programs as many descriptors as the transfer needs.

Scatter-Gather (Linked-list) DMA Data Flow

Application opens the H2C channel device node (e.g., /dev/mdb5_write00)
Application issues a write() system call with the user-space buffer
MDB5-DMA client driver receives the buffer and validates parameters
Driver pins the user-space memory pages and builds the scatter-gather list
DMA descriptors are prepared and submitted to the underlying dw-edma controller via the DMA Engine API
H2C MM engine reads data from the host memory locations and writes it to device memory over the PCIe interface
Upon transfer completion, the DMA engine updates the completion status and raises an interrupt if required
Driver processes the completion status and returns the number of bytes transferred to the application
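
From the application's side, the flow above reduces to an open() followed by write() calls. The following is a minimal sketch rather than the driver's reference client: the helper name h2c_write_all is hypothetical, and it assumes the driver may return fewer bytes than requested, which the source does not state. Retrying on short writes is harmless if the driver always completes the full transfer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to fd, retrying on short writes and EINTR.
 * Returns the number of bytes written (len on success) or -1 on error,
 * with errno set by write().
 * Hypothetical helper, not part of the MDB5-DMA driver package.
 */
ssize_t h2c_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved: retry */
            return -1;
        }
        if (n == 0)
            break;          /* no progress: stop rather than spin */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller would open the channel node with open("/dev/mdb5_write00", O_WRONLY), pass the returned descriptor to h2c_write_all() together with the payload, and close the descriptor when done.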

Buffer Segmentation

When a user buffer is submitted for a scatter-gather transfer, the MDB5-DMA client driver breaks it down into a scatter-gather list for efficient DMA processing. The driver’s segmentation process follows these steps:

The driver pins user-space memory pages using get_user_pages_fast() to prevent the operating system from moving or swapping them during the transfer
This driver implementation uses PAGE_SIZE as the standard size for each scatter-gather entry (typically 4KB on x86 systems, though this varies by architecture)
While PAGE_SIZE alignment is not a hardware requirement and entry sizes can be configured differently if needed, this driver follows the PAGE_SIZE pattern
Entry sizes vary based on buffer alignment relative to page boundaries
The first entry may be partial if the buffer starts mid-page
Middle entries typically span complete pages when properly aligned
The final entry contains any remaining data that doesn’t fill a complete page

Example: A 10KB (10,240-byte) buffer starting at a 100-byte offset within a 4KB (4,096-byte) page would create:

Entry 0: 3,996 bytes (from offset 100 to end of first page)
Entry 1: 4,096 bytes (complete second page)
Entry 2: 2,148 bytes (remaining data on third page)
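
The page-boundary arithmetic behind this segmentation can be sketched in a few lines of user-space C. This is an illustration of the calculation only, not the driver's code: the function name sg_entry_sizes and its parameters are hypothetical, and the real driver works on pinned pages rather than raw addresses. For a 10,240-byte buffer at offset 100 with 4,096-byte pages it yields 3,996, 4,096, and 2,148 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split the range [addr, addr + len) into page-bounded entries and store the
 * entry sizes in sizes[]. Returns the number of entries the range needs; if
 * that exceeds max_entries, only the first max_entries sizes are stored.
 * page_size must be non-zero. Hypothetical helper for illustration.
 */
size_t sg_entry_sizes(uintptr_t addr, size_t len, size_t page_size,
                      size_t *sizes, size_t max_entries)
{
    size_t count = 0;

    while (len > 0) {
        /* Bytes left between addr and the end of its page. */
        size_t room = page_size - (size_t)(addr % page_size);
        size_t chunk = len < room ? len : room;

        if (count < max_entries)
            sizes[count] = chunk;
        count++;
        addr += chunk;
        len -= chunk;
    }
    return count;
}
```

Only the first and last entries can be shorter than a page. Every entry in between starts on a page boundary and therefore covers a full page.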

Configuration

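Channels default to Scatter-Gather (Linked-list) Mode. The aperture size can be configured through the control device interface to optimize performance for specific transfer patterns. The same interface selects the transfer mode explicitly, for example to return a channel to Scatter-Gather (Linked-list) Mode: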

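/* ctrl_fd is an open descriptor for the control device. */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_write00",    /* channel to configure */
    .mode = MDB5_MODE_SG            /* scatter-gather (linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");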

Simple (Non Linked-list) Mode

In Simple (Non Linked-list) Mode, the MDB5-DMA driver performs a direct single-buffer transfer without building a linked list. This mode is best suited to a single large contiguous buffer, because it avoids the overhead of setting up the linked-list pointer (LLP).

The channel must be configured for Simple (Non Linked-list) Mode using the control device interface before performing transfers.

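/* ctrl_fd is an open descriptor for the control device. */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_write00",    /* channel to configure */
    .mode = MDB5_MODE_SIMPLE        /* simple (non linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");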

Simple (Non Linked-list) Mode Data Flow

Application opens the H2C channel device node
Channel is configured for Simple (Non Linked-list) Mode operation
Application issues a write() system call with a contiguous buffer
Driver maps the buffer directly, without building a scatter-gather list
A single DMA descriptor is prepared and submitted
H2C engine performs direct transfer from host to device memory
Completion notification is provided upon transfer completion

ASYNC IO

Asynchronous H2C operations enable non-blocking transfers through the Linux asynchronous I/O (AIO) interface.

Async Data Flow

AIO context is established using io_setup()
I/O Control Blocks (iocb) are prepared for each transfer operation using io_prep_pwrite()
Transfers are submitted using io_submit(), returning immediately to allow continued application execution
Driver and DMA hardware process transfers asynchronously in the background
Application uses io_getevents() to check for completed operations and retrieve results
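
The sequence above can be sketched as follows. To avoid a link-time dependency on libaio, this sketch issues the underlying Linux system calls directly and fills the iocb by hand. libaio's io_prep_pwrite() sets the same fields. The function name aio_write_once is hypothetical, and whether the driver interprets the file offset is an assumption.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Submit one asynchronous write and wait for its completion event.
 * Returns the completed byte count, or -1 on failure.
 * Hypothetical helper built on the raw Linux AIO system calls.
 */
ssize_t aio_write_once(int fd, const void *buf, size_t len, off_t offset)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    ssize_t result = -1;

    if (syscall(SYS_io_setup, 1, &ctx) < 0)     /* create the AIO context */
        return -1;

    memset(&cb, 0, sizeof cb);                  /* same fields io_prep_pwrite() sets */
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    if (syscall(SYS_io_submit, ctx, 1L, list) == 1) {
        long n;

        /* The application is free to do other work before reaping. */
        do {
            n = syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL);
        } while (n < 0 && errno == EINTR);

        if (n == 1 && ev.res >= 0)              /* negative res carries -errno */
            result = (ssize_t)ev.res;
    }
    syscall(SYS_io_destroy, ctx);
    return result;
}
```

A real application would keep one context alive across many transfers and submit several control blocks per io_submit() call instead of creating and destroying a context for each write.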

Card-to-Host (C2H)

Card-to-Host (C2H) transfers move data from the PCIe device’s memory to the host system’s memory. This is essential for reading results, status information, or streaming data from the device.

Scatter-Gather (Linked-list) DMA

Scatter-Gather (Linked-list) DMA C2H mode efficiently handles large or fragmented read operations using linked-list descriptors. The driver builds the scatter-gather list and performs the DMA mapping automatically.

Buffer Segmentation

When a user provides a destination buffer for a scatter-gather read transfer, the MDB5-DMA client driver segments it into a scatter-gather list for efficient DMA processing. The driver’s segmentation process for receiving data follows these steps:

The driver pins the destination buffer’s memory pages using get_user_pages_fast() to prevent the operating system from moving or swapping them during the transfer
This driver implementation uses PAGE_SIZE as the standard size for each scatter-gather entry (typically 4KB on x86 systems, though this varies by architecture)
While PAGE_SIZE alignment is not a hardware requirement and entry sizes can be configured differently if needed, this driver follows the PAGE_SIZE pattern for optimal memory management
Entry sizes vary based on buffer alignment relative to page boundaries
The first entry may be partial if the destination buffer starts mid-page
Middle entries typically span complete pages when properly aligned
The final entry contains space for any remaining data that doesn’t fill a complete page

Example: A 10KB (10,240-byte) destination buffer starting at a 100-byte offset within a 4KB (4,096-byte) page would create:

Entry 0: 3,996 bytes (from offset 100 to end of first page)
Entry 1: 4,096 bytes (complete second page)
Entry 2: 2,148 bytes (remaining space on third page)
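
An application that wants to avoid the partial first entry can allocate its destination buffer on a page boundary. The sketch below uses posix_memalign() for this. The helper name alloc_page_aligned is hypothetical, and the source does not say whether page-aligned buffers perform measurably better with this driver, so treat the alignment as an optional tidy-up rather than a requirement.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate a buffer of len bytes that starts on a page boundary.
 * Returns NULL on failure or when len is 0; release the buffer with free().
 * Hypothetical helper for illustration.
 */
void *alloc_page_aligned(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);  /* runtime page size, e.g. 4096 */
    void *buf = NULL;

    if (page <= 0 || len == 0)
        return NULL;
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    return buf;
}
```

With a buffer allocated this way, every entry except possibly the last one covers a complete page.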

Scatter-Gather (Linked-list) DMA Data Flow

The complete flow between the host components and hardware components follows this sequence:

Application opens the C2H channel device node (e.g., /dev/mdb5_read00)
Application issues a read() system call with a destination buffer
MDB5-DMA client driver validates the read request and buffer parameters
Driver pins the user buffer pages and builds the scatter-gather list
C2H DMA descriptors are prepared and submitted to the DMA engine
C2H MM engine reads data from device memory and writes it to the host memory locations
Upon transfer completion, the DMA engine updates the completion status and raises an interrupt if required
Driver processes the completion status, unmaps the pages, and returns the number of bytes transferred
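
On the application side, the read path is a read() call that may need to be repeated. The sketch below assumes the driver can return fewer bytes than requested and signals the end of available data with a return value of 0. Neither behavior is confirmed by the source, and the helper name c2h_read_full is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd into buf, retrying on short reads and EINTR.
 * Returns the number of bytes read, which is less than len only if the
 * source reports end of data, or -1 on error with errno set by read().
 * Hypothetical helper, not part of the MDB5-DMA driver package.
 */
ssize_t c2h_read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data arrived: retry */
            return -1;
        }
        if (n == 0)
            break;          /* end of data */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller would open the channel node with open("/dev/mdb5_read00", O_RDONLY) and pass the descriptor together with the destination buffer and its size.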

Simple (Non Linked-list) Mode

Simple (Non Linked-list) Mode C2H operations perform direct buffer transfers without building a linked list. This mode is best suited to a single large contiguous buffer, because it avoids the overhead of setting up the linked-list pointer (LLP).

Similar to H2C operations, C2H channels must be explicitly configured for Simple (Non Linked-list) Mode.

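/* ctrl_fd is an open descriptor for the control device. */
struct ctrl_mode mode = {
    .name = "/dev/mdb5_read00",     /* channel to configure */
    .mode = MDB5_MODE_SIMPLE        /* simple (non linked-list) mode */
};
if (ioctl(ctrl_fd, IOCTL_MDB5_SET_TRANSFER_MODE, &mode) < 0)
    perror("IOCTL_MDB5_SET_TRANSFER_MODE");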

Simple (Non Linked-list) Mode Data Flow

Application opens the C2H channel device node
Channel is configured for Simple (Non Linked-list) Mode operation
Application issues a read() system call with a contiguous buffer
Driver maps the destination buffer directly
A single DMA descriptor is prepared for the transfer
C2H engine reads from device memory and writes to host buffer
Completion status is returned with the number of bytes transferred

ASYNC IO

Asynchronous C2H operations enable non-blocking reads, particularly useful for streaming data applications where continuous data flow from the device is required without blocking the application.

Async Data Flow

Application opens the C2H channel with direct I/O flags (such as O_DIRECT)
AIO context is created for managing asynchronous operations
Read operations are prepared using io_prep_pread() and submitted via io_submit()
C2H engine performs transfers while application continues execution
Application receives completion events via io_getevents() when data is available
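
For streaming, several reads are usually kept in flight at once. The sketch below queues a batch of reads in a single submission and then reaps the completions. It calls the raw Linux AIO system calls instead of libaio so that it builds without extra libraries. libaio's io_prep_pread() fills the same iocb fields. The name aio_read_batch, the MAX_BATCH limit, and the mapping of chunk i to file offset i * chunk are all assumptions made for illustration.

```c
#define _GNU_SOURCE             /* for syscall() */
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BATCH 16            /* hypothetical queue depth limit */

/*
 * Queue count reads of chunk bytes each (read i targets buf + i * chunk and
 * file offset i * chunk), then reap every completion.
 * Returns the total bytes read by the submitted requests, or -1 on failure.
 */
ssize_t aio_read_batch(int fd, void *buf, size_t chunk, unsigned count)
{
    aio_context_t ctx = 0;
    struct iocb cbs[MAX_BATCH];
    struct iocb *list[MAX_BATCH];
    struct io_event evs[MAX_BATCH];
    ssize_t total = 0;
    long submitted, reaped = 0;

    if (count == 0 || count > MAX_BATCH) {
        errno = EINVAL;
        return -1;
    }
    if (syscall(SYS_io_setup, count, &ctx) < 0)
        return -1;

    memset(cbs, 0, sizeof cbs);
    for (unsigned i = 0; i < count; i++) {
        cbs[i].aio_data = i;                    /* tag echoed in the event */
        cbs[i].aio_lio_opcode = IOCB_CMD_PREAD;
        cbs[i].aio_fildes = (uint32_t)fd;
        cbs[i].aio_buf = (uint64_t)(uintptr_t)((char *)buf + (size_t)i * chunk);
        cbs[i].aio_nbytes = chunk;
        cbs[i].aio_offset = (int64_t)((size_t)i * chunk);
        list[i] = &cbs[i];
    }

    /* io_submit() may accept fewer requests than asked; reap only those. */
    submitted = syscall(SYS_io_submit, ctx, (long)count, list);
    if (submitted < 0) {
        total = -1;
        submitted = 0;
    }

    while (reaped < submitted) {
        long n = syscall(SYS_io_getevents, ctx, 1L, submitted - reaped,
                         evs, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        for (long i = 0; i < n; i++) {
            if (evs[i].res < 0)
                total = -1;                     /* negative res carries -errno */
            else if (total >= 0)
                total += (ssize_t)evs[i].res;
        }
        reaped += n;
    }
    syscall(SYS_io_destroy, ctx);
    return total;
}
```

When the channel is opened with O_DIRECT, the buffer and chunk size generally need to satisfy the alignment rules of the underlying device, so a page-aligned buffer is a sensible default.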