AMR - Host to Card Communication
Overview of Physical Function 0
Applications deriving from AMR are inherently PCIe®-centric. This section gives an overview of how the AMR hardware design is configured so that the PCIe host can manage it through the AMC firmware, the companion AMR PCIe driver, and the utility application.
The Host to RPU Communication section describes the general principles of message exchange between the host OS and the AMC firmware implementing the AMR management interface for each card instance.
The Host to User Application Communication section suggests ways AMR can be extended to allow application data to be exchanged between the application stack running on the PCIe host and the Alveo card.
Application developers will naturally alter the AMR base design to optimize data flow for their application’s requirements, so this page offers only high-level guidance on user data transport to the Alveo card.
Hardware Discovery
On boot, the host system enumerates the set of PCIe cards in the system and attaches the reference AMR host OS drivers to instances of the Alveo cards it finds running AMR base hardware designs. A full PCIe-based management interface is not available until the AMR host OS driver has attached to the card and acquired a small amount of metadata from the card about its hardware design content. Once the AMR PCIe host driver is attached, the AMR management interface is available to allow the current design content of the AMR cards to be inspected, controlled, and updated.
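As a rough illustration of the discovery step from the host side, the sketch below walks the Linux sysfs PCI tree and reports which functions belong to a given vendor and which driver, if any, is bound to each. It assumes a Linux host; the function name `find_pci_functions` is hypothetical and not part of AMI. The vendor ID 0x10EE is the PCI vendor ID registered to Xilinx (now AMD); no device IDs are assumed, since this page does not list them.

```python
from pathlib import Path

XILINX_VENDOR_ID = 0x10EE  # PCI vendor ID registered to Xilinx (now AMD)


def find_pci_functions(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return (address, device_id, driver_name) for each PCI function of a vendor.

    driver_name is None when no kernel driver is bound to the function.
    Hypothetical helper for illustration; it is not part of the AMR tooling.
    """
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            # sysfs exposes these as text such as "0x10ee\n"
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # not a readable PCI function entry
        if vendor != vendor_id:
            continue
        # When a driver is bound, "driver" is a symlink to its sysfs directory.
        driver_link = dev / "driver"
        driver = driver_link.resolve().name if driver_link.exists() else None
        found.append((dev.name, device, driver))
    return found
```

Calling `find_pci_functions(XILINX_VENDOR_ID)` on a host with a card installed would list each physical function separately. A function with no bound driver corresponds to the state described above, in which the management interface is not yet available.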
PCIe BAR Layout
The AMR hardware base design declares a single memory aperture for PF0. This memory aperture enables AMI communication to the AMC firmware running on the RPU. Only a subset of this memory will be used in the AMR base design. However, as more features of the processing subsystem are enabled, additional message passing interfaces from the host to, for example, the User OS running in the AMD Versal™ APU, may make use of this memory region.
Host to RPU Communication
The AMI (host OS) and AMC (embedded firmware) components of AMR’s software stack communicate via a region of shared memory using a software protocol called the Generic Communication Queue (GCQ). The GCQ is implemented in software on both the host and firmware sides, utilizing a shared memory region accessible from both the PCIe host and the RPU.
Commands issued to the AMR design via the AMI utility on the PCIe host are transformed into a series of agreed message types that are pushed into a circular buffer in shared memory. The AMC firmware running in the RPU is alerted to the availability of these messages and, when its control loop allows, begins to service them.
The GCQ software implementation uses a pair of circular buffers to act as submission and completion queues:
Submission Queue: The host writes commands to this queue for the RPU to process.
Completion Queue: The RPU writes responses and status information back to the host.
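The submission and completion queue pairing can be sketched as two fixed-size circular buffers, each with a producer index and a consumer index. This is a simplified in-process model, not the GCQ implementation: the class names, the dictionary message format, and the opcode strings are all assumptions made for illustration, and the real queues live in shared memory accessed over PCIe.

```python
class Ring:
    """Fixed-capacity circular buffer with free-running head/tail indices."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0  # next slot the consumer reads
        self.tail = 0  # next slot the producer writes

    def is_full(self):
        return self.tail - self.head == len(self.slots)

    def is_empty(self):
        return self.head == self.tail

    def push(self, item):
        """Producer side: returns False when the ring is full."""
        if self.is_full():
            return False
        self.slots[self.tail % len(self.slots)] = item
        self.tail += 1
        return True

    def pop(self):
        """Consumer side: returns None when the ring is empty."""
        if self.is_empty():
            return None
        item = self.slots[self.head % len(self.slots)]
        self.head += 1
        return item


class QueuePair:
    """One submission queue (host -> RPU) and one completion queue (RPU -> host)."""

    def __init__(self, slots=8):
        self.sq = Ring(slots)
        self.cq = Ring(slots)


def host_submit(qp, cmd_id, opcode):
    """Host role: place a command on the submission queue."""
    return qp.sq.push({"id": cmd_id, "opcode": opcode})


def firmware_service(qp):
    """Firmware role: consume commands and post one completion per command.

    Stops early if the completion queue has no room, so that no command
    is consumed without a matching response.
    """
    handled = 0
    while not qp.cq.is_full():
        cmd = qp.sq.pop()
        if cmd is None:
            break
        qp.cq.push({"id": cmd["id"], "status": "ok"})
        handled += 1
    return handled
```

In this model the host matches completions to commands by the `id` field. Each ring has exactly one writer and one reader, which is what allows two independent parties to share it without both writing the same index.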
The shared memory region is organized as follows:
GCQ Base Address: 0x10000000 (mapped to PCIe BAR space)
Ring Buffer Base: 0x10001000
Ring Buffer Size: 4 KB (0x1000)
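The arithmetic implied by these three values can be checked with a few lines. The sketch treats all three as addresses in a single address space, which is an assumption; this page does not state how they translate to offsets within the PF0 BAR, nor how the 4 KB is divided between the two queues. The names used here are hypothetical.

```python
GCQ_BASE = 0x10000000   # from the layout above
RING_BASE = 0x10001000  # from the layout above
RING_SIZE = 0x1000      # 4 KB, from the layout above

# Offset of the ring buffer relative to the GCQ base address.
RING_OFFSET = RING_BASE - GCQ_BASE   # 0x1000

# First address past the end of the ring buffer (exclusive bound).
RING_END = RING_BASE + RING_SIZE     # 0x10002000


def in_ring(addr, length=1):
    """True if the range [addr, addr + length) lies entirely inside the ring buffer."""
    return length > 0 and RING_BASE <= addr and addr + length <= RING_END
```

A bounds check of this kind is a common guard before touching shared memory: a 4-byte access at 0x10001FFC is the last one that fits, while the same access at 0x10001FFD would run past the end of the region.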
The GCQ implementation also uses mailbox/interrupt mechanisms to avoid inefficient polling. In the base design, the AMC firmware makes use of interrupts to provide prompt servicing of requests from the AMI software on the host OS. AMR does not impose strict PCIe interrupt architecture requirements on the host OS, as it is anticipated that the application’s interrupt architecture will take precedence over these lower-level (and relatively infrequent) command interactions over the management interface.
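The difference between notification and polling can be illustrated with a small software analogy. In the sketch below, a `threading.Event` stands in for the interrupt: the consumer blocks until the producer signals that work is pending, rather than repeatedly checking the queue. The `Doorbell` class is hypothetical and models only the idea; it says nothing about how the AMC firmware or the hardware mailbox is actually implemented.

```python
import threading
from collections import deque


class Doorbell:
    """Pending-message list plus an event that stands in for an interrupt."""

    def __init__(self):
        self._pending = deque()
        self._event = threading.Event()
        self._lock = threading.Lock()

    def ring(self, msg):
        """Producer: queue a message and raise the notification."""
        with self._lock:
            self._pending.append(msg)
            self._event.set()

    def wait_and_drain(self, timeout=None):
        """Consumer: sleep until notified, then take every pending message.

        Returns an empty list if the timeout expires with nothing pending.
        """
        if not self._event.wait(timeout):
            return []
        with self._lock:
            # Clear and drain under the same lock used by ring(), so a
            # message queued concurrently is either drained now or leaves
            # the event set for the next call.
            self._event.clear()
            msgs = list(self._pending)
            self._pending.clear()
        return msgs
```

A consumer thread calling `wait_and_drain()` with no timeout consumes no CPU while idle, which suits command traffic that is relatively infrequent, as the management interface traffic is described to be.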
Host to User Application Communication
Extending the AMR base design to support additional layers of communication, specifically for the user application, will naturally follow the requirements of the specific application being implemented. Some applications may require access to several PCIe memory apertures; others may be satisfied with a very small control interface aperture, with data transported via Versal’s PCIe Slave Bridge to the host memory itself. When extending the AMR base design to accommodate application data transport, the following extensions may be common:
Enable additional physical functions to separate application data transport from card management data transport in the host OS.
Implement additional GCQ software instances to allow message passing between the user space application on the host OS and application-level firmware operating in the APU.
Overview of Physical Function 1
The AMR hardware base design defines PF1 to enable CPM PCIe BAR configurations. This allows DDR and HBM memory to be targeted by CPM DMA transfers to and from host memory.
Note: The availability and configuration of PF1 may vary depending on the specific board implementation (V80, RAVE, etc.).
