AMR - Versal Architecture Overview (Common)

Introduction

This document describes the common Versal architecture concepts shared across all AMR boards (V80, RAVE). For board-specific implementations, see the board-specific documentation listed at the end of this document.

The AMR design is a PCIe-based design that demonstrates the capabilities of Versal devices for adaptive management and acceleration. All AMR boards share a common architectural approach consisting of three major blocks: CIPS (Control, Interfaces, and Processing System), Programmable Logic (PL), and NoC Interconnect.

Block Diagram Concepts

All AMR boards contain three fundamental architectural blocks that work together to provide the complete system functionality. The CIPS (Control, Interfaces, and Processing System) configures the hard blocks including processing units, memory controllers, and board management functions. The programmable logic (PL) region contains soft IP implementations and I/O connections specific to each board’s requirements. The NoC (Network-on-Chip) provides high-bandwidth interconnect routing AXI traffic across the device between these components.

Board-specific implementations differ primarily in their PCIe connectivity approach, memory architecture, and peripheral integration. Some boards leverage hardened PCIe blocks while others implement PCIe functionality in programmable logic. Memory configurations vary from multiple memory types including HBM to simpler LPDDR4-only architectures. Peripheral integration ranges from minimal I/O to extensive on-board component integration depending on the target application requirements.

Control, Interfaces, and Processing System (CIPS)

The Control, Interfaces, and Processing System (CIPS) IP configures the various hard blocks present in AMD Versal™ devices. The AMR design uses CIPS to configure the processing units and board management functions. The following blocks are common across all AMR boards:

Real-time Processing Unit (RPU)

The RPU block contains dual Arm Cortex-R5F processors used for running AMC (Adaptive Management Controller) firmware on the device. This block is part of the processing system low-power domain (PS LPD) and serves as the primary management processor for AMR boards. The RPU handles board management functions including sensor monitoring, power management, inter-processor communication, and host interface coordination.

More information: Versal TRM - LPD Architecture

Platform Management Controller (PMC)

The PMC block contains a MicroBlaze-based processor subsystem responsible for managing device boot, configuration, and internal monitoring. The PMC interfaces with external flash devices to read and write device images during the boot process. Every Versal device requires the PMC to function: it handles the initial power-on sequence, PDI loading, and device initialization before transferring control to other processors.

More information: Versal TRM - PMC Architecture

Processing System Peripherals

The PS provides integrated peripherals including I2C controllers for sensor and EEPROM communication, SPI controllers for flash and security module access, UART controllers for debug console and logging, and optional USB controllers for peripheral connectivity. These integrated peripherals enable board management and communication without requiring custom IP in the programmable logic.

More information: Versal TRM - PS Architecture

The key architectural difference between boards lies in the PCIe implementation. Boards using HBM-series Versal devices include the CPM5 (Coherent PCIe Module) hardened block which integrates PCIe controllers and QDMA engines in silicon. Boards using Edge-series Versal devices do not include CPM and must instead implement PCIe functionality using IP blocks instantiated in the programmable logic fabric.

Programmable Logic (PL)

The PL region contains soft logic IP and I/O connections. All AMR boards follow a minimal base design philosophy, described in the sections below.

Common Design Principles

Minimal Base Design

The base design philosophy emphasizes minimal programmable logic IP instantiation to reduce complexity and resource usage. Management IP blocks such as hardware GCQ implementations, UUID ROM, EEPROM controllers, and hardcoded register tables are not included in the programmable logic. Management functionality is instead handled by firmware running on the RPU using CIPS-integrated peripherals.

The software-based GCQ uses shared DDR memory regions for inter-processor communication rather than dedicated hardware queuing logic. This approach provides flexibility in queue sizing and message formats while preserving programmable logic resources. Address space and interface connections such as M_AXI_LPD are reserved for future expansion, allowing designs to add management IP when specific requirements justify hardware implementation.
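
As a rough illustration of the shared-memory queuing idea described above, the following sketch models a single-producer, single-consumer ring buffer over a flat byte region that stands in for a shared DDR window. The header layout (two 32-bit indices), the fixed slot size, the power-of-two slot count, and every name in the code are assumptions made for illustration; they are not the GCQ format used by the AMC firmware.

```python
import struct

HEADER_FMT = "<II"  # assumed layout: producer index, consumer index
HEADER_SIZE = struct.calcsize(HEADER_FMT)
INDEX_MASK = 0xFFFFFFFF


class SharedRing:
    """Hypothetical SPSC ring over a bytearray standing in for shared DDR.

    Real shared memory would also need cache maintenance and memory
    barriers around the index updates; those are omitted here.
    """

    def __init__(self, region, slot_size, num_slots):
        if num_slots <= 0 or num_slots & (num_slots - 1):
            raise ValueError("num_slots must be a power of two")
        if len(region) < HEADER_SIZE + slot_size * num_slots:
            raise ValueError("region too small for requested queue geometry")
        self.region = region
        self.slot_size = slot_size
        self.num_slots = num_slots

    def _indices(self):
        return struct.unpack_from(HEADER_FMT, self.region, 0)

    def _set_indices(self, prod, cons):
        struct.pack_into(HEADER_FMT, self.region, 0,
                         prod & INDEX_MASK, cons & INDEX_MASK)

    def _slot_offset(self, index):
        return HEADER_SIZE + (index % self.num_slots) * self.slot_size

    def push(self, payload):
        """Copy payload into the next free slot; return False if the ring is full."""
        if len(payload) > self.slot_size:
            raise ValueError("payload larger than slot")
        prod, cons = self._indices()
        if (prod - cons) & INDEX_MASK == self.num_slots:
            return False
        off = self._slot_offset(prod)
        self.region[off:off + self.slot_size] = payload.ljust(self.slot_size, b"\0")
        self._set_indices(prod + 1, cons)  # publish only after the data is written
        return True

    def pop(self):
        """Return the oldest slot as bytes (zero-padded), or None if empty."""
        prod, cons = self._indices()
        if prod == cons:
            return None
        off = self._slot_offset(cons)
        data = bytes(self.region[off:off + self.slot_size])
        self._set_indices(prod, cons + 1)
        return data
```

Because the queue geometry is just a pair of constructor arguments, resizing the queue or changing the message format only requires agreeing on new values in software, which is the flexibility the software-based approach trades against dedicated queuing logic.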

Test Infrastructure

Basic test functionality is provided through programmable logic to validate system operation. The Physical Function 1 (PF1) hierarchy includes a SmartConnect interconnect and AXI GPIO block configured for loopback testing. Software can write values to the GPIO output register and read back the same values from the input register, verifying the complete PCIe-to-PL register access path. Reset synchronization blocks ensure proper initialization and timing relationships for the test components.
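
The loopback check described above can be sketched as follows. The code uses an in-memory stand-in for the PF1 register window so that it runs without hardware; on a real system the `write32` and `read32` calls would target a memory-mapped PCIe BAR. The choice of offsets 0x0 and 0x8 (the two data registers of an AXI GPIO block) as output and input, the class names, and the test patterns are assumptions for illustration.

```python
GPIO_OUT_OFFSET = 0x0  # assumed: channel 1 data register driven as output
GPIO_IN_OFFSET = 0x8   # assumed: channel 2 data register read as input


class FakeLoopbackBar:
    """Stand-in for a memory-mapped PF1 window with output wired to input."""

    def __init__(self):
        self._regs = {GPIO_OUT_OFFSET: 0, GPIO_IN_OFFSET: 0}

    def write32(self, offset, value):
        value &= 0xFFFFFFFF
        self._regs[offset] = value
        if offset == GPIO_OUT_OFFSET:
            # Models the PL wiring that feeds the output back to the input.
            self._regs[GPIO_IN_OFFSET] = value

    def read32(self, offset):
        return self._regs[offset]


def loopback_test(bar, patterns=(0x00000000, 0xFFFFFFFF, 0xA5A5A5A5, 0x5A5A5A5A)):
    """Write each pattern to the output and return (written, read) mismatches."""
    failures = []
    for pattern in patterns:
        bar.write32(GPIO_OUT_OFFSET, pattern)
        observed = bar.read32(GPIO_IN_OFFSET)
        if observed != pattern:
            failures.append((pattern, observed))
    return failures
```

An empty result from `loopback_test` indicates that every pattern survived the round trip; alternating patterns such as 0xA5A5A5A5 are included so that stuck or swapped bits show up as mismatches.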

Expansion Points

The base design provides several expansion interfaces for future IP integration. The M_AXI_LPD interface is enabled but not connected, reserved for SMBus controllers, additional management peripherals, or custom firmware-accessible IP blocks. PL-PS interrupts are configured but not connected, with IRQ 0 and IRQ 1 available for event signaling from programmable logic to firmware. The PF0 address space provides management register addressing for future IP additions.

Board-specific programmable logic usage varies with device capabilities. Boards with hardened PCIe blocks keep PL usage minimal, instantiating only the test GPIO. Boards that implement PCIe in the PL consume more resources for the PCIe IP, QDMA IP, and potentially additional peripheral blocks.

NoC Interconnect

The NoC interconnect provides high-bandwidth transport between CIPS blocks, PL, and memory resources. All Versal devices use the programmable NoC, which is based on AXI4.

Common NoC Concepts

The NoC architecture includes several types of units that work together to route traffic through the device. NoC Master Units (NMU) connect AXI masters to the NoC fabric, with different NMU types optimized for different use cases. The NMU_512 provides full-featured connectivity for programmable logic masters, while NMU_128 offers optimized low-latency paths for hardened blocks.

NoC Slave Units (NSU) connect the NoC fabric to AXI slaves and memory controllers. The NSU_512 serves programmable logic slaves, while NSU_128 connects to hardened peripherals. DDRMC_NSU provides specialized interfaces to DDR memory controllers, converting NoC packet traffic directly to the memory controller domain.

NoC Packet Switches (NPS) route traffic between the various NMUs and NSUs based on address decoding and quality-of-service requirements. The INI (Inter-NoC Interconnect) protocol carries traffic between different NoC instances when multiple AXI NoC IP blocks are instantiated in the design.

Common NoC Usage Patterns

All AMR boards route PCIe traffic to programmable logic address space through NoC master interfaces. The NoC directs traffic destined for PF0 addresses to the management address space, while PF1 traffic routes to the user logic and test infrastructure. This routing enables the PCIe host to access registers and memory in the device through memory-mapped I/O operations.
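
The PF0/PF1 split described above amounts to an address-window decode. The sketch below shows that idea in isolation; the base addresses, window sizes, and names are hypothetical placeholders, and the real values come from each board's address map.

```python
from typing import NamedTuple


class Window(NamedTuple):
    name: str
    base: int
    size: int
    target: str


# Hypothetical windows for illustration only; not taken from any board design.
WINDOWS = (
    Window("PF0", 0x2_0100_0000, 0x0100_0000, "management address space"),
    Window("PF1", 0x2_0200_0000, 0x0100_0000, "user logic and test infrastructure"),
)


def route(address, windows=WINDOWS):
    """Return the window containing address, or None if no window claims it."""
    for window in windows:
        if window.base <= address < window.base + window.size:
            return window
    return None
```

For example, `route(0x2_0100_0010)` returns the PF0 window under these placeholder values, while an address outside both windows returns `None`.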

PCIe DMA operations access DDR memory through NoC routing from the DMA engine to the memory controllers. The NoC handles the address translation and routing, with QoS settings ensuring adequate bandwidth allocation for DMA traffic alongside other memory access patterns.

Firmware executing on the RPU accesses DDR memory through the LPD_AXI_NOC_0 interface for code execution, data access, and inter-processor communication buffers. The PMC accesses DDR during boot and configuration through the PMC_NOC_AXI_0 interface, loading firmware code and initializing data structures.

Board-specific NoC topologies differ in their PCIe connectivity approach and memory controller integration. Boards with hardened PCIe blocks use NMU_128 interfaces for low-latency connections and may include HBM NoC ports for high-bandwidth memory access. Boards with PL-based PCIe use NMU_512 interfaces from programmable logic and typically have simpler topologies focused on DDR memory access.

Memory Controller Integration

The NoC connects to memory controllers via specialized NSU units:

  • DDRMC_NSU: Converts NoC packets to DDR memory controller domain

  • HBM_NSU: Converts NoC packets to HBM controller domain (V80 only)

More information: Versal Programmable NoC

Common Design Patterns

Two Physical Functions

All AMR boards use two Physical Functions:

Function | Purpose    | Common Configuration
PF0      | Management | Reserved for admin privileges, provides address space for management registers
PF1      | User/DMA   | Provides test functionality and DMA capabilities

Software-Based GCQ

The General Command Queue (GCQ) is implemented as a software mechanism using shared DDR memory, not as a hardware IP block. This is common across all boards.

Firmware Architecture

  • RPU0 (R5): Runs AMC (Adaptive Management Controller) firmware

  • PMC: Manages boot sequence, configuration, power management

  • CIPS Peripherals: I2C, SPI, UART used for board-level management

  • No PL Management IP: All management handled in firmware

Boot and Configuration

  • OSPI Flash: Stores boot PDI (Platform Device Image)

  • Flash Partition Table (FPT): Manages multiple partitions

  • A/B Partition Support: Redundant boot images

  • PMC Boot Flow: PMC loads PDI from OSPI and initializes device
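
The redundancy that A/B partitions provide can be illustrated with a small selection routine: try the preferred image first and fall back to the other if it fails an integrity check. This is only a sketch under stated assumptions. The `Partition` record, the CRC32 check, and the fallback policy are hypothetical and do not describe the actual FPT format or the PMC boot logic.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    image: bytes
    expected_crc: int  # assumed integrity field; not the real FPT layout


def is_valid(partition):
    """Assumed integrity check: CRC32 of the image matches the stored value."""
    return zlib.crc32(partition.image) == partition.expected_crc


def select_boot_partition(partitions, preferred):
    """Return the preferred partition if valid, else the first valid alternative."""
    # sorted() is stable, so non-preferred partitions keep their original order.
    ordered = sorted(partitions, key=lambda p: p.name != preferred)
    for partition in ordered:
        if is_valid(partition):
            return partition
    return None
```

Returning `None` when no image validates leaves the recovery policy to the caller, which keeps the selection step separate from whatever the platform does when both images are unusable.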

Board-Specific Documentation

For details on how each board implements these concepts differently, see:

V80 Board

RAVE Board