AVED V80/V80P - NoC Configuration

Network-On-Chip (NoC) Overview

AMD Versal™ devices are designed around a programmable NoC interconnect based on AXI4, which provides high-bandwidth, long-distance communication throughout the device. NoC connections simplify timing closure when programmable logic routes would otherwise strain congested logic, cross one or more super logic regions (SLRs), or connect to the opposite side of the device die. The AVED target device has three SLRs. Each SLR contains two Horizontal NoCs (HNoC) and four Vertical NoCs (VNoC), which connect to each other. The two HNoCs are located at the top and bottom of each SLR and span the entire width of the SLR. In addition to connecting to the VNoCs, the bottom HNoC also connects to the CIPS and DDRs. The top HNoC in SLR2 is optimized for connecting to the HBM. The four VNoCs in each SLR span the entire height of the SLR and connect to adjacent SLRs. Traffic is routed through the NoC by NoC packet switches (NPS), which connect the NMUs to the NSUs. An illustration of the NoC is provided below, showing the NoC resources (NMU, NSU, VNoC, HNoC, DDRMC, etc.). Additional information on the NoC can be found in the Additional Resources section.

NoC Diagram


image1


NoC Components

NoC Master Units (NMU)

Versal HBM series devices contain NMU_512, NMU_128, and HBM_NMU connections to the NoC. These are described in more detail below.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NoC-Master-Unit

NMU_512

This is a full featured NoC master unit (NMU). It allows AXI masters in the PL to connect to the Vertical NoC. For memory mapped interfaces, the data width of the NMU can be configured from 32-bits to 512-bits. For AXI4-Stream interfaces, the data width of the NMU can be configured from 128-bits to 512-bits.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NMU512-PL

HBM_NMU

HBM_NMUs are used to fully utilize the HBM bandwidth by providing direct access from the PL to the HBM. These NMUs are distributed evenly across the top of SLR2 to help ease timing closure. The data width of the HBM_NMU is configurable from 32-bits to 256-bits. It does not support the AXI4-Stream protocol.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/HBM_NMU

NMU_128

These NMUs are optimized for the low latency requirements of hardened blocks, such as the CIPS. They have a fixed 128-bit AXI data width and do not support the AXI4-Stream protocol or master-defined destination IDs.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NMU128-Low-Latency

NoC Slave Units (NSU)

Versal HBM series devices contain NSU_512, NSU_128, DDRMC_NSU, and HBM_NSU connections to the NoC. These are described in more detail below.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NoC-Slave-Unit

NSU_512

This is a full featured NoC slave unit (NSU). It allows Vertical NoC connections to AXI slaves in the PL. For memory mapped interfaces, the data width of the NSU can be configured from 32-bits to 512-bits. For AXI4-Stream interfaces, the data width of the NSU can be configured from 128-bits to 512-bits.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NSU512-PL

NSU_128

These NSUs are optimized for the low latency requirements of hardened blocks, such as the CIPS. They have a fixed 128-bit AXI data width and do not support the AXI4-Stream protocol.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/NSU128-Low-Latency

DDRMC_NSU

Each port of a DDRMC has a partial NSU (DDRMC_NSU). It converts the NoC packet domain to the memory controller domain without first converting it to the AXI protocol.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/DDRMC-NSU

HBM_NSU

Each pseudo channel of the HBM has two NSUs (HBM_NSU). They convert the NoC packet domain to the HBM controller domain without first converting it to the AXI protocol.

https://docs.xilinx.com/r/en-US/pg313-network-on-chip/HBM_NSU

NoC Packet Switches (NPS)

NoC packet switches route packets through the NoC, connecting NMUs to NSUs.

AVED V80/V80P NoC Overview

AXI NoC Instantiations

AVED uses multiple AXI NoC IP instantiations to connect AXI and Inter-NoC interconnect (INI) interfaces to the NoC. This allows the host and RPU to interface with the PL, HBM, DDR, and DIMM through CIPS. The DIMM is only available on the V80 card, therefore, any reference to the DIMM connectivity applies only to AVED V80.

The configuration of each AXI NoC instantiation is described below.

  • axi_noc_cips

    • NoC to PL AXI connections to allow the PCIe® Host to access the PL management and xbtest functionality.

    • INI connections to allow the PCIe Host to access the DDR and DIMM MCs.

    • INI connections to allow the Platform management controller (PMC) access to the DDR and DIMM MCs.

    • INI connection to allow the RPU to access the DDR MC.

    • PL AXI to NoC connections to allow the xbtest HBM memory kernel to access the HBM controllers.

  • axi_noc_mc_ddr4_0

    • INI connection to allow the CIPS to connect to the DDR MC. Only two of the four MC ports are used.

  • axi_noc_mc_ddr4_1

    • AXI connections to allow the xbtest memory kernel to connect to the DIMM MC (ports 2 and 3).

    • INI connections to allow the CIPS to connect to the DIMM MC (ports 0 and 1).

AVED does not use the PS APUs, but if they were used, the AXI NoC IP could be configured to support these connections. Additional connections could also be made to the PL and memory controllers (MC).

AXI NoC Performance

System performance depends on the NoC, DDR, DIMM, and HBM performance. Various sources of overhead in the NoC degrade the theoretical maximum bandwidth of a NoC NMU (NoC master unit) or NSU (NoC slave unit). These include contention, the mix of read and write traffic (rd% vs wr%), read address and write response packets, and the number of NMUs/NSUs on each VNoC/HNoC along with their requested QoS. To guide the NoC routing and resource allocation, the Versal NoC compiler requires a traffic specification, which consists of the NoC connections and their quality of service (QoS) requirements. QoS has two components (a Tcl sketch of how these requirements appear in the block design follows the list below):

  • Traffic Class: This defines how traffic on the connection is prioritized, both in the NoC compiler and in the hardware. The traffic class is set on the NMU and applies to all paths from that NMU. Normally, the best effort class is chosen, but other options are available. AVED uses ‘Best Effort’ for all its NoC settings. With this setting, the NoC compiler first works to satisfy the BW and latency requirements of all low latency and isochronous paths; after those requirements have been met, it works to satisfy the requirements of the best effort paths. With the ‘Best Effort’ setting, AVED is able to meet its requirements.

    • Low latency: typically CPU to DDR memory transactions.

    • Isochronous: real-time deadlines.

    • Best effort: bulk transfers and not time critical. This is also the only available option for the HBM Read and Write traffic class.

  • Bandwidth requirements: This describes how much bandwidth is required in each direction (rd/wr). These requirements are associated with the slave ports. Each slave port can have a separate bandwidth requirement setting.
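The following Tcl sketch illustrates how a connection's bandwidth requirement appears in an exported block design, using the 800MB/s figure AVED requests for the DDR INI paths. It assumes the standard axi_noc CONFIG.CONNECTIONS dictionary format (destination name plus read_bw/write_bw in MB/s and average burst length); the destination name (MC_0) and burst values here are illustrative rather than the exact AVED settings.

# Illustrative sketch: request 800MB/s of read and write bandwidth for the path
# from the axi_noc_mc_ddr4_0 S00_INI input to memory controller port 0 (MC_0).
# read_avg_burst/write_avg_burst give the expected average burst length.
set_property -dict [list \
  CONFIG.CONNECTIONS {MC_0 {read_bw {800} write_bw {800} read_avg_burst {4} write_avg_burst {4}}} \
] [get_bd_intf_pins /axi_noc_mc_ddr4_0/S00_INI]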

In addition to creating the traffic specification, the DIMM MC and HBM controllers were tuned to optimize performance by modifying the MC address mapping. The HBM optimizations are documented in Configure HBM Channels and the DIMM optimizations are documented in Configure DIMM Address Mapping.

Additional information on the NoC, NoC performance, and performance tuning can be found in the Additional Resources section.

NoC Resources

Available NMU & NSU

The table below shows the total number of NMU and NSU in the Versal device.

image2

CIPS-Specific NMU & NSU

All NMU_128 and NSU_128 connect to CIPS. The main CIPS core is located in SLR0 and has multiple NoC connections, which include PCIe, PMC, RPU, cache coherent, and non-cache coherent. In V80 and V80P, there is also a CIPS core in SLR1 and SLR2. Each of these CIPS cores has one PMC_NMU and one PMC_NSU connection.


image3

NCI: Non-cache coherent interconnect

CCI: Cache coherent interconnect

https://docs.xilinx.com/r/en-US/pg352-cips/PS-NoC-Interfaces

AVED NMU & NSU Usage

The following table shows the total number of NMU and NSU used by AVED.

image4

*There are four DDRMCs, but AVED only uses two. The DDR uses two DDRMC NSUs and the DIMM uses four DDRMC NSUs.


A breakdown of the different NMU and NSU connections is given in the following tables.

The host uses multiple connections (CPM_PCIE_NOC_0 and CPM_PCIE_NOC_1) to access multiple PL slaves, the DDR, and the DIMM for more optimal performance.

image5


image6

The PMC connects to the DDR and DIMM to perform primary pre-boot tasks and to manage the hardware for reliable power-up and power-down of device resources.

The host and the RPU, which operates in the LPD, communicate through the DDR.

Xbtest connects to dedicated DIMM ports to exercise the NoC and DIMM performance.

image7

Xbtest connects to dedicated HBM_NMU ports to exercise the NoC and HBM performance.

image8

Additional connections can be made to the NoC and memory ports as required by other designs. As more connections are made, additional tuning to the NoC components and DIMM/HBM addressing may be required for optimum performance.

Additional Resources

More information about the NoC can be found here:

The physical link raw BW information can be found here:

The peak expected bandwidths for AVED with different read/write ratios can be found here:

If more information is needed for more advanced NoC performance tuning, it can be found here:

AXI NoC CIPS Configurations

AXI NoC CIPS (axi_noc_cips)

The axi_noc_cips block connects the CIPS to the PL logic and memory controllers through NoC connections. The port connections and configuration of this block are described below.

image9

GUI Configuration of axi_noc_cips

For AVED, the NoC configuration changes are described below. More documentation can be found at https://docs.xilinx.com/r/en-US/pg313-network-on-chip/Configuring-the-AXI-NoC.

General

The General tab allows the number of AXI master and slave interfaces, INI master and slave interfaces, and integrated memory controller connections to be specified. The settings are explained below.

image10


AXI Interfaces - AXI Slave

AVED connects the following slave interfaces to the CIPS masters. The following table shows the AXI slave connections to axi_noc_cips. The frequencies in the table were chosen to be high enough to meet the QoS requirements on the DDR and HBM interfaces without oversaturating the NoC interfaces. In particular, xbtest runs at ~400MHz to maximize performance with the HBM core running at 200MHz; because the HBM is a DDR memory, the data bus toggles at twice the HBM core clock rate. The HBM can saturate the NoC when running at 400MHz because all 64 256-bit HBM_NMUs are used (each 256-bit NMU at 400MHz provides 12,800MB/s, and 64 NMUs * 12,800MB/s = 819.2 GB/s total). See the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313) for more information.

The axi_noc_cips connections outlined in the tables below are further illustrated in the NoC Connection Diagrams.

image11

AXI Interfaces - AXI Master

AVED connects the following master interfaces to the PL peripheral slave devices. The following table shows the axi_noc_cips connections to the PL peripherals. All of the interfaces below are AXI4-Lite and do not require high bandwidth.

image12

AXI Interfaces - AXI Clocks

The AXI NoC IP uses the CIPS and peripheral clocks to manage the clock domain crossings to the NoC. The table below shows the clocks connected to axi_noc_cips; a Tcl sketch of the clock-to-interface association follows the table.

image13
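The clock-to-interface association is normally inferred automatically when the clocks and interfaces are wired up in IP integrator. If it ever needs to be set explicitly in Tcl, it is carried by the ASSOCIATED_BUSIF property of the NoC clock pin, as in the sketch below; the pin name (aclk0) and the interface list are placeholders, not the exact AVED assignments.

# Illustrative sketch: associate two NoC slave interfaces with the aclk0 clock input.
set_property -dict [list \
  CONFIG.ASSOCIATED_BUSIF {S00_AXI:S01_AXI} \
] [get_bd_pins /axi_noc_cips/aclk0]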

INI Interfaces - INI Slave

No Connections

INI Interfaces - INI Master

AVED V80 connects the following master INI interfaces to the DDR/DIMM memory controllers. The following table shows the axi_noc_cips INI connections to the DDR and DIMM MCs.

image14

Memory Controllers - DDR4

No DDR memory controllers are connected to the axi_noc_cips.

Memory Controllers - HBM

The AVED axi_noc_cips is configured to connect to all 64 HBM AXI slave PL interfaces and to use all 32GB of the HBM memory, allowing for maximum performance. The 64 ports of the xbtest memory kernel connect to the 64 HBM ports of axi_noc_cips.

image15

Inputs

As mentioned in the AXI Interfaces Slave section above, there are four CIPS AXI masters connected to the axi_noc_cips slaves.

image16

Outputs

As mentioned in the AXI Interfaces Master section above, there are four AXI master interfaces connected to the PL.

image17

Connectivity

By default, there are no connections enabled in the Connectivity tab. This tab is a large matrix in which a check mark indicates a connection between the input indicated by the row and the output indicated by the column. Connections must be made from the inputs (slave AXI and HBM AXI) to the outputs (master AXI, master INI, and HBM memory controller pseudo channels).

As shown in the first row of the figure below, CIPS S00_AXI (pc_pcie) connects to all of the following:

  1. axi_noc_cips/M00_AXI (which then connects to base_logic/pcie_slr0_mgmt_sc/S00_AXI in the Block Design)

  2. axi_noc_cips/M01_AXI (which then connects to xbtest/pcie_slr0_sc/S00_AXI in the Block Design)

  3. axi_noc_cips/M02_AXI (which then connects to xbtest/pcie_slr1_sc/S00_AXI in the Block Design)

  4. axi_noc_cips/M03_AXI (which then connects to xbtest/pcie_slr2_sc/S00_AXI in the Block Design)

  5. axi_noc_cips/M00_INI (which then connects to axi_noc_mc_ddr4_0/S00_INI of the DDR in the Block Design)

  6. axi_noc_cips/M02_INI (which then connects to axi_noc_mc_ddr4_1/S00_INI of the DIMM in the Block Design)

See NoC Connection Diagrams for further details on the non HBM connections.

While the figure does not show the complete set of HBM connections, the same pattern is followed through HBM63_AXI to HBM15_PC1.

image18
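For reference, a connectivity row such as the one shown above is captured in the exported block design Tcl as the CONFIG.CONNECTIONS property of the corresponding slave interface pin. The sketch below mirrors the six S00_AXI destinations listed above; the read_bw/write_bw values follow the figures quoted later in the QoS section (5MB/s for the AXI4-Lite paths, 800MB/s for the INI paths), but the exact dictionary contents in the AVED sources may differ.

# Illustrative sketch of the S00_AXI (pc_pcie) connectivity row: four AXI masters
# to the PL plus two INI masters toward the DDR and DIMM memory controllers.
set_property -dict [list \
  CONFIG.CONNECTIONS { \
    M00_AXI {read_bw {5} write_bw {5}} \
    M01_AXI {read_bw {5} write_bw {5}} \
    M02_AXI {read_bw {5} write_bw {5}} \
    M03_AXI {read_bw {5} write_bw {5}} \
    M00_INI {read_bw {800} write_bw {800}} \
    M02_INI {read_bw {800} write_bw {800}} \
  } \
] [get_bd_intf_pins /axi_noc_cips/S00_AXI]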


NoC Connection Diagrams

The diagrams below capture the connections of the CIPS NoC interfaces. They do not show the full AVED design, but they capture enough information to illustrate the connections made in the connectivity matrix. Each NoC connection can be followed from the CIPS interface to the axi_noc_cips block, where the green lines indicate the connections made in the connectivity matrix, and then on to the egress port connections to the PL and DDRMCs.

CPM_PCIE_NOC_0 End-to-End NoC Connection Diagram

In the diagram below, the green lines in the axi_noc_cips block illustrate the connections from the CIPS PCIe (CPM_PCIE_NOC_0) to the base logic, xbtest, DDR, and DIMM. These are the same connections described above for the first row in the connection matrix:

  1. axi_noc_cips/M00_AXI (which then connects to base_logic/pcie_slr0_mgmt_sc/S00_AXI in the Block Design)

  2. axi_noc_cips/M01_AXI (which then connects to xbtest/pcie_slr0_sc/S00_AXI in the Block Design)

  3. axi_noc_cips/M02_AXI (which then connects to xbtest/pcie_slr1_sc/S00_AXI in the Block Design)

  4. axi_noc_cips/M03_AXI (which then connects to xbtest/pcie_slr2_sc/S00_AXI in the Block Design)

  5. axi_noc_cips/M00_INI (which then connects to axi_noc_mc_ddr4_0/S00_INI of the DDR in the Block Design)

  6. axi_noc_cips/M02_INI (which then connects to axi_noc_mc_ddr4_1/S00_INI of the DIMM in the Block Design)

image19

CPM_PCIE_NOC_1 End-to-End NoC Connection Diagram

In the diagram below, the green lines in the axi_noc_cips block illustrate the connections from the CIPS PCIe (CPM_PCIE_NOC_1) to the base logic, xbtest, DDR, and DIMM.

image20

PMC_NOC_AXI_0 End-to-End NoC Connection Diagram

In the diagram below, the green lines in the axi_noc_cips block illustrate the connections from the PMC to the DDR and DIMM.

image21

LPD_AXI_NOC_0 End-to-End NoC Connection Diagram

In the diagram below, the green lines in the axi_noc_cips block illustrate the connections from the RPU to the DDR.

image22

QoS

The QoS entries are used by the NoC compiler to determine the routing of NoC traffic through the device while meeting the bandwidth requirements. If the requirements cannot be met, the NoC compiler can choose to use different NoC resources (NMU, NSU, NPS, etc.) to achieve the required performance.

Traffic Class

AVED uses the default ‘Best Effort’ setting for all its NoC settings, but other options are available.

  • Low latency: typically CPU to DDR memory transactions.

  • Isochronous: real-time deadlines.

  • Best effort: bulk transfers and not time critical. This is also the only available option for the HBM Read and Write traffic class.

Using ‘Best Effort’, the NoC compiler first works to satisfy the BW and latency requirements of all low latency and isochronous paths. After those requirements have been met, it works to satisfy the requirements of the best effort paths. With the ‘Best Effort’ setting, AVED is able to meet its requirements.

Bandwidth

Different NoC paths require different bandwidths. The M0x_AXI interfaces request 5MB/s because they are AXI4-Lite interfaces that do not require high bandwidth. The M0x_INI interfaces are higher performance than the AXI4-Lite interfaces; these are the paths from the PMC, RPU, and host PCIe that connect to the DDR (M00_INI and M01_INI) and DIMM (M02_INI and M03_INI). A bandwidth of 800MB/s satisfies the AVED bandwidth requirements while not saturating the NoC where multiple paths connect to the same DDR port (see DDR and DIMM Port Connections). The HBM paths are the highest performance and request 12,800MB/s on all 64 pseudo channels to saturate the HBM. The HBM saturates when all 64 256-bit HBM_NMUs operate at 400MHz (64 * 12.8GB/s = 819.2 GB/s).

More information can be found here:

The following figures show all the connections made to the axi_noc_cips inputs and their desired bandwidths.

image23


image24

After the implemented design has been opened in AMD Vivado™, the NoC performance can be verified by opening the NoC window (Window → NoC). This window also shows the routed NoC paths.

Address Remap

The NMU supports address remapping as a way to override the default address map as well as to provide simple address virtualization. AVED uses this feature to remap the PCIe address to the DDR. This allows the PCIe host to have access to the DDR for communication with the RPU.

image25

See the Address Re-mapping section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313) for more information.

image26

HBM Configuration

Channel Configuration

Each HBM channel can be configured with different values. Since the same traffic pattern is used for all channels, AVED configures all 16 of the HBM channels the same. As a result, there will only be one HBM window to configure when the ‘Configure Channels‘ button is selected.

Clocking

The HBM uses two external LVDS clocks operating at 200MHz. These are dedicated clock inputs from banks 800 and 801 of the Versal device. One clock is connected to each HBM stack. The HBM clock source (internal vs. external) and IO standard must be the same for both clocks. Configure the HBM clocks as shown below. See the Versal HBM Series - External Reference Clock Design Guidance Article for more information.

The ‘Configure Channels’ button opens a full set of HBM options. The options are explained in detail in the HBM Configuration Tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313):


image27


Configure Channels

The address mapping of the HBM has been modified to obtain optimum performance with xbtest. The change is described below.

HBM Address Map Options

HBM operation is similar to the AVED DDRs. In order to achieve maximum HBM performance, xbtest uses multiple linear-pattern traffic generators. Each traffic generator accesses a separate region of memory. Since the traffic pattern is well defined, the highest performance can be achieved by selecting the appropriate address mapping. Some general guidelines are provided below.

  • Linear traffic:

    • Map the column address (CA) to the second from rightmost entry. This will leave the page open as long as possible, minimizing the loss of efficiency caused by switching pages.

    • Use the bank group interleave option by setting 1BG on the rightmost entry. This decreases the amount of time spent waiting for execution because the controller splits the accesses between two Banks in two separate Bank Groups.

  • Multi-channel: Map the Bank Group (BG) to the leftmost entry so each xbtest channel can access different banks. This is done to avoid one channel from causing a page miss in another channel.

Xbtest uses two channels per HBM pseudo-channel (PC). By remapping 1BG, the HBM PC is split into two regions. This allows each xbtest channel to access different HBM PC regions.

The following notation is used for address mapping:

  • RA for Row Address

  • SID for Stack ID

  • BA for Bank Address

  • BG for Bank Group

  • CA for Column Address

The default and modified address mapping is shown below:

  • Default: 15RA-1SID-2BA-1BG-5CA-1BG

  • Xbtest: 1BG-15RA-1SID-2BA-5CA-1BG

More thorough descriptions of the configuration tabs can be found in the HBM Address Map Options Tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

image28

AVED uses a TCL file to set the HBM Address Map options:

HBM Address Map Options

set hbm_ch_cfg [get_property CONFIG.HBM_CHNL0_CONFIG [get_bd_cells /axi_noc_cips]]
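# The dict set commands below override the refresh mode and both pseudo-channel
# address maps; the modified configuration is then written back with set_property.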
dict set hbm_ch_cfg HBM_REFRESH_MODE                    {SINGLE_BANK_REFRESH}
dict set hbm_ch_cfg HBM_PC0_PRE_DEFINED_ADDRESS_MAP     {USER_DEFINED_ADDRESS_MAP}
dict set hbm_ch_cfg HBM_PC1_PRE_DEFINED_ADDRESS_MAP     {USER_DEFINED_ADDRESS_MAP}
dict set hbm_ch_cfg HBM_PC0_USER_DEFINED_ADDRESS_MAP    {1BG-15RA-1SID-2BA-5CA-1BG}
dict set hbm_ch_cfg HBM_PC1_USER_DEFINED_ADDRESS_MAP    {1BG-15RA-1SID-2BA-5CA-1BG}
dict set hbm_ch_cfg HBM_PC0_ADDRESS_MAP                 {BA3,RA14,RA13,RA12,RA11,RA10,RA9,RA8,RA7,RA6,RA5,RA4,RA3,RA2,RA1,RA0,SID,BA1,BA0,CA5,CA4,CA3,CA2,CA1,BA2,NC,NA,NA,NA,NA}
dict set hbm_ch_cfg HBM_PC1_ADDRESS_MAP                 {BA3,RA14,RA13,RA12,RA11,RA10,RA9,RA8,RA7,RA6,RA5,RA4,RA3,RA2,RA1,RA0,SID,BA1,BA0,CA5,CA4,CA3,CA2,CA1,BA2,NC,NA,NA,NA,NA}
set_property CONFIG.HBM_CHNL0_CONFIG $hbm_ch_cfg [get_bd_cells /axi_noc_cips]

Other designs may require different settings to meet performance requirements. More information on HBM addressing and routing can be found in the following links:

HBM Refresh and Power Saving Options

AVED uses Single Bank Refresh mode. When used in conjunction with the user defined address map above, it helps improve HBM performance. Temperature compensated refresh is not enabled because it could require more refreshes and impact performance. Power saving options are not enabled since they affect performance.

For more information see the HBM Refresh and Power Savings Tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

image29

HBM Reliability Options

AVED uses the default settings. Information on these settings can be found in the HBM Reliability Options Tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

  • Write Data Mask: This option is enabled and ECC is disabled.

  • DBI: Dynamic bus inversion is enabled for read and write data.

image30

AXI NoC Memory Controller - DDR (axi_noc_mc_ddr4_0)

AVED uses a high-efficiency, low-latency integrated DDR memory controller (MC) for general purpose CPU access to the DDR. Through INI connections, the CIPS PCIe Host, PMC, and RPU can access the DDR.

Ports

The axi_noc_mc_ddr4_0 block has a minimal number of ports, as shown in the following figure.

image31

GUI Configuration of axi_noc_mc_ddr4_0

For AVED, the NoC DDRMC configuration may be specified in a TCL file or through the GUI.

# Create instance: axi_noc_mc_ddr4_0, and set properties
set axi_noc_mc_ddr4_0 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_noc axi_noc_mc_ddr4_0 ]
set_property -dict [list \
  CONFIG.CONTROLLERTYPE {DDR4_SDRAM} \
  CONFIG.MC_CHAN_REGION1 {DDR_CH1} \
  CONFIG.MC_COMPONENT_WIDTH {x16} \
  CONFIG.MC_DATAWIDTH {72} \
  CONFIG.MC_DM_WIDTH {9} \
  CONFIG.MC_DQS_WIDTH {9} \
  CONFIG.MC_DQ_WIDTH {72} \
  CONFIG.MC_INIT_MEM_USING_ECC_SCRUB {true} \
  CONFIG.MC_INPUTCLK0_PERIOD {5000} \
  CONFIG.MC_MEMORY_DEVICETYPE {Components} \
  CONFIG.MC_MEMORY_SPEEDGRADE {DDR4-3200AA(22-22-22)} \
  CONFIG.MC_NO_CHANNELS {Single} \
  CONFIG.MC_RANK {1} \
  CONFIG.MC_ROWADDRESSWIDTH {16} \
  CONFIG.MC_STACKHEIGHT {1} \
  CONFIG.MC_SYSTEM_CLOCK {Differential} \
  CONFIG.NUM_CLKS {0} \
  CONFIG.NUM_MC {1} \
  CONFIG.NUM_MCP {4} \
  CONFIG.NUM_MI {0} \
  CONFIG.NUM_NMI {0} \
  CONFIG.NUM_NSI {2} \
  CONFIG.NUM_SI {0} \
] $axi_noc_mc_ddr4_0
General

The General tab allows the number of AXI master and slave interfaces, INI master and slave interfaces, and Integrated memory controller connections to be specified. The settings are explained below.

image32

AXI Interfaces

AXI Slave - No Connections

AXI Master - No Connections

INI Interfaces

INI Slave

AVED connects the following master interfaces to the DDRMC. The following table shows the INI master connections to axi_noc_mc_ddr4_0.

image33

M00_INI connects to the CIPS PCIe, PMC, and RPU through the axi_noc_cips.

M01_INI connects to the CIPS RPU through the axi_noc_cips.

INI Master - No Connections

Memory Controllers - DDR4

AVED uses a single memory controller with four ports. Access to the 4GB DDR is design-specific, as defined in the Discrete DDR Diagram. Two 2GB address ranges address this DDR:

  • DDR LOW0 - 0x000_0000_0000 - 0x000_7FFF_FFFF

  • DDR CH1 - 0x500_8000_0000 - 0x500_FFFF_FFFF

Memory Controllers - HBM

N/A

Inputs

As mentioned in the INI Interfaces section above, two INI masters from axi_noc_cips are connected to the axi_noc_mc_ddr4_0 slaves.

image34

Outputs

N/A

Connectivity

By default, there are no connections enabled in the Connectivity tab. Connectivity between the inputs (slave INI) and the outputs (DDR ports) is established by checking boxes in a matrix, where a check mark indicates a connection between an input (indicated by the row) and an output (indicated by the column).

The figure below captures the connectivity enabled in AVED:

  • S00_INI (CIPS PCIe Host, PMC and RPU) connects to port 0 of the MC.

  • S01_INI (CIPS PCIe Host) connects to port 1 of the MC.

image35



DDR and DIMM Port Connections

The diagram below shows the DDRMC access paths from CIPS to the DDRMC ports through axi_noc_cips.

Note: Multiple connections are made to DDR Port 0 and DIMM Port 0.

  • CPM_PCIE_NOC_0

  • CPM_PCIE_NOC_1

  • PMC_NOC_AXI_0

  • LPD_AXI_NOC_0

image36
QoS

The QoS entries are used by the NoC compiler to determine the routing of NoC traffic through the device while meeting the bandwidth requirements. If the requirements cannot be met, the NoC compiler can choose to use different NoC resources (NMU, NSU, NPS, etc.) to achieve the required performance.

Traffic Class

AVED uses the default ‘Best Effort’ setting for all its NoC settings, but other options are available.

  • Low latency: typically CPU to DDR memory transactions.

  • Isochronous: real-time deadlines.

  • Best effort: bulk transfers and not time critical. This is also the only available option for the HBM Read and Write traffic class.

Using ‘Best Effort’, the NoC compiler first works to satisfy the BW and latency requirements of all low latency and isochronous paths. After those requirements have been met, it works to satisfy the requirements of the best effort paths. With the ‘Best Effort’ setting, AVED is able to meet its requirements.

Bandwidth

This is the desired bandwidth of the transactions. The MC ports request 800MB/s for transactions between the PMC, RPU, and host PCIe and the DDR (M00_INI and M01_INI).

More information can be found in the following:

  • The Quality of Service section of the Versal Adaptive SoC Technical Reference Manual (AM011).

  • The QoS tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

The following figure shows all the connections made to the axi_noc_mc_ddr4_0 inputs and their desired bandwidths.

image37



After the implemented design has been opened in Vivado, the NoC performance can be verified by opening the NoC window (Window → NoC). This window also shows the routed NoC paths.

Address Remap

N/A

DDR Basic

Information on the DDR memory options is found in the Configuring the Memory Controller section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

AVED supports DDR4 with a 200MHz system input clock.

image38

Clocks

The AXI NoC IP uses the DDR board clock to manage the clock domain crossings between the NoC and DDR.

image39

DDR Memory

AVED supports a 4GB memory. Choose the settings below for the proper part and speed grade.

image40

DDR Address Mapping

No change - leave at defaults

image41

DDR Advanced

AVED requires ECC per design requirements.

image42

AXI NoC Memory Controller - DIMM (axi_noc_mc_ddr4_1) (for AVED V80 only)

AVED V80 uses a high-efficiency, low-latency integrated DDR memory controller (MC) that can be used for general purpose CPU access and other traditional FPGA applications such as video or network buffering. Through AXI connections from xbtest and INI connections from CIPS, the xbtest kernel, the CIPS PCIe host, and the PMC can access the DIMM. Each input connects to one of the four MC ports. As shown in the QoS tab, the NoC BW is not saturated, so additional connections could be made to the ports if desired.

Ports

The axi_noc_mc_ddr4_1 block has a minimal number of ports, as shown in the following figure.

image43

GUI Configuration of axi_noc_mc_ddr4_1

For AVED, the NoC DDRMC configuration may be specified in a TCL file or through the GUI.

# Create instance: axi_noc_mc_ddr4_1, and set properties
set axi_noc_mc_ddr4_1 [ create_bd_cell -type ip -vlnv xilinx.com:ip:axi_noc axi_noc_mc_ddr4_1 ]
set_property -dict [list \
  CONFIG.CONTROLLERTYPE {DDR4_SDRAM} \
  CONFIG.MC0_CONFIG_NUM {config21} \
  CONFIG.MC0_FLIPPED_PINOUT {false} \
  CONFIG.MC_CHAN_REGION0 {DDR_CH2} \
  CONFIG.MC_COMPONENT_WIDTH {x4} \
  CONFIG.MC_DATAWIDTH {72} \
  CONFIG.MC_INIT_MEM_USING_ECC_SCRUB {true} \
  CONFIG.MC_INPUTCLK0_PERIOD {5000} \
  CONFIG.MC_MEMORY_DEVICETYPE {RDIMMs} \
  CONFIG.MC_MEMORY_SPEEDGRADE {DDR4-3200AA(22-22-22)} \
  CONFIG.MC_NO_CHANNELS {Single} \
  CONFIG.MC_PARITY {true} \
  CONFIG.MC_RANK {1} \
  CONFIG.MC_ROWADDRESSWIDTH {18} \
  CONFIG.MC_STACKHEIGHT {1} \
  CONFIG.MC_SYSTEM_CLOCK {Differential} \
  CONFIG.NUM_CLKS {1} \
  CONFIG.NUM_MC {1} \
  CONFIG.NUM_MCP {4} \
  CONFIG.NUM_MI {0} \
  CONFIG.NUM_NMI {0} \
  CONFIG.NUM_NSI {2} \
  CONFIG.NUM_SI {0} \
] $axi_noc_mc_ddr4_1
General

The General tab allows the number of AXI master and slave interfaces, INI master and slave interfaces, and integrated memory controller connections to be specified. The settings are explained below.

image44

AXI Interfaces

AXI Slave

AVED V80 exercises the DIMM memory using two AXI connections from the xbtest memory kernel. These interfaces operate at ~2x the 200MHz DDR clock for maximum performance.

image45

AXI Master - No Connections

Clocks

The AXI NoC IP uses the DDR board clock, xbtest clock, and CIPS PL clock to manage the clock domain crossings between the NoC and DIMM.

image46

INI Interfaces

INI Slave

AVED V80 connects the following master INI interfaces to the memory controller.

image47

M00_INI connects to the CIPS PCIe host and PMC through the axi_noc_cips.

M01_INI connects to the CIPS PCIe host through the axi_noc_cips.

INI Master - No Connections

Memory Controllers - DDR4

AVED V80 uses a single memory controller with four ports. Access to the 32GB DIMM is design-specific, as defined in the DIMM Diagram. One address range addresses this DIMM:

  • DDR CH2 - 0x600_0000_0000 - 0x67F_FFFF_FFFF

Memory Controllers - HBM

N/A

Inputs

As mentioned in the AXI Interfaces and INI Interfaces sections above, two INI masters from axi_noc_cips and two AXI masters from xbtest are connected to the axi_noc_mc_ddr4_1 slaves.

image48

Outputs

N/A

Connectivity

By default, there are no connections enabled in the Connectivity tab. Connectivity between the inputs (slave INI and slave AXI) and the outputs (DDR ports) is established by checking boxes in a matrix, where a check mark indicates a connection between an input (indicated by the row) and an output (indicated by the column).

The figure below captures the connectivity enabled in AVED:

  • S00_INI (CIPS PCIe Host and PMC) connects to port 0 of the MC.

  • S01_INI (CIPS PCIe Host) connects to port 1 of the MC.

  • S00_AXI (/xbtest/krnl_memtest_ddr_02_00_1/m01_axi) connects to port 2 of the MC.

  • S01_AXI (/xbtest/krnl_memtest_ddr_02_00_1/m02_axi) connects to port 3 of the MC.

image49

DDRMC Access

The diagram below shows the V80 DIMM DDRMC port access paths from CIPS and xbtest.

  • CPM_PCIE_NOC_0

  • CPM_PCIE_NOC_1

  • PMC_NOC_AXI_0

  • xbtest

image50

QoS

The QoS entries are used by the NoC compiler to determine the routing of NoC traffic through the device while meeting the bandwidth requirements. If the requirements cannot be met, the NoC compiler can choose to use different NoC resources (NMU, NSU, NPS, etc.) to achieve the required performance.

Traffic Class

Normally, the best effort class is chosen, but other options are available. AVED V80 uses best effort for all its NoC settings. With this setting, the NoC compiler works to satisfy the BW and latency requirements of the best effort paths after the requirements of any low latency and isochronous paths have been met.

  • Low latency: typically CPU to DDR memory transactions.

  • Isochronous: real-time deadlines.

  • Best effort: bulk transfers and not time critical.

Bandwidth

This is the desired bandwidth of the transactions. The MC ports request 800MB/s for transactions between the PMC and host PCIe and the DIMM (M00_INI and M01_INI).

More information can be found here:

  • The Quality of Service section of the Versal Adaptive SoC Technical Reference Manual (AM011).

  • The QoS tab section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

The following figure shows all the connections made to the axi_noc_mc_ddr4_1 inputs and their desired bandwidths.

image51



After the implemented design has been opened in Vivado, the NoC performance can be verified by opening the NoC window (Window → NoC). This window also shows the routed NoC paths.

Address Remap

N/A

DDR Basic

AVED V80 uses an external 200MHz clock for the DDR.

image52

DDR Memory

AVED supports a 32GB DIMM memory (V80 only). Choose the settings below for the proper part and speed grade.

These options are enabled for future growth:

  • DRAM Command/Address Parity - Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313)

  • Future Expansion for PCB designs - Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

image53

DDR Address Mapping

In order to achieve maximum DIMM performance, xbtest uses multiple linear pattern traffic generators. Each traffic generator accesses a separate region of memory. Since the traffic pattern is well defined, the highest performance can be achieved by selecting the appropriate address mapping. Some general guidelines are provided below.

  • Linear traffic: Map the Column Address (CA) to the rightmost entry. This will leave the page open as long as possible, minimizing the loss of efficiency caused by switching pages.

  • Multi-channel: Map the Bank Group (BG) to the leftmost entry so each xbtest channel can access different banks. This is to avoid one channel causing page misses in another channel.

Xbtest uses two channels to access the DIMM. Based on the general description above, the DIMM is split into two regions by remapping 1BG so each xbtest channel targets different DDR regions.

The following notation is used for address mapping:

  • RA for Row Address

  • BA for Bank Address

  • BG for Bank Group

  • CA for Column Address

DIMM Address Mapping Change

  • Default Setting = 18RA-2BA-1BG-7CA-1BG-3CA

  • Xbtest Setting = 1BG-18RA-2BA-7CA-1BG-3CA


image54

AVED V80 uses a TCL file to set the DIMM Address Map option:

set_property -dict [list \
    CONFIG.MC_PRE_DEF_ADDR_MAP_SEL {USER_DEFINED_ADDRESS_MAP} \
    CONFIG.MC_USER_DEFINED_ADDRESS_MAP {1BG-18RA-2BA-7CA-1BG-3CA} \
] [get_bd_cells /axi_noc_mc_ddr4_1]
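
After overriding block design properties in this way, the design can be re-validated so that Vivado checks the updated NoC configuration before output products are generated; the commands below are a generic usage sketch of that standard IP integrator flow.

# Re-validate and save the block design after changing the address map.
validate_bd_design
save_bd_design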

Other designs may require different settings to meet routing and performance requirements. More information can be found in the DRAM Address Mapping section of the Versal Adaptive SoC Programmable Network on Chip and Integrated Memory Controller 1.0 LogiCORE IP Product Guide (PG313).

DDR Advanced

AVED V80 requires ECC.

image55
