Appendix A

Alveo™ PCIe Info

The table below captures the PCIe information for the following Alveo™ products:

  • U2xx (U200, U250, U280)
  • U50x (U50, U50C, U50LV)
  • U30
  • U55C, U55N
  • VCK5000
  • V70
  • UL3xxx
  • MA35D

NOTE: The following PCIe information is constant across all Alveo™ cards.

  • Vendor ID: 0x10EE
  • Subsystem VID: 0x10EE
  • Subsystem DID: 0x000E

Table: Alveo PCIe Device ID

Card/Shell          Device ID
U200 Golden         0xD000
U200 XDMA & 2RP     PF0=0x5000  PF1=0x5001  PF2=0x5002  PF3=0x5003
U200 QDMA           PF0=0x5010  PF1=0x5011  PF2=0x5012  PF3=0x5013
U250 Golden         0xD004
U250 XDMA & 2RP     PF0=0x5004  PF1=0x5005  PF2=0x5006  PF3=0x5007
U250 QDMA           PF0=0x5014  PF1=0x5015  PF2=0x5016  PF3=0x5017
U280 Golden         0xD00C
U280 XDMA           PF0=0x500C  PF1=0x500D  PF2=0x500E  PF3=0x500F
U50 Golden          0xD020
U50 XDMA            PF0=0x5020  PF1=0x5021  PF2=0x5022  PF3=0x5023
U50LV XDMA          PF0=0x5060  PF1=0x5061  PF2=0x5062  PF3=0x5063
U50C                PF0=0x506C  PF1=0x506D  PF2=0x506E  PF3=0x506F
U30 Golden          0xD03C
U30 (Production)    PF0=0x513C  PF1=0x513D  PF2=0x513E  PF3=0x513F
U55N                PF0=0x5058  PF1=0x5059  PF2=0x505A  PF3=0x505B
U55C                PF0=0x505C  PF1=0x505D  PF2=0x505E  PF3=0x505F
VCK5000 XDMA        PF0=0x5044  PF1=0x5045  PF2=0x5046  PF3=0x5047
VCK5000 QDMA        PF0=0x5048  PF1=0x5049  PF2=0x504A  PF3=0x504B
V70 QDMA            PF0=0x5094  PF1=0x5095  PF2=0x5096  PF3=0x5097
V70PQ2 QDMA         PF0=0x50B0  PF1=0x50B1  PF2=0x50B2  PF3=0x50B3
UL3xxx              PF0=0x50C0  PF1=0x50C1  PF2=0x50C2  PF3=0x50C3
MA35D               PF0=0x5070  PF1=0x5071  PF2=0x5072  PF3=0x5073

PF: Physical Function

lspci command usage: sudo lspci -s :03:00.0 -vv
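The Device ID table can be turned into a small lookup, for example to label the numeric IDs that lspci reports. The following is a minimal sketch, not part of any AMD tool: the function name identify_alveo and the dictionary layout are hypothetical. It relies on an observation from the table, namely that every listed PF0 value is a multiple of 4 and that PF1 to PF3 follow consecutively, so the low two bits of a device ID give the PF index.

```python
# Illustrative lookup built from the Device ID table in this appendix.
# identify_alveo and the dictionary names are hypothetical helpers.
ALVEO_VENDOR_ID = 0x10EE

# Golden images expose a single device ID.
GOLDEN_DEVICE_IDS = {
    0xD000: "U200 Golden",
    0xD004: "U250 Golden",
    0xD00C: "U280 Golden",
    0xD020: "U50 Golden",
    0xD03C: "U30 Golden",
}

# PF0 device ID per card/shell; PF1..PF3 are PF0 + 1..3 in the table.
PF0_DEVICE_IDS = {
    0x5000: "U200 XDMA & 2RP",
    0x5010: "U200 QDMA",
    0x5004: "U250 XDMA & 2RP",
    0x5014: "U250 QDMA",
    0x500C: "U280 XDMA",
    0x5020: "U50 XDMA",
    0x5060: "U50LV XDMA",
    0x506C: "U50C",
    0x513C: "U30 (Production)",
    0x5058: "U55N",
    0x505C: "U55C",
    0x5044: "VCK5000 XDMA",
    0x5048: "VCK5000 QDMA",
    0x5094: "V70 QDMA",
    0x50B0: "V70PQ2 QDMA",
    0x50C0: "UL3xxx",
    0x5070: "MA35D",
}


def identify_alveo(vendor_id, device_id):
    """Return (card/shell name, PF index or None), or None if not in the table."""
    if vendor_id != ALVEO_VENDOR_ID:
        return None
    if device_id in GOLDEN_DEVICE_IDS:
        return (GOLDEN_DEVICE_IDS[device_id], None)
    # Clear the low two bits to recover the PF0 ID; those bits are the PF index.
    pf0 = device_id & ~0x3
    if pf0 in PF0_DEVICE_IDS:
        return (PF0_DEVICE_IDS[pf0], device_id & 0x3)
    return None
```

For instance, identify_alveo(0x10EE, 0x5005) yields the U250 XDMA & 2RP entry with PF index 1. The vendor and device IDs themselves can be read from the numeric output of lspci.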

Thermal Limits

The following table captures the QSFP temperature limits for all QSFP-capable Alveo™ products: U2xx (U200, U250, U280), U50x (U50, U50C, U50LV), U55x (U55N, U55C), and VCK5000.

Table: QSFP Temperature Limits

Sensor Name             Warning Limit   Critical Limit   Fatal Limit
QSFP temperature [1]
  Commercial Type       65°C            70°C             75°C
  Industrial Type       80°C            85°C             90°C

Notes:

[1] QSFP temperature limits may vary based on the device OEM (manufacturer) and model. For thermal references, QSFP devices are broadly categorized as Commercial or Industrial type.

Wherever supported by the QSFP device, SC FW reads the ‘Max case temperature’ over I2C at register offset 190, as per the SFF-8636 2.10a specification, and dynamically sets the device-specific critical thermal limit. For QSFP devices that do not support the SFF-8636 specification, SC FW assigns a critical limit of 70°C for Commercial type (when register offset 190 returns 0x00) or 85°C for Industrial type (when the register returns 0xFF). The limits are then derived as follows:

  • Warning limit = (Max case temperature - 5)
  • Critical limit = Max case temperature
  • Fatal limit = (Max case temperature + 5)
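The derivation above can be sketched as a short function. This is an illustration only, not the SC FW implementation: the function name is hypothetical, and it assumes that any value other than 0x00 or 0xFF read from register offset 190 is the max case temperature in whole degrees Celsius.

```python
def qsfp_limits_from_register(raw):
    """Return (warning, critical, fatal) limits in deg C from the byte read
    at register offset 190. Hypothetical helper for illustration."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("register value must fit in one byte")
    if raw == 0x00:
        max_case = 70   # fallback described above: Commercial type
    elif raw == 0xFF:
        max_case = 85   # fallback described above: Industrial type
    else:
        max_case = raw  # assumption: byte holds the temperature in deg C
    # Warning = max - 5, Critical = max, Fatal = max + 5
    return (max_case - 5, max_case, max_case + 5)
```

With the two fallback values, the function reproduces the Commercial (65/70/75°C) and Industrial (80/85/90°C) rows of the QSFP table.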

For additional thermal information, refer to the data sheet specific to the Alveo™ product.

Table: FPGA/ACAP/ASIC and Board Thermal Limits

Sensor Name                          Warning Limit   Critical Limit   Fatal Limit
U200/U250
  FPGA Device temperature            88°C            97°C             107°C
  Board temperature [4]              100°C           110°C            125°C
U280
  Logical FPGA temperature [2]       88°C            97°C             107°C
  Board temperature [4]              100°C           110°C            125°C
U50, U50LV, U55N/C
  Logical FPGA temperature [2]       88°C            97°C             107°C
  Board temperature [4]              100°C           110°C            125°C
U50C
  Logical FPGA temperature [2]       88°C            100°C            107°C
  Board temperature [4]              100°C           110°C            125°C
U30 [3]
  Max FPGA junction temperature      90°C            95°C             100°C
  (Max of ZYNQ1 & ZYNQ2 FPGA)
  Board temperature                  75°C            80°C             85°C
V70
  Max ACAP junction temperature      97°C            100°C            105°C
  Board temperature [4]              105°C           110°C            150°C

[2] Logical device temperature is the maximum of the FPGA die temperature and the HBM temperature.

[3] Alveo™ U30 specific: The OTP (One Time Programmable) values are programmed into the temperature sensor device at the time of manufacturing. Additionally, on boot-up, the SC configures the temperature sensor device so that it automatically shuts down the ZYNQ devices when either the ZYNQ or the board temperature exceeds the fatal limit.

[4] Board temperature is the maximum value among the various on-board temperature sensors, such as inlet, outlet, and VRs.

Note: The SC FW automatically shuts down power to the accelerator device (FPGA/ACAP/ASIC) when any temperature value exceeds the fatal limit.
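As an illustration of how the thresholds and notes [2] and [4] combine, the sketch below classifies a reading against one row of the table. All function names are hypothetical and this is not the SC FW logic. The note above states only that the fatal action is taken when the limit is exceeded; applying the same strict comparison to the warning and critical limits is an assumption.

```python
def logical_fpga_temperature(fpga_die_c, hbm_c):
    """Note [2]: maximum of the FPGA die and HBM temperatures."""
    return max(fpga_die_c, hbm_c)


def board_temperature(sensor_readings_c):
    """Note [4]: maximum across on-board sensors (inlet, outlet, VRs, ...)."""
    return max(sensor_readings_c)


def classify(temp_c, warning, critical, fatal):
    """Return the highest limit that temp_c exceeds, or "normal".

    Strict '>' follows the wording "exceeds the fatal limit"; using it for
    the warning and critical limits as well is an assumption.
    """
    if temp_c > fatal:
        return "fatal"
    if temp_c > critical:
        return "critical"
    if temp_c > warning:
        return "warning"
    return "normal"


# Example with the U280 logical FPGA row (88 / 97 / 107 deg C).
state = classify(logical_fpga_temperature(90, 85), 88, 97, 107)
```

Here the logical temperature is 90°C, which lies above the warning limit and below the critical limit, so state is "warning".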

MA35D sensor threshold limits

The following table lists the threshold values for the thermal and electrical sensors in the MA35D product:

[Image: MA35D sensor threshold table (_images/ma35d-thresholds.png)]

NOTE: The threshold values (Warning, Critical & Fatal) are provided to the BMC via PLDM PDRs.

When any one of the following sensors reaches its Fatal limit, the Satellite Controller autonomously resets (that is, shuts down power to) both SuperNova ASIC devices.

  1. Outlet Temperature
  2. ASIC1/2 Temperatures
  3. Normalized VRM Temperature
  4. 3v3 Pex Voltage & Current
  5. 12v Pex Voltage & Current

AMD Support

For support resources such as answers, documentation, downloads, and forums, see the Alveo Accelerator Cards AMD/Xilinx Community Forum.

License

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.

You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD038 | © Copyright 2023, Advanced Micro Devices Inc.