Alveo Debug Guide

SC Troubleshooting

This page helps determine whether a communication break has occurred between the SC and the CMC. It is part of the larger Alveo debug guide. If you are just starting to debug, please consult the main page.

This Page Covers

This page covers issues that are observed when running the xbmgmt flash --scan and xbutil query commands and that can be related to the SC.

You Will Need

Before beginning debug, you need to:

Common Cases

Flashable partition running on FPGA and Flashable partitions installed in system are identical.

When the platform has been installed correctly, the partition and SC version running on the FPGA and installed on the system will be identical.  The installed partitions can be displayed using the xbmgmt flash --scan command.  An example output is shown below.

Card [0000:c3:00.0]
   Card type:          u50
   Flash type:         SPI
   Flashable partition running on FPGA:
       xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]
           Logic UUID:
           f465b0a3ae8c64f619bc150384ace69b
   Flashable partitions installed in system:
       xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]

Next step:

  • None, this is expected output
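The comparison described above can also be scripted. The following is a minimal sketch, not part of XRT: it assumes the output of xbmgmt flash --scan for a single card has already been captured as text in the layout shown above. The helper name sc_versions is hypothetical.

```python
import re

# Matches the [SC=...] field printed after each partition name.
SC_FIELD = re.compile(r"\[SC=([^\]]+)\]")


def sc_versions(scan_output):
    """Return (running_sc, installed_sc) for a single-card scan report.

    Either value is None when the corresponding section lists no SC field.
    Assumption: the report follows the layout of the example output above.
    """
    found = {"running": None, "installed": None}
    section = None
    for line in scan_output.splitlines():
        text = line.strip()
        if text.startswith("Flashable partition running on FPGA"):
            section = "running"
        elif text.startswith("Flashable partitions installed in system"):
            section = "installed"
        elif section is not None and found[section] is None:
            # Take the first SC field that appears inside the current section.
            match = SC_FIELD.search(text)
            if match:
                found[section] = match.group(1)
    return found["running"], found["installed"]
```

If both returned values are equal and not None, the report corresponds to the expected case in this section. Any other result points to one of the cases described below.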


Bad magic number

If the xbmgmt flash --scan command returns a bad magic number error as in the example below, there may be a communication break between XRT and the CMC. The CMC is not reading the pre-programmed magic number (0x74736574), a hex value used as an on-card self-check.

read reg ERROR
ERROR: Failed to detect XMC, bad magic number: 7fff
read reg ERROR
ERROR: Failed to detect XMC, bad magic number: 7fff
Card [0000:af:00.0]
    Card type:                       u250
    Flash type:                      SPI
    Flashable partition running on FPGA:
        xilinx_u250_xdma_201830_2,[ID=0x5d14fbe6],[SC=INACTIVE]
    Flashable partitions installed in system:             
        xilinx_u250_xdma_201830_2,[ID=0x5d14fbe6],[SC=4.2.0]

Next steps:

  • Cold boot the system

  • Perform xbmgmt flash --scan

  • If issue persists

    • Pull power to the system

    • Perform xbmgmt flash --scan

  • If the issue persists after these steps, look on the Xilinx forums
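When going through saved logs, it can help to pull the reported magic value out of the error lines and compare it with the expected one. This is a minimal sketch under stated assumptions: the value after "bad magic number:" is taken to be hexadecimal without a 0x prefix, as in the example output (7fff), and the function names are hypothetical.

```python
import re

# Magic number that this guide says is pre-programmed on the card.
EXPECTED_MAGIC = 0x74736574

# Assumption: the reported value is printed in hex without a 0x prefix.
MAGIC_ERROR = re.compile(r"bad magic number:\s*([0-9a-fA-F]+)")


def reported_magic_values(scan_output):
    """Return every magic value reported in 'bad magic number' errors."""
    return [int(value, 16) for value in MAGIC_ERROR.findall(scan_output)]


def has_bad_magic(scan_output):
    """True if any reported magic value differs from the expected one."""
    return any(v != EXPECTED_MAGIC for v in reported_magic_values(scan_output))


if __name__ == "__main__":
    log = "ERROR: Failed to detect XMC, bad magic number: 7fff\n"
    for value in reported_magic_values(log):
        print(f"read {value:#x}, expected {EXPECTED_MAGIC:#x}")
```

As a side note, the four bytes of 0x74736574 in little-endian order are the ASCII characters of the word "test", which can be checked with EXPECTED_MAGIC.to_bytes(4, "little").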


XMC not loaded

If the xbmgmt flash --scan command returns an XMC not loaded error as shown below, there is a communication break between XRT and the CMC.

ERROR: Failed to detect XMC, xmc.bin not loaded
ERROR: Failed to detect XMC, xmc.bin not loaded
ERROR: Failed to detect XMC, xmc.bin not loaded
Card [0000:d8:00.0]
    Card type:        u250
    Flash type:       SPI
    Flashable partition running on FPGA:
        xilinx_u250_xdma_201830_2,[ID=0x5d14fbe6],[SC=INACTIVE]
    Flashable partitions installed in system:    
        xilinx_u250_xdma_201830_2,[ID=0x5d14fbe6],[SC=4.2.0]

Next steps:


XMC not ready

If the xbmgmt flash --scan command returns an XMC not ready error as shown below, there is a communication break between XRT and the CMC.

ERROR: XMC is not ready: 0x3
ERROR: XMC is not ready: 0x3
ERROR: XMC is not ready: 0x3
Card [0000:a3:00.0]
    Card type:          u50
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=UNKNOWN]
            Logic UUID:
            f465b0a3ae8c64f619bc150384ace69b
    Flashable partitions installed in system:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]
            Logic UUID:
            f465b0a3ae8c64f619bc150384ace69b

Next steps:

  • Cold boot the system

  • Perform xbmgmt flash --scan

  • If issue persists

    • Pull power to the system

    • Perform xbmgmt flash --scan

  • If the issue persists after these steps, look on the Xilinx forums


SC is not ready

If the xbmgmt flash --scan command returns an SC is not ready error as shown below, there is a communication break between the SC and the CMC.

ERROR: SC is not ready: 0x0
Card [0000:c3:00.0]
    Card type:          u50lv
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u50lv_gen3x4_xdma_base_2,[ID=0xc74bda63fe95d0e8],[SC=UNKNOWN]

Next steps:

  • Cold boot the system

  • Perform xbmgmt flash --scan

  • If issue persists

    • Pull power to the system

    • Perform xbmgmt flash --scan

  • If the issue persists after these steps, look on the Xilinx forums


SC only displays two digits

If the xbmgmt flash --scan command reports SC versions with a different number of digits under Flashable partition running on FPGA and Flashable partitions installed in system (5.0 versus 5.0.27 in the example below), there is a break in communication between XRT and the CMC.

Card [0000:07:00.0]
    Card type:          u50
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0]
            Logic UUID:
            f465b0a3ae8c64f619bc150384ace69b
    Flashable partitions installed in system:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]
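The check for a shortened SC version can be expressed as a count of the dot-separated components. The sketch below is illustrative only: it assumes the two SC strings have already been extracted from the scan output, and the function name is hypothetical.

```python
import re

# A dotted numeric version such as 5.0 or 5.0.27.
NUMERIC_VERSION = re.compile(r"\d+(\.\d+)*")


def is_truncated_sc(running_sc, installed_sc):
    """True if the running SC version has fewer components than the installed one.

    Non-numeric values such as INACTIVE or UNKNOWN are not treated as
    truncated; they belong to other cases described in this guide.
    """
    if not NUMERIC_VERSION.fullmatch(running_sc):
        return False
    if not NUMERIC_VERSION.fullmatch(installed_sc):
        return False
    return len(running_sc.split(".")) < len(installed_sc.split("."))
```

With the values from the example output, is_truncated_sc("5.0", "5.0.27") is True, while two complete versions that merely differ, such as 5.0.13 and 5.0.27, return False.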

Next steps:


SC versions do not match

If the xbmgmt flash --scan command displays different SC versions under the Flashable partition running on FPGA and Flashable partitions installed in system sections as shown below, the card has not been flashed to the SC version used by the system. Both need to use the same SC version.

Card [0000:27:00.0]
    Card type:          u50
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.13]
            Logic UUID:
            f465b0a3ae8c64f619bc150384ace69b
    Flashable partitions installed in system:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]
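When comparing SC versions in a script, the components should be compared as numbers rather than as text, because a plain string comparison ranks 5.0.9 above 5.0.13. The following sketch assumes both versions are complete dotted numeric strings. The function names are hypothetical.

```python
def parse_sc_version(version):
    """Convert '5.0.13' to (5, 0, 13) so components compare as numbers.

    Raises ValueError for non-numeric values such as INACTIVE or UNKNOWN.
    """
    return tuple(int(part) for part in version.split("."))


def compare_sc(running_sc, installed_sc):
    """Return 'older', 'newer' or 'same' for the running SC version,
    relative to the version installed on the system."""
    running = parse_sc_version(running_sc)
    installed = parse_sc_version(installed_sc)
    if running < installed:
        return "older"
    if running > installed:
        return "newer"
    return "same"
```

For the example output above, compare_sc("5.0.13", "5.0.27") returns "older", meaning the SC running on the card is behind the one installed on the system.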

Next steps:


GOLDEN partition running on FPGA

When the xbmgmt flash --scan command returns GOLDEN under Flashable partition running on FPGA as shown below, the card is fresh from the factory or has been returned to its factory state. The partition displayed under Flashable partitions installed in system shows only the partitions available on the system.

Card [0000:d3:00.0]
    Card type:          u50
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u50_GOLDEN_9,[SC=INACTIVE]
    Flashable partitions installed in system:
        xilinx_u50_gen3x16_xdma_201920_3,[ID=0xf465b0a3ae8c64f6],[SC=5.0.27]

Next step:


Partition installed in system (None)

If the xbmgmt flash --scan command returns Flashable partitions installed in system:   (None) as shown below, the platform running on the FPGA has not been installed on the system. The partition displayed under Flashable partition running on FPGA shows the partition flashed on the card. The partitions on the card and on the system must match for applications to run.

Card [0000:03:00.0]
    Card type:          u250
    Flash type:         SPI
    Flashable partition running on FPGA:
        xilinx_u250_xdma_201830_2,[ID=0x5d14fbe6]
    Flashable partitions installed in system:   (None)

Next step:


No cards found

If the xbmgmt flash --scan command returns No cards Found as shown below, the OS or XRT is unable to find the card.

No cards Found !

Next steps:


Voltage or temperature reports zero

If the xbutil query command reports a zero value for FPGA TEMP, 12V PEX or 3V3 PEX as shown in the example below, there may be a communication break somewhere between XRT, the CMC and the SC.

Temperature(C)
PCB TOP FRONT   PCB TOP REAR    PCB BTM FRONT   VCCINT TEMP
0               0               N/A             0
FPGA TEMP       TCRIT Temp      FAN Presence    FAN Speed(RPM)
0               0               P               N/A
QSFP 0          QSFP 1          QSFP 2          QSFP 3
N/A             N/A             N/A             N/A
HBM TEMP
N/A
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Electrical(mV|mA)
12V PEX         12V AUX         12V PEX Current 12V AUX Current
0               N/A             0               N/A
3V3 PEX         3V3 AUX         DDR VPP BOTTOM  DDR VPP TOP
0               N/A             N/A             N/A

Next steps:

  • Cold boot the system

  • Perform xbmgmt flash --scan

  • If issue persists

    • Pull power to the system

    • Perform xbmgmt flash --scan

  • If the issue persists after these steps, look on the Xilinx forums
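The zero-value check on FPGA TEMP, 12V PEX and 3V3 PEX can be automated against a captured xbutil query report. This is a rough sketch, not an XRT API: it assumes every column is 16 characters wide, which is inferred from the example output above and may not hold for other XRT versions. All names are hypothetical.

```python
# Assumption: each column in the report is 16 characters wide, as inferred
# from the example output in this section.
COLUMN_WIDTH = 16

# Sensors this guide singles out as indicators of a communication break.
WATCHED_SENSORS = ("FPGA TEMP", "12V PEX", "3V3 PEX")


def sensor_value(report, name):
    """Return the value printed under the column header `name`, or None."""
    lines = report.splitlines()
    # Each header line is immediately followed by its line of values.
    for header, values in zip(lines, lines[1:]):
        for start in range(0, len(header), COLUMN_WIDTH):
            if header[start:start + COLUMN_WIDTH].strip() == name:
                return values[start:start + COLUMN_WIDTH].strip()
    return None


def zero_sensors(report, names=WATCHED_SENSORS):
    """List the watched sensors whose reported value is exactly 0."""
    return [name for name in names if sensor_value(report, name) == "0"]
```

A non-empty result from zero_sensors corresponds to the symptom described in this section. Values such as N/A are left alone, since the text only treats zero readings as a symptom.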


Failed to open device

If the xbmgmt flash --scan command returns Failed to open device: as shown below, either the driver was not successfully loaded or the card was not successfully flashed.

Failed to open device: 0000:3b:00.0
INFO: Found total 1 card(s); 0 are usable.

Next steps:

  • Cold boot the system

  • Perform xbmgmt flash --scan

  • If issue persists

    • Pull power to the system

    • Perform xbmgmt flash --scan

  • If the issue persists after these steps, look on the Xilinx forums
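To route a captured scan log to the relevant section of this page, the message text from the example outputs can be matched directly. This is a minimal sketch: the substrings are copied from the examples on this page and may differ in other XRT releases. The names KNOWN_SYMPTOMS and matching_sections are hypothetical.

```python
# Substring seen in the example outputs, paired with the matching section.
KNOWN_SYMPTOMS = (
    ("bad magic number", "Bad magic number"),
    ("xmc.bin not loaded", "XMC not loaded"),
    ("XMC is not ready", "XMC not ready"),
    ("SC is not ready", "SC is not ready"),
    ("No cards Found", "No cards found"),
    ("Failed to open device", "Failed to open device"),
)


def matching_sections(scan_output):
    """Return the sections of this page whose symptom text appears in the output."""
    return [section for needle, section in KNOWN_SYMPTOMS if needle in scan_output]
```

An empty list only means that none of these messages were printed. Cases identified by the SC field alone, such as mismatched or shortened versions, still need to be checked separately.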


Xilinx Support

For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.

Have a suggestion or found an issue? Please send an email to alveo_cards_debugging@xilinx.com.

License

All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.

You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD027 | © Copyright 2021 Xilinx, Inc.