Alveo Debug Guide

Power Delivery

This page guides you through steps to ensure that your Alveo™ card can operate in a system under load.

This Page Covers

Testing the power delivery to one or more Alveo cards

Xilinx has two test tools, xbutil validate and xbtest.

  • xbutil validate is an XRT utility that runs basic checks to determine whether the card is installed and operating correctly. It does not test the power envelope. See Card Validation for additional details.

  • The xbtest utility provides extra card testing via different host applications and additional test kernels. It loads the FPGA to test on-card memory, host/card power delivery, and cooling. This test can help determine whether the system is stable while the card is running acceleration tasks. See the xbtest solutions page for details.

You Will Need

Before testing the card, complete these steps:

  1. Check system compatibility

  2. Based on your card(s) and OS, download the correct version of xbtest from the xbtest solutions page.

  3. Run xbutil validate and review the output to confirm the card is operating normally.

  4. Confirm card(s) are compatible with the host machine

Common Cases


Power Requirement for Vitis™ based applications

Vitis™ based applications, as well as the xbtest application, require that cards be installed with PCIe AUX power connected (where applicable) so that the card can draw full power. Without full power, these applications may not run correctly.

For cards with a PCIe AUX power connector, follow the instructions in the installation guide to install the PCIe AUX cable.

The U200, U250, and U280 cards require 225W of power to run Vitis™ acceleration loads and xbtest. The U55C requires 115W.

You can display the maximum available card power using the xbmgmt examine command, as shown below. If Max Power is less than the power the card requires, the server cannot supply the power the application needs.

In this example, Max Power is 150W. However, the U200 card requires 225W, which requires the 8-pin PCIe AUX power connector to be connected.

 sudo xbmgmt examine -d 17:00.0

-------------------------------------------------
[0000:17:00.0] : xilinx_u200_gen3x16_xdma_base_2
-------------------------------------------------
Flash properties
  Type                 : spi
  Serial Number        : 

Device properties
  Type                 : u200
  Name                 : AU200A64G
  Config Mode          : 7
  Max Power            : 150W

What to look for:

  • Look for the value reported by Max Power

    • Should be 225W for U200, U250, and U280 cards being used in Vitis flows

    • Should be 115W or more for U55C card being used in Vitis flows
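The check above can also be scripted. The following is a minimal sketch, assuming the `xbmgmt examine` output has the same text layout as the example on this page. The helper names `required_watts`, `max_power_watts`, and `check_card_power` are hypothetical and are not part of XRT; the wattage values are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of XRT. They assume the text layout of the
# `xbmgmt examine` example shown on this page.

# Print the power (in watts) a card type needs for Vitis flows and xbtest.
required_watts() {
    case "$1" in
        u200|u250|u280) echo 225 ;;
        u55c)           echo 115 ;;
        *)              return 1 ;;   # unknown card type
    esac
}

# Read `xbmgmt examine` output on stdin and print the Max Power value in watts.
max_power_watts() {
    sed -n 's/^[[:space:]]*Max Power[[:space:]]*:[[:space:]]*\([0-9][0-9]*\)W.*/\1/p' | head -n 1
}

# Usage: check_card_power <card-type> < examine-output
# Returns 0 when the reported Max Power meets the requirement, 1 when it is
# too low, and 2 when the input could not be interpreted.
check_card_power() {
    need=$(required_watts "$1") || { echo "unknown card type: $1" >&2; return 2; }
    have=$(max_power_watts)
    [ -n "$have" ] || { echo "Max Power not found in input" >&2; return 2; }
    if [ "$have" -ge "$need" ]; then
        echo "OK: Max Power ${have}W meets the ${need}W requirement"
    else
        echo "LOW: Max Power ${have}W is below the ${need}W requirement; check the PCIe AUX cable"
        return 1
    fi
}
```

With these functions loaded, the example card on this page could be checked with `sudo xbmgmt examine -d 17:00.0 | check_card_power u200`, which would report the 150W value as too low.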

Next steps:


Card is in a suitable machine

Alveo cards have two different cooling solutions:

  • Actively cooled - the card has a fan intended to cool the card when installed in a workstation

  • Passively cooled - the card depends on host chassis airflow for cooling.

See Determine active or passive card to determine if a card is actively or passively cooled. Passively cooled cards should only be placed in a server with sufficient airflow.

Next steps:

  • For active cards, go to xbtest for testing power below

  • For passive cards, confirm the following:

    • Server airflow meets card requirements

      • The card can be damaged when operated in a system without sufficient airflow

    • Lid needs to be on the server for testing

    • Go to xbtest for testing power


Shell not listed in xbtest downloads below

See the xbtest solutions page for supported platforms. To run xbtest, ensure that the platform running on the card and system is supported.

If the platform is not supported, you need to update the system and card with a supported platform.

Next steps:


Card or system crashes during test

If the card overheats or demands more power than the host system provides, the test application, card, or system could crash.

Next Steps:


xbtest xclbin fails to load

The xbtest tests consist of a known good application and a set of known good accelerators, in xclbin format, that run on the card. If the accelerator fails to load, the test fails with an error:

  • GEN_029 message indicating that the xclbin is not compatible with the platform on the card

The message can be seen in the example below:

~]$ xbtest -c power -d 0
INFO     : GENERAL      : GEN_016: Scanning xbtest libraries...
FAILURE  : GENERAL      : GEN_029: Could not find an xclbin compatible with target device at device index 0,
identified by interface_uuid = 4cda0ba9ab64b59c535adadf2e0b1930

Meaning:

  • There is an xbtest/platform mismatch

Next Steps:

  • Confirm the right card and platform are being targeted

  • Re-install the development platform for the card from the Alveo landing page
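When xbtest output is captured to a file or piped, the GEN_029 failure can be detected automatically. The sketch below assumes the message has the same wording and layout as the example above; `find_gen029_uuid` is a hypothetical helper name, not an xbtest feature.

```shell
#!/bin/sh
# Hypothetical helper; not part of xbtest.
# Read xbtest console output on stdin. If a GEN_029 failure is present, print
# the interface_uuid it reports (if any) and return 0; otherwise return 1.
find_gen029_uuid() {
    awk '
        /FAILURE/ && /GEN_029/ { found = 1 }
        # The uuid may sit on the GEN_029 line itself or on a following line.
        found && match($0, /interface_uuid = [0-9a-fA-F]+/) {
            # Skip the 17 characters of "interface_uuid = ".
            print substr($0, RSTART + 17, RLENGTH - 17)
            exit
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, `xbtest -c power -d 0 2>&1 | find_gen029_uuid` would print the reported interface_uuid when the mismatch occurs. The `2>&1` is a precaution, since this page does not state which stream xbtest writes to. The printed value identifies the platform on the card and can help when confirming that the right card and platform are being targeted.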

Appendix

xbtest for Testing Power

The xbtest stress test syntax changes from release to release. A version 4 cheat sheet follows:

  • Install xbtest following the directions in 1361

  • source /opt/xilinx/xrt/setup.(c)sh

  • source /opt/xilinx/xbtest/setup.(c)sh

  • Use xbutil list to determine the card device ID for the card

  • Run the predefined stress test with: xbtest -c stress -d (device ID)
    Note: xbtest version 4 still uses device ID syntax (for example: -d 3). It does not accept card BDF syntax (for example: -d a3:00.0).

  • The test will ramp up the toggle rate, increasing power. As the test runs, xbtest will report card temperature and power

STATUS   : POWER        : PWR_045:       94 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%
STATUS   : POWER        : PWR_045:       93 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%

  • Monitor power and temperature

  • Interrupt the test with CTRL+C if card temperature exceeds 95C or if the temperature suddenly falls.

  • At the end of the test, a pass or fail is indicated.

    • A pass indicates that the card is suitable for acceleration loads.

    • A failure indicates there may be an issue; follow normal escalation procedures.

  • For more details, download and review the User Guide from the xbtest solutions page.
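The temperature monitoring described above can be assisted by a small filter over the PWR_045 status lines. This is a sketch only: it assumes the status line format shown above, `watch_xbtest_temp` is a hypothetical helper name, and the `DROP_C` threshold for what counts as a "sudden" fall (10C by default here) is an assumed value, not one taken from the xbtest documentation. It prints alerts but does not interrupt the test; pressing CTRL+C remains a manual step.

```shell
#!/bin/sh
# Hypothetical helper; not part of xbtest.
# Read xbtest STATUS lines on stdin and print an alert when the temperature
# exceeds 95C or falls by at least DROP_C degrees between two consecutive
# readings. DROP_C is an assumed threshold (default 10).
# Returns 1 if any alert was printed, 0 otherwise.
watch_xbtest_temp() {
    awk -v limit=95 -v drop="${DROP_C:-10}" '
        /PWR_045/ && match($0, /[0-9]+C;/) {
            # Strip the trailing "C;" and convert to a number.
            temp = substr($0, RSTART, RLENGTH - 2) + 0
            if (temp > limit) {
                print "ALERT: " temp "C exceeds " limit "C"
                bad = 1
            }
            if (seen && prev - temp >= drop) {
                print "ALERT: temperature fell from " prev "C to " temp "C"
                bad = 1
            }
            prev = temp
            seen = 1
        }
        END { exit(bad ? 1 : 0) }
    '
}
```

One possible way to use it is `xbtest -c stress -d 0 2>&1 | tee xbtest.log | watch_xbtest_temp`, where the device ID 0 and the log file name are placeholders. Keeping the full log with `tee` preserves the power and toggle rate readings for later review.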


Xilinx Support

For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.

Have a suggestion or found an issue? Please send an email to alveo_cards_debugging@xilinx.com.

License

All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.

You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD027 | © Copyright 2021 Xilinx, Inc.