Alveo Debug Guide |
Power Delivery¶
This page will help guide you through steps to enure that your Alveo™ card can work in a system under load.
This Page Covers¶
Testing the power delivery to one or more Alveo cards
Xilinx has two test tools, xbutil validate and xbtest
.
The xbutil validate is an XRT utility that does basic checks to determine the card is installed and operating correctly. It does not test the power envelope. See Card Validation for additional details.
The
xbtest
utility provides extra card testing via different host applications and additional test kernels. This application will load up the FPGA to test on-card memory, host/card power delivery, and cooling. This test can help determine if the system is stable while the card is running accleration tasks. See the xbtest solutions page for details.
You Will Need¶
One or more Alveo cards
Before starting to test the card, gather data
Based on your card(s) and OS, download the correct version of xbtest from the xbtest solutions page.
Run xbutil validate and review the output to confirm the card is operating normally.
If there are errors, go to Card Validation and resolve.
Confirm card(s) are compatible with the host machine
Only test a passive card in server system with sufficient airflow.
If a passive card is not in a qualified host and card combination, you may need to manually increase server airflow by turning up the fan speed for the tests to pass.
Common Cases¶
Power Requirement for Vitis™ based applications¶
Vitis™ based applications in addition to the xbtest
application require cards be installed with PCIe AUX power connected (where applicable) to allow full card power. Without full power, these applications may not run correctly.
For cards with PCIe AUX power connector, follow the instructions in the installation guide to install the PCIe AUX cable.
The U200, U250, and U280 cards require 225W of power to run Vitis™ acceleration loads and xbtest
. The U55C requires 115W.
You can display the maximum available card power using xbmgmt examine command as shown below. If Max power
is less than the maximum required, the server will not provide the power needed for the application.
In this example, the Max power
is 150W, however the U200 card requires 225W and requires the 8-pin PCIe AUX power connector to be connected.
sudo xbmgmt examine -d 17:00.0
-------------------------------------------------
[0000:17:00.0] : xilinx_u200_gen3x16_xdma_base_2
-------------------------------------------------
Flash properties
Type : spi
Serial Number :
Device properties
Type : u200
Name : AU200A64G
Config Mode : 7
Max Power : 150W
What to look for:
Look for the value reported by
Max power
Should be
225W
for U200, U250, and U280 cards being used in Vitis flowsShould be
115W
or more for U55C card being used in Vitis flows
Next steps:
Delay testing until sufficient power can be supplied.
Safely hook up PCIe AUX power, see the Card installation guides
Card is in a suitable machine¶
Alveo cards have two different cooling solutions
Actively cooled - the card has a fan intended to cool the card when installed in a workstation
Passively cooled - the card depends on host chassis airflow for cooling.
See Determine active or passive card to determine if a card is actively or passively cooled. Passively cooled cards should only be placed in a server with sufficient airflow.
Next steps:
For active cards, go to xbtest for testing power below
For passive cards, confirm the following:
Server airflow meets card requirements
The card can be damaged when operated in a system without sufficient airflow
Lid needs to be on the server for testing
Go to xbtest for testing power
Shell not listed in xbtest downloads below¶
See xbtest solutions page for supported plaforms. To run xbtest, ensure the platform running on the card and system is supported.
If the platform is not supported, you need to update the system and card with a supported platform.
Next steps:
Download one of the supported platforms from the Alveo landing page
Follow installation directions for modifying an XRT install making sure both XRT and xbtest support the platform.
Card or system crashes during test¶
If the card overheats or demands more power than the host system provides, the test application, card, or system could crash.
Next Steps:
Monitor the power and temperature to make sure the card is operating within temperature and power limits.
xbtest xclbin fails to load¶
The xbtest tests consist of a known good application and a set of known good accelerators, in xclbin format, that run on the card. If the accelerator fails to load, the test will fail with an error
Gen_029
message indicating the xclbin is not compatible with the platform on the card
The message can be seen in the example below:
~]$ xbtest -c power -d 0
INFO : GENERAL : GEN_016: Scanning xbtest libraries...
FAILURE : GENERAL : GEN_029: Could not find an xclbin compatible with target device at device index 0,
identified by interface_uuid = 4cda0ba9ab64b59c535adadf2e0b1930
Meaning:
There is a xbtest/platform mismatch
Next Steps:
Confirm the right card and platform are being targeted
Re-install the development platform for the card from the Alveo landing page
Appendix¶
xbtest for Testing Power¶
The xbtest stress test syntax changes from release to release. A version 4 cheat sheet follows:
Install xbtest following the directions in 1361
source /opt/xilinx/xrt/setup.(c)sh
source /opt/xilinx/xbtest/setup.(c)sh
Use
xbutil list
to determine the card device ID for the cardRun the predefined stress test with-
xbtest -c stress -d (device ID)
Note: xbtest version 4 still uses device ID syntax (for example:-d 3
). It does not accept card BDF syntax (for example:-d a3:00.0
)The test will ramp up the toggle rate, increasing power. As the test runs, xbtest will report card temperature and power
STATUS : POWER : PWR_045: 94 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%
STATUS : POWER : PWR_045: 93 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%
Interrupt the test with
CTRL+C
if card temperature exceeds 95C or if the temperature suddenly falls.At the end of the test, a pass or fail is indicated.
A test pass shows card is good for acceleration loads.
A failure indicates there may be an issue, follow normal escalation procedures.
For more details download and review the User Guide from the xbtest solutions page.
Xilinx Support¶
For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.
Have a suggestion, or found an issue please send an email to alveo_cards_debugging@xilinx.com .
License¶
All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.
You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD027 | © Copyright 2021 Xilinx, Inc.