Alveo Debug Guide
This page guides you through steps to ensure that your Alveo™ card can work in a system under load.
This Page Covers¶
Testing the power delivery to one or more Alveo cards
Xilinx has two test tools, xbutil validate and xbtest:
xbutil validate is an XRT utility that does basic checks to determine whether the card is installed and operating correctly. It does not test the power envelope. See Card Validation for additional details.
xbtest provides extra card testing via a different host application and additional test kernels. This application loads up the FPGA to test on-card memory, host/card power delivery, and cooling. It can help determine whether the system is stable while the card is running acceleration tasks. See the xbtest solutions page for details.
You Will Need¶
Before starting to test the card, gather data
Based on your cards and OS, download the correct version of xbtest from the xbtest solutions page.
Run xbutil validate and review the output to confirm the card is operating normally.
If there are errors, go to Card Validation and resolve them.
Confirm card(s) are compatible with the host machine
A U200, U250, or U280 has less than 225W power¶
The U200, U250, and U280 cards require 225W of power to run Vitis™ acceleration loads. If the reported
Max power level: is below 225W, the server will not provide the power needed for the test.
You can confirm the maximum power level by using
xbmgmt flash --scan --verbose as given below. In this example, the card has insufficient power.
sudo /opt/xilinx/xrt/bin/xbmgmt flash --scan --verbose
Card [0000:03:00.0]
    Card type: u250
    Flash type: SPI
    Flashable partition running on FPGA:
        xilinx_u250_xdma_201830_3,[ID=0x5eeb5a43],[SC=4.3.9]
    ...
    Max power level: 150W
What to look for:
Look for the value reported by Max power level:
It should be 225W for U200, U250, and U280 cards being used in Vitis flows.
Delay testing until the card has sufficient power.
Safely hook up the 8-pin AUX power; see the Card installation guide.
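The power check above can be scripted. The following is a minimal sketch, not an official tool: it assumes the scan output has already been captured as text, and that it keeps the one-field-per-line layout with the `Card [BDF]` and `Max power level:` labels shown in the example. The function name `max_power_levels` and the shortened sample text are illustrative only.

```python
import re

REQUIRED_WATTS = 225  # from this guide: U200, U250, and U280 in Vitis flows


def max_power_levels(scan_output):
    """Map each card BDF to the wattage on its 'Max power level:' line."""
    levels = {}
    current_card = None
    for line in scan_output.splitlines():
        card = re.match(r"\s*Card \[([0-9a-fA-F:.]+)\]", line)
        if card:
            current_card = card.group(1)
            continue
        power = re.search(r"Max power level:\s*(\d+)\s*W", line)
        if power and current_card is not None:
            levels[current_card] = int(power.group(1))
    return levels


# Shortened copy of the example output (assumed layout, one field per line).
SAMPLE = """\
Card [0000:03:00.0]
    Card type: u250
    Flash type: SPI
    Max power level: 150W
"""

for bdf, watts in max_power_levels(SAMPLE).items():
    verdict = "ok" if watts >= REQUIRED_WATTS else "insufficient"
    print(f"{bdf}: {watts}W ({verdict})")
# prints "0000:03:00.0: 150W (insufficient)"
```

Keying the result by BDF keeps the check usable on hosts with more than one card, since each card reports its own power level.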
Card is in a suitable machine¶
Alveo cards have two different cooling solutions:
Actively cooled - the card has a fan intended to cool the card in a workstation.
Passively cooled - the card depends on host chassis airflow for cooling.
Passively cooled cards should only be placed in a server with sufficient airflow.
~> sudo /opt/xilinx/xrt/bin/xbmgmt flash --scan --verbose
Card [0000:03:00.0]
    Card type: u250
    Flash type: SPI
    Flashable partition running on FPGA:
        xilinx_u250_xdma_201830_3,[ID=0x5eeb5a43],[SC=4.3.9]
    Flashable partitions installed in system:
        xilinx_u250_xdma_201830_3,[ID=0x5eeb5a43],[SC=4.3.9]
    ...
    Fan presence: P
What to look for:
Look for the value reported by Fan presence:
Fan presence: A means the card has a fan and is suitable for use in a workstation
Fan presence: P means the card depends on server fans for cooling
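As a rough illustration of the cooling check, the sketch below pulls the `Fan presence:` letter out of captured scan output and maps it to the two meanings listed above. It assumes the label appears exactly as in the example; the helper name `fan_presence` and the dictionary wording are illustrative and not part of XRT.

```python
import re

# Meanings taken from this guide; the wording is paraphrased.
FAN_MEANING = {
    "A": "actively cooled: has its own fan, suitable for a workstation",
    "P": "passively cooled: depends on server fans for cooling",
}


def fan_presence(scan_output):
    """Return 'A' or 'P' from the 'Fan presence:' line, or None if absent."""
    match = re.search(r"Fan presence:\s*([AP])\b", scan_output)
    return match.group(1) if match else None


sample = "Card [0000:03:00.0]\n    Card type: u250\n    Fan presence: P\n"
code = fan_presence(sample)
print(code, "-", FAN_MEANING.get(code, "not reported"))
```

Returning `None` when the line is missing lets a caller distinguish "not reported" from either cooling type instead of guessing.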
Shell not listed in xbtest downloads below¶
If the platform is not supported, you need to update the system and card with a supported platform.
Card or system crashes during test¶
If the card overheats or uses more power than the system provides, the test application, card, or system could crash.
Monitor the power and temperature to make sure the card is operating within temperature and power limits.
xbtest xclbin fails to load¶
The xbtest tests consist of a known good application and a set of known good accelerators, in xclbin format, that run on the card. If the accelerator fails to load, the test fails with a
GEN_029 error message indicating that the xclbin is not compatible with the platform on the card.
The message can be seen in the example below:
~]$ xbtest -c power -d 0
INFO : GENERAL : GEN_016: Scanning xbtest libraries...
FAILURE : GENERAL : GEN_029: Could not find an xclbin compatible with target device at device index 0, identified by interface_uuid = 4cda0ba9ab64b59c535adadf2e0b1930
There is an xbtest/platform mismatch.
Confirm that you are testing the right card and platform.
Re-install the development platform for the card from the Alveo landing page
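When xbtest output is saved to a file, the mismatch can be spotted automatically. The sketch below assumes each message follows the `LEVEL : SOURCE : CODE: text` pattern seen in the example; that pattern is inferred from the two lines shown here and is not a documented format. `Message`, `parse_line`, and `find_xclbin_mismatch` are hypothetical names.

```python
import re
from typing import NamedTuple


class Message(NamedTuple):
    level: str   # e.g. INFO, FAILURE
    source: str  # e.g. GENERAL
    code: str    # e.g. GEN_029
    text: str


# Assumed line shape: "LEVEL : SOURCE : CODE: free text"
LINE = re.compile(r"^\s*(\w+)\s*:\s*(\w+)\s*:\s*(\w+):\s*(.*)$")


def parse_line(line):
    """Split one xbtest message line into its parts, or return None."""
    match = LINE.match(line)
    return Message(*match.groups()) if match else None


def find_xclbin_mismatch(log):
    """Return the first FAILURE message carrying code GEN_029, if any."""
    for line in log.splitlines():
        msg = parse_line(line)
        if msg and msg.level == "FAILURE" and msg.code == "GEN_029":
            return msg
    return None


LOG = (
    "INFO : GENERAL : GEN_016: Scanning xbtest libraries...\n"
    "FAILURE : GENERAL : GEN_029: Could not find an xclbin compatible with "
    "target device at device index 0, identified by interface_uuid = "
    "4cda0ba9ab64b59c535adadf2e0b1930\n"
)

hit = find_xclbin_mismatch(LOG)
if hit:
    print("xbtest/platform mismatch:", hit.text)
```

Lines that do not fit the pattern, such as the shell prompt, are skipped rather than treated as errors.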
xbtest for Testing Power¶
The xbtest stress test syntax changes from release to release. A version 4 cheat sheet follows:
Install xbtest following the directions in 1361
Run xbutil list to determine the device ID for the card
Run the predefined stress test with:
xbtest -c stress -d (device ID)
Note: xbtest version 4 still uses device ID syntax (for example:
-d 3). It does not accept card BDF syntax.
The test will ramp up the toggle rate, increasing power. As the test runs, xbtest will report card temperature and power:
STATUS : POWER : PWR_045: 94 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%
STATUS : POWER : PWR_045: 93 sec. remaining; 76C; Power: raw 47.5W; Toggle Rate: 40.0%
Interrupt the test with
CTRL+C if card temperature goes over 95C or if the temperature suddenly falls.
At the end there will be a test pass or fail.
A test pass shows card is good for acceleration loads.
A failure indicates there may be an issue; follow normal escalation procedures.
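The interrupt rule for the stress test can be expressed as a small check over the PWR_045 status lines. This is a sketch under stated assumptions: the line format is taken from the two sample lines in this guide, and `SUDDEN_DROP_C` is a made-up threshold, because the guide does not say how large a fall counts as sudden. The names `parse_status` and `should_interrupt` are illustrative.

```python
import re

# Field layout inferred from the sample PWR_045 lines in this guide.
STATUS = re.compile(
    r"PWR_045:\s*(\d+) sec\. remaining;\s*(\d+)C;\s*"
    r"Power: raw ([\d.]+)W;\s*Toggle Rate:\s*([\d.]+)%"
)

MAX_TEMP_C = 95      # limit stated in this guide
SUDDEN_DROP_C = 10   # assumption: no figure is given for "suddenly falls"


def parse_status(line):
    """Extract remaining time, temperature, power, and toggle rate."""
    match = STATUS.search(line)
    if not match:
        return None
    return {
        "remaining_s": int(match.group(1)),
        "temp_c": int(match.group(2)),
        "power_w": float(match.group(3)),
        "toggle_pct": float(match.group(4)),
    }


def should_interrupt(prev_temp_c, temp_c):
    """True if the card is over the limit or the reading dropped sharply."""
    if temp_c > MAX_TEMP_C:
        return True
    return prev_temp_c is not None and prev_temp_c - temp_c >= SUDDEN_DROP_C


line = ("STATUS : POWER : PWR_045: 94 sec. remaining; 76C; "
        "Power: raw 47.5W; Toggle Rate: 40.0%")
reading = parse_status(line)
print(reading, should_interrupt(None, reading["temp_c"]))
```

A monitoring script could feed each new line through `parse_status`, keep the previous temperature, and alert the operator when `should_interrupt` returns true.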
For more details download and review the User Guide from the xbtest solutions page.
For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.
Have a suggestion or found an issue? Please send an email to email@example.com.
All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.
You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD027 | © Copyright 2021 Xilinx, Inc.