XRT Troubleshooting¶
The Xilinx Runtime library (XRT) is an open-source, easy-to-use software stack that facilitates management and usage of Alveo accelerator cards. Host code is written in familiar programming languages such as C/C++ or Python and uses XRT to interact with the FPGA on the Alveo card. For more information on XRT, see the XRT GitHub page. If you are just starting to debug, consult the main page to determine whether this is the best page for your purposes.
This Page Covers¶
This page covers issues users have reported when using XRT. If your issue is not covered, please post on the Xilinx forums.
You Will Need¶
Before beginning debug, you need to:
Common Cases¶
Driver build succeeds¶
During XRT package installation, the XRT drivers are compiled and added to the Linux kernel. Look for messages confirming that the drivers were built and registered with the Linux kernel.
The following example is the expected output for a successful installation:
Building initial module for 4.18.0-240.el8.x86_64
Done.
xocl.ko.xz:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/
xclmgmt.ko.xz:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/
Adding any weak-modules
depmod......
DKMS: install completed.
Finished DKMS common.postinst
Loading new XRT Linux kernel modules
Installing MSD / MPD daemons
In XRT versions 2022.2 and later, a status table is printed at the end of the install:
| Components | Status |
|------------------------------|--------------------|
| XOCL & XCLMGMT Kernel Driver | Success |
| XRT USERSPACE | Success |
| MPD/MSD | Success |
Install hits an issue while apt/yum are running¶
If one or more error messages appear while apt or yum is installing XRT, or if the installation fails without an error message, there is an issue with the package manager.
Next step:
Go to Package manager
Install issues may need local IT involvement
xdma driver on the system¶
If the xdma driver is installed and bound to the Alveo card, it prevents XRT from talking to the card. To confirm this, check whether the lsmod | grep xdma command shows the xdma driver loaded, or whether lspci shows the xdma driver attached to the card. Example output for each command is shown below.
lsmod would show:
:~> lsmod | grep xdma
xdma 194893
lspci would show:
db:00.0 Memory controller: Xilinx Corporation Device 6987
Subsystem: Xilinx Corporation Device 1351
Flags: bus master, fast devsel, latency 0, NUMA node 1
Memory at 387f02000000 (64-bit, prefetchable) [size=32M]
Memory at 387f04010000 (64-bit, prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: xdma
Kernel modules: xdma, xclmgmt
Next step:
Unload the xdma driver and load the XRT drivers with the following commands (run as root):
modprobe -r xdma
modprobe xclmgmt
modprobe xocl
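The two checks described above can also be scripted. The following is a minimal sketch in Python, assuming the text of lsmod and lspci has already been captured into strings; the function names (loaded_modules, drivers_in_use, xdma_blocks_xrt) are hypothetical helpers for illustration and are not part of XRT.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names listed in `lsmod` output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def drivers_in_use(lspci_text):
    """Return every driver named on a 'Kernel driver in use:' line."""
    marker = "Kernel driver in use:"
    return [
        line.split(marker, 1)[1].strip()
        for line in lspci_text.splitlines()
        if marker in line
    ]


def xdma_blocks_xrt(lsmod_text, lspci_text):
    """True if xdma is loaded or is bound to a card in the lspci output."""
    return (
        "xdma" in loaded_modules(lsmod_text)
        or "xdma" in drivers_in_use(lspci_text)
    )
```

If xdma_blocks_xrt returns True for the output captured on your system, the modprobe commands above are the next step.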
XRT reports unknown driver version¶
If XRT reports XOCL: unknown or XCLMGMT: unknown, XRT is not seeing the driver kernel modules. Without the drivers loaded, XRT cannot communicate with the card. The output of the xbutil examine command below shows this condition:
:~> xbutil examine
System Configuration
OS Name : Linux
Release : 5.15.0-50-generic
Version : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
Machine : x86_64
CPU Cores : 12
Memory : 46901 MB
Distribution : Ubuntu 22.04.1 LTS
GLIBC : 2.35
Model : PowerEdge R740
XRT
Version : 2.15.225
Branch : 2023.1
Hash : adf27adb3cfadc6e4c41d6db814159f1329b24f3
Hash Date : 2023-05-03 10:13:19
XOCL : unknown, unknown
XCLMGMT : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
Next steps:
See if the drivers are present on the system with lsmod:
lsmod | grep xocl
lsmod | grep xclmgmt
If both drivers are present, go to Reload the XRT drivers
If a driver is missing, go to Drivers not installed into kernel (below)
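As an illustration of what to look for in the xbutil examine report, here is a small Python sketch that picks out the XOCL and XCLMGMT lines and flags any driver reported as unknown. It assumes the report text has the "NAME : version, hash" layout shown above; the function names are hypothetical and not an XRT API.

```python
def driver_versions(examine_text):
    """Map 'XOCL' and 'XCLMGMT' to the version string reported by XRT."""
    versions = {}
    for line in examine_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("XOCL", "XCLMGMT"):
            # Lines look like "XOCL : 2.11.634, <hash>"; keep the version part.
            versions[key] = value.split(",")[0].strip()
    return versions


def unloaded_drivers(examine_text):
    """Return the drivers whose version is missing or reported as 'unknown'."""
    versions = driver_versions(examine_text)
    return [
        name.lower()
        for name in ("XOCL", "XCLMGMT")
        if versions.get(name, "unknown") == "unknown"
    ]
```

The names returned by unloaded_drivers are lower-cased so they match the module names used with lsmod (xocl, xclmgmt).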
Drivers not installed into kernel¶
If the XRT drivers are not listed by the lsmod | grep xocl and lsmod | grep xclmgmt commands, there was an issue with the installation. During XRT package installation, the XRT drivers are compiled and added to the Linux kernel. Review the installation messages in the console for messages like those below:
DKMS failed to install XRT drivers.
or
Building for 5.11.0-25-generic
Building initial module for 5.11.0-25-generic
Error! Build of xocl.ko failed for: 5.11.0-25-generic (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.12.0/build/ for more information.
****************************************************************
* DKMS failed to install XRT drivers.
* Please check if kernel development headers are installed for OS variant used.
*
* Check build logs in /var/lib/dkms/xrt/2.12.0
****************************************************************
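Note: the tail end of the install output looks OK even when the driver build has failed, because the XRT installation does not report errors at the end. Scroll back through the full output to find the DKMS messages.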
Installed:
xrt-2.12.385-1.x86_64
Complete!
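Because the failure is easy to miss when the install ends with "Complete!", it can help to search a saved copy of the install output. The sketch below is one possible way to do that in Python. It assumes the output was saved to a string, uses only the two failure messages quoted above as markers, and the function names are hypothetical.

```python
import re

# Failure messages taken from the example install output above.
FAILURE_MARKERS = (
    "DKMS failed to install XRT drivers",
    "Error! Build of",
)


def find_dkms_failures(install_log):
    """Return log lines that contain one of the DKMS failure markers."""
    return [
        line.strip()
        for line in install_log.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


def build_log_dirs(install_log):
    """Return the /var/lib/dkms/xrt/... paths mentioned in the log, in order."""
    paths = []
    for path in re.findall(r"/var/lib/dkms/xrt/[\w./-]+", install_log):
        if path not in paths:
            paths.append(path)
    return paths
```

An empty list from find_dkms_failures only means that neither marker was present; other install errors would need their own markers.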
In XRT versions 2022.2 and later, a status table is printed at the end of the install. For a failed install it looks like the following:
| Components | Status |
|------------------------------|--------------------|
| XOCL & XCLMGMT Kernel Driver | Failed. Check build log : /var/lib/dkms/xrt/2.14.354/build/make.log
| XRT USERSPACE | Success |
| MPD/MSD | Success |
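For XRT 2022.2 and later, the status table itself can be checked. The following Python sketch reads the table text and reports every component whose status is not "Success". It assumes the pipe-delimited layout shown above, including a failure row that has no closing pipe, and the function names are hypothetical.

```python
def parse_install_summary(text):
    """Map each component in the install summary table to its status text."""
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        # Split on the first inner "|" only: a failure status may hold a path.
        cells = line.strip("|").split("|", 1)
        if len(cells) != 2:
            continue
        component, status = cells[0].strip(), cells[1].strip()
        # Skip the header row and the dashed separator row.
        if component == "Components" or set(component) <= set("-"):
            continue
        statuses[component] = status
    return statuses


def failed_components(text):
    """Return the components whose status is anything other than 'Success'."""
    return {
        name: status
        for name, status in parse_install_summary(text).items()
        if status != "Success"
    }
```

For the failed example above, failed_components returns a single entry for the kernel driver row, and its status text contains the make.log path to inspect.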
Next steps:
Confirm the XRT version is supported by this OS release
Confirm that the kernel headers match the release and are installed correctly
Check the XRT release notes (UG1451) for the OS support list
Download the latest XRT from the Alveo landing page
Re-install XRT
Check for dependency issues
For Red Hat/CentOS, follow the steps on the Xilinx Runtime install to enable the optional-rpms or epel-release
If the issue persists, capture the error messages and post on the Xilinx forums.
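The kernel header check in the steps above can be approximated with a short script. This sketch assumes the conventional layout in which the build files for a kernel release are reachable at /lib/modules/<release>/build; that path is an assumption about the distribution and is not stated in this guide. The function names are hypothetical.

```python
import os
import platform


def headers_dir(release, modules_root="/lib/modules"):
    """Return the build directory expected for a given kernel release."""
    return os.path.join(modules_root, release, "build")


def headers_installed(release=None, modules_root="/lib/modules"):
    """True if the build directory for the kernel release exists."""
    # Default to the running kernel, the same string `uname -r` prints.
    release = release or platform.release()
    # os.path.isdir follows symlinks, so a dangling 'build' link gives False.
    return os.path.isdir(headers_dir(release, modules_root))
```

Calling headers_installed() with no arguments checks the running kernel. A False result suggests the development headers for that kernel are missing, which is the condition the DKMS banner above asks you to check.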
No card is found¶
If the xbmgmt examine command does not recognize the platform on the card, it displays 0 devices found, as shown below.
:~> xbmgmt examine
System Configuration
OS Name : Linux
Release : 5.15.0-50-generic
Version : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
Machine : x86_64
CPU Cores : 12
Memory : 46901 MB
Distribution : Ubuntu 22.04.1 LTS
GLIBC : 2.35
Model : PowerEdge R740
XRT
Version : 2.11.634
Branch : 2021.1
Hash : 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
Hash Date : 2021-06-08 22:08:45
XOCL : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
XCLMGMT : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
Devices present
0 devices found
BDF : Shell Platform UUID Device ID Device Ready*
--------------------------------------------------------
* Devices that are not ready will have reduced functionality when using XRT tools
If this occurs and lspci does recognize the card (it displays Kernel driver in use: xclmgmt, as shown below), there is a communication issue between XRT and the card. Example output below:
:~> sudo lspci -vd 10ee:
83:00.0 Processing accelerators: Xilinx Corporation Device 5020
Subsystem: Xilinx Corporation Device 000e
Physical Slot: 4
Flags: bus master, fast devsel, latency 0, NUMA node 1
Memory at 1232000000 (64-bit, prefetchable) [size=32M]
Memory at 1234020000 (64-bit, prefetchable) [size=128K]
Capabilities: [40] Power Management version 3
Capabilities: [60] MSI-X: Enable+ Count=32 Masked-
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [1c0] #19
Capabilities: [e00] Access Control Services
Capabilities: [e10] #15
Capabilities: [e80] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?>
Kernel driver in use: xclmgmt
Kernel modules: xclmgmt
Next steps:
If the card was last used in a Vivado flow, XRT will not be able to communicate with the card. Revert the card to the golden image using AR 71757.
If the card was not used in a Vivado flow, Remove XRT
Download the latest XRT from the Alveo landing page
Re-install XRT
Install the desired deployment platform
Use Confirm XRT/Platform compatibility to ensure the platform and XRT are compatible
If the card does not have the desired platform, Flash the card with the deployment platform
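The condition described in this section (xbmgmt examine lists no devices while lspci shows the card bound to xclmgmt) can be expressed as a small check. This is a sketch in Python under the assumption that both command outputs were captured as text. Only the "0 devices found" line is taken from the output above; matching other counts with the same pattern is an assumption, and the function names are hypothetical.

```python
import re


def devices_found(examine_text):
    """Return the count from the 'N devices found' line, or None if absent."""
    match = re.search(r"(\d+)\s+devices?\s+found", examine_text)
    return int(match.group(1)) if match else None


def bound_to_xclmgmt(lspci_text):
    """True if lspci reports xclmgmt as the kernel driver in use."""
    return any(
        line.strip() == "Kernel driver in use: xclmgmt"
        for line in lspci_text.splitlines()
    )


def xrt_cannot_reach_card(examine_text, lspci_text):
    """True when XRT lists 0 devices although lspci shows the card on xclmgmt."""
    return devices_found(examine_text) == 0 and bound_to_xclmgmt(lspci_text)
```

When xrt_cannot_reach_card returns True for your captured output, the next steps listed above apply.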
Nothing else has worked¶
Look for similar issues on Xilinx forums
Log machine state and post on the Xilinx forums
Xilinx Support¶
For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.
If you have a suggestion or have found an issue, please send an email to alveo_cards_debugging@xilinx.com.
License¶
All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.
You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
XD027 | © Copyright 2021 Xilinx, Inc.