Alveo Debug Guide

XRT Troubleshooting

The Xilinx Runtime library (XRT) is an open-source, easy-to-use software stack that facilitates management and usage of Alveo accelerator cards. Users write host code in familiar programming languages such as C/C++ or Python, and that host code uses XRT to interact with the FPGA on the Alveo card. For more information on XRT, see the XRT GitHub page. If you are just starting to debug, consult the main page to determine whether this is the best page for your purposes.

This Page Covers

This page covers issues users have reported when using XRT. If your issue is not covered, please post on the Xilinx forums.

You Will Need

Before beginning debug, you need to:

Common Cases


Driver build succeeds

During XRT package installation, the XRT drivers are compiled and added to the Linux kernel. Look for messages confirming that the drivers were built and registered with the Linux kernel.

The following example is the expected output for a successful installation:

Building initial module for 4.18.0-240.el8.x86_64
Done.

xocl.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/

xclmgmt.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/
Adding any weak-modules

depmod......

DKMS: install completed.
Finished DKMS common.postinst
Loading new XRT Linux kernel modules
Installing MSD / MPD daemons

XRT versions 2022.2 and later also print a status table at the end of the installation:

| Components                   |      Status        |
|------------------------------|--------------------|
| XOCL & XCLMGMT Kernel Driver | Success            |
| XRT USERSPACE                | Success            |
| MPD/MSD                      | Success            |
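
The status table can also be checked programmatically. The following is a minimal Python sketch, assuming the installer output has been captured to a string and that the table keeps the pipe-delimited layout shown above. The function names `parse_install_status` and `failed_components` are hypothetical helpers, not part of XRT, and treating any status other than "Success" as a failure is an assumption.

```python
def parse_install_status(log_text):
    """Return {component: status} from the pipe-delimited summary table.

    Assumes the table layout shown in this guide (XRT 2022.2 and later).
    """
    statuses = {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2:
            continue
        name, status = cells[0], cells[1]
        # Skip the header row and the dashed separator row.
        if name == "Components" or set(name) <= {"-"}:
            continue
        statuses[name] = status
    return statuses


def failed_components(log_text):
    """List components whose status does not start with 'Success' (assumption)."""
    return [
        name
        for name, status in parse_install_status(log_text).items()
        if not status.startswith("Success")
    ]
```

Calling `failed_components` on the captured installer output returns an empty list when every row reports Success.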

Install hits an issue while apt/yum are running

If apt or yum prints one or more error messages while installing XRT, or the installation fails without an error message, the problem lies with the package manager.

Next step:


xdma driver on the system

If the xdma driver is installed and bound to the Alveo card, XRT cannot communicate with the card. To confirm this, check whether lsmod | grep xdma lists the xdma module as loaded, or whether lspci reports xdma as the kernel driver in use for the card. Example output for each command is shown below.

  • lsmod would show

:~> lsmod | grep xdma
xdma                 194893

  • lspci would show

db:00.0 Memory controller: Xilinx Corporation Device 6987
Subsystem: Xilinx Corporation Device 1351
Flags: bus master, fast devsel, latency 0, NUMA node 1
Memory at 387f02000000 (64-bit, prefetchable) [size=32M]
Memory at 387f04010000 (64-bit, prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: xdma
Kernel modules: xdma, xclmgmt
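
Both checks can be scripted against captured command output. The sketch below is a minimal Python example, assuming the text of lsmod and lspci has already been collected (for example with `subprocess.run`). The helper names `module_loaded` and `kernel_drivers_in_use` are hypothetical and are not part of XRT.

```python
import re


def module_loaded(lsmod_text, module):
    """True if `module` appears as the first column of any lsmod line."""
    return any(
        line.split()[:1] == [module] for line in lsmod_text.splitlines()
    )


def kernel_drivers_in_use(lspci_text):
    """Return the driver named on each 'Kernel driver in use:' line."""
    return re.findall(
        r"^[ \t]*Kernel driver in use:[ \t]*(\S+)",
        lspci_text,
        flags=re.MULTILINE,
    )
```

If `module_loaded(lsmod_text, "xdma")` is true, or `"xdma"` appears in the list returned by `kernel_drivers_in_use`, the card is in the state described in this section.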

Next step:

  • Reload the drivers with the following commands (root privileges are required)

    • modprobe -r xdma

    • modprobe xclmgmt

    • modprobe xocl


XRT reports unknown driver version

If XRT reports XOCL: unknown or XCLMGMT: unknown, it cannot see the driver kernel modules. Without the drivers loaded, XRT cannot communicate with the card. The xbutil examine output below shows this condition:

:~> xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-50-generic
  Version              : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 46901 MB
  Distribution         : Ubuntu 22.04.1 LTS
  GLIBC                : 2.35
  Model                : PowerEdge R740

XRT
  Version              : 2.15.225
  Branch               : 2023.1
  Hash                 : adf27adb3cfadc6e4c41d6db814159f1329b24f3
  Hash Date            : 2023-05-03 10:13:19
  XOCL                 : unknown, unknown
  XCLMGMT              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
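
The same condition can be detected from captured xbutil examine text. This is a minimal Python sketch, assuming the XOCL and XCLMGMT lines keep the "name : version, hash" layout shown above. The helper names `driver_versions` and `drivers_missing` are hypothetical, and treating an absent line the same as "unknown" is an assumption.

```python
import re


def driver_versions(examine_text):
    """Return {'XOCL': version, 'XCLMGMT': version} from examine output.

    The version is the text before the first comma; None if the line is absent.
    """
    versions = {}
    for name in ("XOCL", "XCLMGMT"):
        match = re.search(
            rf"^[ \t]*{name}[ \t]*:[ \t]*([^,\n]+)",
            examine_text,
            flags=re.MULTILINE,
        )
        versions[name] = match.group(1).strip() if match else None
    return versions


def drivers_missing(examine_text):
    """List drivers reported as 'unknown' or not reported at all (assumption)."""
    return [
        name
        for name, version in driver_versions(examine_text).items()
        if version is None or version == "unknown"
    ]
```

For the output above, `drivers_missing` would name both XOCL and XCLMGMT.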

Next steps:


Drivers not installed into kernel

If the XRT drivers do not appear in the output of lsmod | grep xocl and lsmod | grep xclmgmt, there was an issue with the installation. During XRT package installation, the XRT drivers are compiled and added to the Linux kernel. Review the installation messages in the console for messages like those below:

DKMS failed to install XRT drivers.

or

Building for 5.11.0-25-generic
Building initial module for 5.11.0-25-generic
Error! Build of xocl.ko failed for: 5.11.0-25-generic (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.12.0/build/ for more information.
****************************************************************
* DKMS failed to install XRT drivers.
* Please check if kernel development headers are installed for OS variant used.
*
* Check build logs in /var/lib/dkms/xrt/2.12.0
****************************************************************
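
Because these messages can scroll past quickly, it may help to search a saved copy of the installation output. The following Python sketch is one way to do that, assuming the failure messages use the wording shown above. The function name `find_dkms_failures` and the shape of its return value are hypothetical.

```python
import re


def find_dkms_failures(log_text):
    """Summarize DKMS driver build failures found in installer output.

    Returns a dict with:
      failed     - True if any failure message from this guide is present
      builds     - (module, kernel) pairs from 'Build of X failed for: Y'
      dkms_paths - any /var/lib/dkms/... paths mentioned in the log
    """
    builds = re.findall(r"Build of (\S+) failed for: (\S+)", log_text)
    banner = "DKMS failed to install XRT drivers" in log_text
    paths = []
    for path in re.findall(r"/var/lib/dkms/\S+", log_text):
        if path not in paths:  # keep first occurrence, preserve order
            paths.append(path)
    return {
        "failed": banner or bool(builds),
        "builds": builds,
        "dkms_paths": paths,
    }
```

The paths returned in `dkms_paths` point at the locations the installer suggests inspecting, such as the build directory that contains make.log.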

Note: The end of the installation output can look normal even when the driver build has failed, because the XRT installation does not report errors at the end:

Installed:
  xrt-2.12.385-1.x86_64       

Complete!

XRT versions 2022.2 and later print a status table at the end of the installation. After a failed install, it looks like the following:

| Components                   |      Status        |
|------------------------------|--------------------|
| XOCL & XCLMGMT Kernel Driver | Failed. Check build log : /var/lib/dkms/xrt/2.14.354/build/make.log
| XRT USERSPACE                | Success            |
| MPD/MSD                      | Success            |

Next steps:


No card is found

If the xbmgmt examine command does not recognize the platform on the card, it reports 0 devices found, as shown below.

:~> xbmgmt examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-50-generic
  Version              : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 46901 MB
  Distribution         : Ubuntu 22.04.1 LTS
  GLIBC                : 2.35
  Model                : PowerEdge R740

XRT
  Version              : 2.11.634
  Branch               : 2021.1
  Hash                 : 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  Hash Date            : 2021-06-08 22:08:45
  XOCL                 : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  XCLMGMT              : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26

Devices present
  0 devices found
BDF  :  Shell  Platform UUID  Device ID  Device Ready*  
--------------------------------------------------------


* Devices that are not ready will have reduced functionality when using XRT tools

If this occurs while lspci does recognize the card (it displays Kernel driver in use: xclmgmt, as shown below), there is a communication issue between XRT and the card. Example output:

:~> sudo lspci -vd 10ee:
83:00.0 Processing accelerators: Xilinx Corporation Device 5020
        Subsystem: Xilinx Corporation Device 000e
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, NUMA node 1
        Memory at 1232000000 (64-bit, prefetchable) [size=32M]
        Memory at 1234020000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [60] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1c0] #19
        Capabilities: [e00] Access Control Services
        Capabilities: [e10] #15
        Capabilities: [e80] Vendor Specific Information: ID=0020 Rev=0 Len=010 <?>
        Kernel driver in use: xclmgmt
        Kernel modules: xclmgmt
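
The two observations can be combined into a single check. The Python sketch below assumes the text of xbmgmt examine and lspci has already been captured, and that the device count line keeps the "N devices found" wording shown above for other counts as well (only the zero case appears in this guide). All function names are hypothetical helpers.

```python
import re


def devices_found(examine_text):
    """Return the count from the 'N devices found' line, or None if absent."""
    match = re.search(
        r"^[ \t]*(\d+) devices? found", examine_text, flags=re.MULTILINE
    )
    return int(match.group(1)) if match else None


def card_bound_to_xclmgmt(lspci_text):
    """True if lspci reports xclmgmt as the kernel driver in use."""
    return bool(re.search(r"Kernel driver in use:[ \t]*xclmgmt\b", lspci_text))


def xrt_cannot_see_card(examine_text, lspci_text):
    """True when XRT lists zero devices although lspci shows the card on xclmgmt."""
    return devices_found(examine_text) == 0 and card_bound_to_xclmgmt(lspci_text)
```

A true result from `xrt_cannot_see_card` corresponds to the communication issue described in this section.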

Next steps:


Nothing else has worked


Xilinx Support

For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the Xilinx Support pages. For additional assistance, post your question on the Xilinx Community Forums – Alveo Accelerator Card.

If you have a suggestion or have found an issue, please send an email to alveo_cards_debugging@xilinx.com.

License

All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.

You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the “CC-BY-4.0 License”); you may not use this file except in compliance with the CC-BY-4.0 License.

You may obtain a copy of the CC-BY-4.0 License at https://creativecommons.org/licenses/by/4.0/

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD027 | © Copyright 2021 Xilinx, Inc.