Alveo Debug Guide

# XRT Troubleshooting

The Xilinx Runtime library [(XRT)](terminology.html#xrt) is an open-source, easy-to-use software stack that facilitates management and usage of Alveo accelerator cards. Users write host code in familiar programming languages like C/C++ or Python, and that host code uses XRT to interact with the FPGA on the Alveo card. For more information on XRT, please go to the [XRT Github page](https://xilinx.github.io/XRT/master/html/index.html).

If you are just starting to debug, please consult the [main page](../README.md) to determine if this is the best page for your purposes.

## This Page Covers

This page covers issues users have reported when using XRT. If your issue is not covered, please post on the [Xilinx forums](https://support.xilinx.com/s/topic/0TO2E000000YKXlWAO/alveo-accelerator-cards).

## You Will Need

Before beginning debug, you need to:

- Have [root/sudo permissions](common-steps.html#root-sudo-access)
- [Confirm System Compatibility](check-system-compatibility.md)

## Common Cases

- - -

### Driver build succeeds

During [XRT](terminology.html#xrt) package installation, the XRT drivers are compiled and added to the Linux kernel. Look for messages indicating that the drivers were built and registered with the Linux kernel. The following example is the expected output for a successful installation:
```
Building initial module for 4.18.0-240.el8.x86_64
Done.

xocl.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/

xclmgmt.ko.xz:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-240.el8.x86_64/extra/
Adding any weak-modules

depmod......

DKMS: install completed.
Finished DKMS common.postinst
Loading new XRT Linux kernel modules
Installing MSD / MPD daemons
```

In XRT versions 2022.2 and later, a table has been added at the end of the install:

```
| Components                   | Status             |
|------------------------------|--------------------|
| XOCL & XCLMGMT Kernel Driver | Success            |
| XRT USERSPACE                | Success            |
| MPD/MSD                      | Success            |
```

- - -

### Install hits an issue while apt/yum are running

If one or more error messages appear while apt or yum is running the [XRT](terminology.html#xrt) install, or the installation fails without an error message, there is an issue with the package manager.

Next step:

- Go to [Package manager](package-manager.md)
  * Install issues may need local IT involvement

- - -

### xdma driver on the system

If the xdma driver is installed and talking to the Alveo card, it prevents [XRT](terminology.html#xrt) from talking to the card. This can be confirmed either with the `lsmod | grep xdma` command showing the xdma driver loaded, or with `lspci` showing the xdma driver attached to the card. Examples of the output of each command are shown below.
- `lsmod` would show:

  ```
  :~> lsmod | grep xdma
  xdma                  194893
  ```

- `lspci` would show:

  ```
  db:00.0 Memory controller: Xilinx Corporation Device 6987
          Subsystem: Xilinx Corporation Device 1351
          Flags: bus master, fast devsel, latency 0, NUMA node 1
          Memory at 387f02000000 (64-bit, prefetchable) [size=32M]
          Memory at 387f04010000 (64-bit, prefetchable) [size=64K]
          Capabilities:
          Kernel driver in use: xdma
          Kernel modules: xdma, xclmgmt
  ```

Next step:

- Unload the xdma driver and load the XRT drivers with the following commands (run as root or with sudo):
  - `modprobe -r xdma`
  - `modprobe xclmgmt`
  - `modprobe xocl`

- - -

### XRT reports unknown driver version

If [XRT](terminology.html#xrt) shows `XOCL: unknown` or `XCLMGMT: unknown`, XRT is not seeing the driver kernel modules. Without the drivers loaded, XRT cannot communicate with the card. This can be seen in the output of the `xbutil examine` command below:

```
:~> xbutil examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-50-generic
  Version              : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 46901 MB
  Distribution         : Ubuntu 22.04.1 LTS
  GLIBC                : 2.35
  Model                : PowerEdge R740

XRT
  Version              : 2.15.225
  Branch               : 2023.1
  Hash                 : adf27adb3cfadc6e4c41d6db814159f1329b24f3
  Hash Date            : 2023-05-03 10:13:19
  XOCL                 : unknown, unknown
  XCLMGMT              : unknown, unknown
WARNING: xclmgmt version is unknown. Is xclmgmt driver loaded? Or is MSD/MPD running?
```

Next steps:

- Check whether the drivers are present on the system with `lsmod`:
  * `lsmod | grep xocl`
  * `lsmod | grep xclmgmt`
- If both drivers are present, [reload the XRT drivers](common-steps.html#unload-reload-xrt-drivers)
- If a driver is missing, go to [Drivers not installed into kernel](#drivers-not-installed-into-kernel) (below)

- - -

### Drivers not installed into kernel

If the [XRT](terminology.html#xrt) drivers aren't visible in the output of the `lsmod | grep xocl` and `lsmod | grep xclmgmt` commands, there was an issue with the installation.
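The two `lsmod` checks above can be combined into one pass. The following is a minimal sketch, not an XRT tool: `check_xrt_modules` is a hypothetical helper name, and its optional file argument exists only so the check can be pointed at a saved copy of the module list. It reads `/proc/modules`, the kernel file that `lsmod` formats, reports whether `xocl` and `xclmgmt` are loaded, and also flags a loaded `xdma` module, since that driver prevents XRT from talking to the card.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of XRT): report whether the XRT kernel
# modules xocl and xclmgmt are loaded, and flag a conflicting xdma module.
# It reads /proc/modules, the same source lsmod formats. The optional first
# argument points it at another file with the same layout (e.g. for testing).
check_xrt_modules() {
    local modules_file="${1:-/proc/modules}"
    local missing=0
    local mod

    if [ ! -r "$modules_file" ]; then
        echo "cannot read ${modules_file}" >&2
        return 2
    fi

    for mod in xocl xclmgmt; do
        # Each /proc/modules line begins with the module name and a space.
        if grep -q "^${mod} " "$modules_file"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: missing"
            missing=1
        fi
    done

    # A loaded xdma driver keeps XRT from talking to the card.
    if grep -q "^xdma " "$modules_file"; then
        echo "xdma: loaded (conflicts with XRT)"
    fi

    # Non-zero when at least one XRT driver is missing.
    return "$missing"
}
```

After sourcing the script, running `check_xrt_modules` with no argument checks the live system. A non-zero return status means at least one XRT driver is missing, in which case the installation messages described next are the place to look.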
During XRT package installation, the XRT drivers are compiled and added to the Linux kernel. Review the installation messages in the console for messages like those below:

```
DKMS failed to install XRT drivers.
```

or

```
Building for 5.11.0-25-generic
Building initial module for 5.11.0-25-generic
Error! Build of xocl.ko failed for: 5.11.0-25-generic (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/xrt/2.12.0/build/ for more information.
****************************************************************
* DKMS failed to install XRT drivers.
* Please check if kernel development headers are installed for OS variant used.
*
* Check build logs in /var/lib/dkms/xrt/2.12.0
****************************************************************
```

Note: the tail end of the install messages will look OK, because XRT installation does not report errors at the end:

```
Installed:
  xrt-2.12.385-1.x86_64

Complete!
```

In XRT versions 2022.2 and later, a table has been added at the end of the install. In a failed install it looks like this:

```
| Components                   | Status                                                              |
|------------------------------|---------------------------------------------------------------------|
| XOCL & XCLMGMT Kernel Driver | Failed. Check build log : /var/lib/dkms/xrt/2.14.354/build/make.log |
| XRT USERSPACE                | Success                                                             |
| MPD/MSD                      | Success                                                             |
```

Next steps:

- Confirm the XRT version is supported by this OS release
  * [Determine the Linux release](common-steps.html#determine-linux-release)
  * [Determine the kernel headers](common-steps.html#determine-linux-kernel-and-header-information) and confirm they match the release and are installed correctly
  * Check the XRT release notes ([UG1451](https://docs.xilinx.com/r/en-US/ug1451-xrt-release-notes)) for the OS support list
- [Remove XRT](common-steps.html#remove-xrt)
- Download the latest XRT from the [Alveo landing page](https://www.xilinx.com/products/boards-and-kits/alveo.html)
- Re-install XRT
  * Check for dependency issues
  * For RHEL/CentOS, follow the steps on the [Xilinx Runtime install](https://xilinx.github.io/XRT/master/html/install.html) page to enable the optional RPMs or epel-release
- If the issue persists, capture the error messages and post on the [Xilinx forums](https://support.xilinx.com/s/topic/0TO2E000000YKXlWAO/alveo-accelerator-cards).

- - -

### No card is found

If the `xbmgmt examine` command does not recognize the platform on the card, it reports `0 devices found` under `Devices present`, as shown below.
```
:~> xbmgmt examine
System Configuration
  OS Name              : Linux
  Release              : 5.15.0-50-generic
  Version              : #56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 46901 MB
  Distribution         : Ubuntu 22.04.1 LTS
  GLIBC                : 2.35
  Model                : PowerEdge R740

XRT
  Version              : 2.11.634
  Branch               : 2021.1
  Hash                 : 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  Hash Date            : 2021-06-08 22:08:45
  XOCL                 : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26
  XCLMGMT              : 2.11.634, 5ad5998d67080f00bca5bf15b3838cf35e0a7b26

Devices present
  0 devices found
BDF  :  Shell  Platform UUID  Device ID  Device Ready*
--------------------------------------------------------

* Devices that are not ready will have reduced functionality when using XRT tools
```

If this occurs but `lspci` does recognize the card (it displays `Kernel driver in use: xclmgmt`), there is a communication issue between XRT and the card. Example output:

```
:~> sudo lspci -vd 10ee:
83:00.0 Processing accelerators: Xilinx Corporation Device 5020
        Subsystem: Xilinx Corporation Device 000e
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, NUMA node 1
        Memory at 1232000000 (64-bit, prefetchable) [size=32M]
        Memory at 1234020000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [60] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [1c0] #19
        Capabilities: [e00] Access Control Services
        Capabilities: [e10] #15
        Capabilities: [e80] Vendor Specific Information: ID=0020 Rev=0 Len=010
        Kernel driver in use: xclmgmt
        Kernel modules: xclmgmt
```

Next steps:

- If the card was last used in a Vivado flow, XRT will not be able to communicate with the card. Revert the card to the golden image using [AR 71757](https://www.xilinx.com/support/answers/71757.html).
- If the card wasn't used in a Vivado flow, [remove XRT](common-steps.html#remove-xrt)
- Download the latest XRT from the [Alveo landing page](https://www.xilinx.com/products/boards-and-kits/alveo.html)
- Re-install XRT
- Install the desired deployment platform
- Use [Confirm XRT/Platform compatibility](common-steps.html#confirm-xrt-platform-compatibility) to ensure the platform and XRT are compatible
- [Determine the platform on the card and system](common-steps.html#display-card-and-host-platform-and-sc-versions)
- If the card does not have the desired platform, [flash the card with the deployment platform](common-steps.html#flash-the-card-with-a-deployment-platform)

- - -

### Nothing else has worked

- [Unload and reload the XRT drivers](common-steps.html#unload-reload-xrt-drivers)
- Look for similar issues on the [Xilinx forums](https://support.xilinx.com/s/topic/0TO2E000000YKXlWAO/alveo-accelerator-cards)
- [Log machine state](common-steps.html#log-machine-state) and post on the [Xilinx forums](https://support.xilinx.com/s/topic/0TO2E000000YKXlWAO/alveo-accelerator-cards)

- - -

### Xilinx Support

For additional support resources such as Answers, Documentation, Downloads, and Alerts, see the [Xilinx Support pages](http://www.xilinx.com/support). For additional assistance, post your question on the Xilinx Community Forums – [Alveo Accelerator Card](https://support.xilinx.com/s/topic/0TO2E000000YKXlWAO/alveo-accelerator-cards).

Have a suggestion, or found an issue? Please send an email to alveo_cards_debugging@xilinx.com.

### License

All software including scripts in this distribution are licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at
[http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

All images and documentation, including all debug and support documentation, are licensed under the Creative Commons (CC) Attribution 4.0 International License (the "CC-BY-4.0 License"); you may not use this file except in compliance with the CC-BY-4.0 License. You may obtain a copy of the CC-BY-4.0 License at [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

XD027 | © Copyright 2021 Xilinx, Inc.