Kria™ KV260 Vision AI Starter Kit NLP SmartVision Tutorial

Setting Up the Board and Application Deployment


This document shows how to set up the board and run the nlp-smartvision application.

This guide and its prebuilt binaries target Ubuntu® 22.04 and the AMD 2022.1 toolchain. The previous version of this application, built with the AMD 2021.1 toolchain and targeting PetaLinux, is still available online.

Booting Up Linux

Before continuing with the NLP application-specific instructions, boot Linux by following the instructions on the Kria Starter Kit Linux boot page, if you have not already done so.

Setting Up the Board

Application Specific Hardware Setup

Besides the hardware configuration required for booting Linux (see Kria Starter Kit Linux boot), the NLP SmartVision application requires a monitor, at least one camera sensor, and a USB microphone:

  • Monitor:

    The monitor for this application must support 1024x768 resolution. Before booting, connect the monitor to the board via the DP/HDMI port.

  • Camera sensors:

    This application supports the following three camera modules.

    • AR1335 sensor module in J7

    • Raspberry Pi sensor module in J9

    • USB webcam in any of the available USB ports

    Install the required sensor modules in their respective locations.

  • USB Microphone:

    Connect the microphone to any of the USB ports. If your USB webcam has a built-in microphone, it also acts as the USB microphone.

Downloading and Loading the Application Firmware

  1. Download the firmware.

    • Search the package feed for packages compatible with the KV260.

      ubuntu@kria:~$ sudo apt search xlnx-firmware-kv260
      Sorting... Done
      Full Text Search... Done
      xlnx-firmware-kv260-aibox-reid/jammy 0.1-0xlnx1 arm64
       FPGA firmware for Xilinx boards - kv260 aibox-reid application
      xlnx-firmware-kv260-benchmark-b4096/jammy 0.1-0xlnx1 arm64
       FPGA firmware for Xilinx boards - kv260 benchmark-b4096 application
      xlnx-firmware-kv260-defect-detect/jammy 0.1-0xlnx1 arm64
       FPGA firmware for Xilinx boards - kv260 defect-detect application
      xlnx-firmware-kv260-nlp-smartvision/jammy,now 0.1-0xlnx1 arm64 [installed]
       FPGA firmware for Xilinx boards - kv260 nlp-smartvision application
      xlnx-firmware-kv260-smartcam/jammy 0.1-0xlnx1 arm64
       FPGA firmware for Xilinx boards - kv260 smartcam application
    • Install the firmware binaries.

      sudo apt install xlnx-firmware-kv260-nlp-smartvision
  2. Dynamically load the application package.

    The firmware consists of a bitstream and a device tree overlay (dtbo) file. It is loaded dynamically on user request once Linux has fully booted. The xmutil utility can be used for that purpose.

    • After installing the firmware, execute xmutil listapps to verify that the new firmware is listed and to have dfx-mgrd rescan and register all accelerators in the firmware directory tree.

      sudo xmutil listapps
    • Switch to a different platform for a different application.

      If another accelerator/firmware other than xlnx-app-kv260-pmod-rs485-test is already active, unload it first, then switch to xlnx-app-kv260-nlp-smartvision.

      sudo xmutil unloadapp
      sudo xmutil loadapp kv260-nlp-smartvision
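
As a sanity check before loading, the following minimal sketch confirms that a firmware directory exists and contains a device tree overlay. It assumes the package installs its files under /lib/firmware/xilinx in a directory named after the application (the same tree that is later mounted into the Docker container). The helper names firmware_present and switch_app are hypothetical and are not part of xmutil.

```shell
# Hypothetical helper, not part of xmutil.
# $1: firmware root (assumed: /lib/firmware/xilinx), $2: application name.
# Succeeds if <root>/<app> exists and holds at least one *.dtbo file.
firmware_present() {
    fw_root=$1
    app=$2
    [ -d "$fw_root/$app" ] || return 1
    for f in "$fw_root/$app"/*.dtbo; do
        # If nothing matches, the pattern stays literal and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Unload the active firmware, then load the requested one
# (the same two xmutil commands as in the step above).
switch_app() {
    sudo xmutil unloadapp
    sudo xmutil loadapp "$1"
}
```

A possible call on the board would be: firmware_present /lib/firmware/xilinx kv260-nlp-smartvision && switch_app kv260-nlp-smartvision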

Docker Based Application Preparation

  1. Pull the 2022.1 Docker image for nlp-smartvision using the following command.

    docker pull xilinx/nlp-smartvision:2022.1

    Once the above step is done, you can check for the available images as follows.

    ubuntu@kria:~$ docker images -a
    REPOSITORY                                                                  TAG        IMAGE ID       CREATED                  SIZE
    xilinx/nlp-smartvision                                                      2022.1     3c16ce65624a   Less than a second ago   1.41GB

    Storage space on the SD card can run out when several Docker images are installed. If there are space issues, use the following command to remove existing images:

    docker rmi --force <other images>
  2. This application can display its output either in a window on the Ubuntu GUI or on the entire monitor, as described in the following table:

| S.No | Scenario | Display Sink | Note | Required Steps |
|------|----------|--------------|------|----------------|
| 1 | User runs the Docker on a terminal opened on Ubuntu GUI | ximagesink | Opens new window on Ubuntu GUI to display output | Disable access control for local docker with ```xhost +local:docker``` |
| 2 | User runs the Docker on serial port or SSH session | kmssink | Uses entire monitor to display the output | Disable Ubuntu GUI with ```sudo xmutil desktop_disable``` |

> **NOTE:** Once you are done running the application, re-enable access control with ```xhost -local:docker```.
  3. Launch the Docker container using the following command. Launch the NLP Docker container as the user “ubuntu”, not as another user and not with “sudo”:

    docker run \
    --env="DISPLAY" \
    -h "xlnx-docker" \
    --env="XDG_SESSION_TYPE" \
    --net=host \
    --privileged \
    -v /tmp:/tmp \
    -v /dev:/dev \
    -v /sys:/sys \
    -v /etc/vart.conf:/etc/vart.conf \
    -v /lib/firmware/xilinx:/lib/firmware/xilinx \
    -v /run:/run \
    -it xilinx/nlp-smartvision:2022.1 bash

    It will launch the nlp-smartvision image in a new container.
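
The choice between the two display scenarios in step 2 can be sketched as a small pre-launch helper. This is only an illustration: it uses the DISPLAY variable as a heuristic for "terminal opened on the Ubuntu GUI", which is an assumption (an SSH session with X forwarding also sets DISPLAY), and the function names pick_sink and prep_command are hypothetical.

```shell
# Map the session type to the display sink from the table in step 2.
# $1: value of DISPLAY (empty on a serial session or plain SSH session).
pick_sink() {
    if [ -n "$1" ]; then
        echo ximagesink
    else
        echo kmssink
    fi
}

# Print the preparation command the table lists for the given sink.
prep_command() {
    case $1 in
        ximagesink) echo "xhost +local:docker" ;;
        kmssink)    echo "sudo xmutil desktop_disable" ;;
        *)          echo "unknown sink: $1" >&2; return 1 ;;
    esac
}
```

For example, prep_command "$(pick_sink "$DISPLAY")" prints the command to run before docker run. It does not execute anything itself.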


Run the Application

There are two ways to interact with the application: via Jupyter Notebook or the command line.

Jupyter Notebook

  • Launch Jupyter Notebook with root privileges using the following command, replacing <ip address> with the IP address of eth0 as reported by ifconfig:

   jupyter lab --allow-root --notebook-dir=/opt/xilinx/kv260-nlp-smartvision/share/notebooks/ --ip=<ip address> &

Output example:

[I 2021-08-02 15:54:31.141 LabApp] JupyterLab extension loaded from /usr/lib/python3.8/site-packages/jupyterlab
[I 2021-08-02 15:54:31.141 LabApp] JupyterLab application directory is /usr/share/jupyter/lab
[I 2021-08-02 15:54:31.164 ServerApp] jupyterlab | extension was successfully loaded.
[I 2021-08-02 15:54:31.166 ServerApp] Serving notebooks from local directory: /opt/xilinx/kv260-nlp-smartvision/share/notebooks/
[I 2021-08-02 15:54:31.166 ServerApp] Jupyter Server 1.2.1 is running at:
[I 2021-08-02 15:54:31.166 ServerApp]
[I 2021-08-02 15:54:31.166 ServerApp]  or
[I 2021-08-02 15:54:31.167 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2021-08-02 15:54:31.186 ServerApp]

    To access the server, open this file in a browser:
    Or copy and paste one of these URLs:
  • You can access the server by opening the server URL from the previous step in a Chrome browser.

    The notebook explains how to use the application and the commands needed to run the live use case.
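
Looking up the eth0 address by hand can be avoided with a small filter. The sketch below is an assumption-laden alternative to reading ifconfig output: it parses the one-line format printed by ip -o -4 addr show from iproute2, and the function name first_ipv4 is hypothetical.

```shell
# Extract the first IPv4 address from `ip -o -4 addr show IFACE` output on stdin.
# In that one-line format the third field is "inet" and the fourth is ADDRESS/PREFIX.
first_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}
```

On the board this could be used as ip -o -4 addr show eth0 | first_ipv4, with the result passed to the --ip option of jupyter lab.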

Command Line

This mode lets you run the “nlp-smartvision” application from the CLI. Execute the commands using the universal asynchronous receiver-transmitter (UART)/debug interface.

Run the following command to launch the application for live audio input via a USB microphone. Stay silent for the first few seconds (approximately 2.5 s) so that the application can dynamically determine the noise threshold for your input device and environment. Once you see the message “Noise Threshold is set. You can start speaking the keywords now..”, you can start pronouncing any of the ten keywords (Yes, No, Off, On, Up, Down, Left, Right, Stop, Go).

Application Usage

Usage: nlp-smartvision [OPTION] [arg1] [arg2]

-h (or) --help                                    help
-m (or) --mipi <isp/rpi>                          test the application with live video from mipi cameras rpi(default)/isp
-u (or) --usb                                     test the application with live video from USB camera
-f (or) --file-audio  <testing_list>.txt          test the keyword spotting with audio files listed in the .txt file
-t (or) --test <sample_image> <model>             test the DPU with sample images. Input is Model and sample jpeg. Supported models are densebox_640_360, yolov2_voc_pruned_0_77 & plate_detect
-v (or) --verbose  

Run the App with Default MIPI Sensor (RPI)

nlp-smartvision -m


nlp-smartvision --mipi

The detected keyword will be displayed on the terminal, and the corresponding action on the input video stream will be displayed on the monitor, which is connected to the board through a DP/HDMI cable.

To print the FPS while the application runs, use the -v or --verbose flag as shown in the following command. The FPS is measured as an average over 90 consecutive frames. The latency of the keyword spotting and action is also printed when a keyword is detected.

nlp-smartvision -m -v
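
As a worked example of the averaging, one plausible reading is that the reported value is the number of frames in the window divided by the time the window took. The sketch below only illustrates that arithmetic. It is not taken from the application source, and the function name avg_fps is hypothetical.

```shell
# Average frames per second over a window of frames.
# $1: number of frames in the window, $2: elapsed time in milliseconds.
avg_fps() {
    [ "$2" -gt 0 ] || return 1
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", n * 1000 / ms }'
}
```

Under this reading, avg_fps 90 3000 prints 30.00: a window of 90 frames that takes 3 seconds corresponds to 30 FPS.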

You should see the video captured by the camera on the monitor connected to the board.

  • The application starts with facedetect. When there is a face captured by the camera, there should be a blue bounding box drawn around the face, and the box should follow the movement of the face.

  • Speak the desired keyword into the microphone; the application will perform the following assigned tasks.

Keyword action mapping


  • The Google Command dataset consists of audio clips that are one second long. The KWS task therefore expects each keyword to be spoken within one second.

  • The Stop command resets the display settings but does not change the monitor on/off mode.

File Based Testing and Accuracy Measurement of KWS Only

NLP SmartVision provides a mode dedicated to testing the accuracy of keyword spotting on pre-recorded audio files (no vision task runs in this mode). You need to provide the audio files along with a text file that lists the paths of the audio files to be tested. The application expects the audio files to be grouped in folders named after their keyword. Thus, the text file consists of lines with keyword/*.wav paths, one per audio file (example: yes/audio1.wav). For more details, refer to Testing Accuracy on the Google Command Dataset and Testing Custom Input Audio Files.

The following command tests the audio files listed in the testing_list.txt file.

## Change to the directory that contains the testing_list.txt file with valid paths to the audio files.
nlp-smartvision -f testing_list.txt


nlp-smartvision --file-audio testing_list.txt

Testing Accuracy on the Google Command Dataset

You can download the open source Google speech command dataset for testing the application in file input mode. This dataset consists of pre-recorded audio files for 30 keywords, and the audio files that are separated for testing are listed in the testing_list.txt file. Use the following commands to download and extract this dataset. These commands also create the datafiles that are required for testing the application with 10 keywords for which the model has been trained.

TIP: You can copy the following commands into a single script, then execute that script to run all the required steps one after the other.

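mkdir Google_dataset
cd Google_dataset
# Download the dataset archive, then extract it
wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
tar -xf speech_commands_v0.01.tar.gz
# Keep only the 10 keywords for which the model has been trained
mkdir keywords
mv -t ./keywords/ on off up down left right yes no stop go
# Filter the test list down to those 10 keywords
sed -n -e '/down\//p; /go\//p; /left\//p; /no\//p; /off\//p; /on\//p; /right\//p; /stop\//p; /yes\//p; /up\//p ' testing_list.txt > ./keywords/testing_list.txt
# Remove everything except the keywords directory
find . -mindepth 1 -maxdepth 1 ! -name keywords -print0 | xargs -0 rm -r --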

These commands create a directory named Google_dataset/keywords inside the current working directory.

NOTE: The commands might take a few minutes (depending on the internet speed) to download and process the dataset.

The output of the file-based test also reports the accuracy. Sample output on the Google Command Dataset is as follows.

Ground truth : yes            Predicted : yes
Ground truth : yes            Predicted : yes
Ground truth : yes            Predicted : yes
Number of keywords tested = 2552
Number of keywords detected correctly = 2383
Accuracy = 93.3777%
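
The reported accuracy is the number of correct predictions divided by the number of keywords tested. For the sample above, 2383 / 2552 is approximately 93.3777%. As a minimal sketch, the same figure can be recomputed from the per-file lines, assuming the application prints them on standard output in exactly the "Ground truth : ... Predicted : ..." form shown above. The function name kws_accuracy is hypothetical.

```shell
# Read "Ground truth : <kw>   Predicted : <kw>" lines on stdin and print
# the number tested, the number correct, and the accuracy in percent.
kws_accuracy() {
    awk '$1 == "Ground" && $5 == "Predicted" {
             total++
             if ($4 == $7) correct++
         }
         END {
             if (total == 0) { print "no results found"; exit 1 }
             printf "tested=%d correct=%d accuracy=%.4f%%\n", total, correct, 100 * correct / total
         }'
}
```

A possible use is nlp-smartvision -f testing_list.txt | kws_accuracy, which ignores any lines that do not match the expected form.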

Testing Custom Input Audio Files

The application expects each entry in the audio file list to have the format keyword/audio_filename.wav. For example, a prerecorded audio file of the keyword ‘yes’ must be listed as yes/file_001.wav. The application uses the top-level directory name (‘yes’ in this example) as the ground truth to compare against the detected keyword. Every audio file name, including the last one, must be followed by a newline character so that the entries can be told apart. In addition, the audio files must be copied to the SD card, under the directory from which the application is invoked (for example, /keywords/yes/file_001.wav).
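
The list file can be generated from the directory layout instead of being written by hand. The following is a minimal sketch, assuming all audio files sit exactly one level below the keyword directories and end in .wav. The function name make_testing_list is hypothetical.

```shell
# Print one keyword/file.wav line per audio file found under the given
# root directory, each terminated by a newline (including the last one).
make_testing_list() {
    root=$1
    for f in "$root"/*/*.wav; do
        [ -f "$f" ] || continue        # skip the literal pattern if nothing matches
        printf '%s\n' "${f#"$root"/}"  # strip the root prefix, keep keyword/file.wav
    done
}
```

For example, running make_testing_list . > testing_list.txt from the directory that contains the keyword folders writes the list next to them.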

The test audio files should have the following specifications.

  • Sampling rate: 16 kHz

  • Sample width: 16 bits per sample

  • Sample encoding: Little endian

  • Number of channels: 1 (mono)

  • Supported format: S16_LE (PCM signed 16-bit Little-endian)

  • Audio Length: 1 second
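
Most of these properties can be checked directly from the WAV header before running the test. The sketch below reads the relevant header fields with od. It assumes the canonical 44-byte PCM header layout (fmt chunk immediately after the RIFF header) and a little-endian host, because od decodes integers in host byte order. It does not check the clip length. The function names wav_field and check_wav are hypothetical.

```shell
# Read an unsigned integer field from a file.
# $1: file, $2: byte offset, $3: width in bytes (2 or 4).
# od uses host byte order, so this assumes a little-endian host.
wav_field() {
    od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Succeed if the header describes 16 kHz, 16-bit, mono PCM audio.
# Assumes the canonical 44-byte header with the fmt chunk first.
check_wav() {
    [ "$(wav_field "$1" 20 2)" = 1 ]     || { echo "$1: not PCM"; return 1; }
    [ "$(wav_field "$1" 22 2)" = 1 ]     || { echo "$1: not mono"; return 1; }
    [ "$(wav_field "$1" 24 4)" = 16000 ] || { echo "$1: sample rate is not 16 kHz"; return 1; }
    [ "$(wav_field "$1" 34 2)" = 16 ]    || { echo "$1: not 16 bits per sample"; return 1; }
    echo "$1: ok"
}
```

Files with extra chunks before the fmt chunk would need a real WAV parser. This sketch reports them as mismatches rather than handling them.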

Image Based Testing of DPU Only

NLP SmartVision provides a mode dedicated to testing the vision models on the DPU with image files (no KWS task runs in this mode). You need to provide an image file along with the AI model under test.

The following command tests the image files:

nlp-smartvision -t <image.jpg/image.png> <model>

The command returns the metadata along with a jpg file containing a bounding box on the input image.
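
Because the model name must be one of the three listed in the usage text, a thin wrapper can catch typos before the application is started. This is only a sketch: the function names is_supported_model and run_dpu_test and the return codes 2 and 3 are hypothetical choices, not part of the application.

```shell
# Succeed if $1 is one of the models listed in the application usage text.
is_supported_model() {
    case $1 in
        densebox_640_360|yolov2_voc_pruned_0_77|plate_detect) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the arguments, then call nlp-smartvision -t.
# $1: image file, $2: model name.
run_dpu_test() {
    image=$1
    model=$2
    is_supported_model "$model" || { echo "unsupported model: $model" >&2; return 2; }
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 3; }
    nlp-smartvision -t "$image" "$model"
}
```

For example, run_dpu_test sample.jpg densebox_640_360 runs the test only if sample.jpg exists, where sample.jpg stands for any image you provide.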

Files Structure of the Application

The application is installed as:

  • Binary File: => /opt/xilinx/kv260-nlp-smartvision/bin

| Filename | Description |
|----------|-------------|
| nlp-smartvision | Main application |

  • Script File: => /opt/xilinx/kv260-nlp-smartvision/bin/

| Filename | Description |
|----------|-------------|
|  | Configures ISP media nodes to run 1024x768@RGB |
|  | Configures RPI media nodes to run 1024x768@RGB |
|  | Application executable |

  • Jupyter Notebook file: => /opt/xilinx/kv260-nlp-smartvision/share/notebooks/nlp-smartvision

| Filename | Description |
|----------|-------------|
| nlp-smartvision.ipynb | Jupyter Notebook file for nlp-smartvision demo. |

Copyright © 2021-2024 Advanced Micro Devices, Inc
