Step by Step Example

This repository includes 5 examples from the Vitis Tutorials and 4 individual user examples that demonstrate how to use the test harness on the VCK190, plus 1 example from the Vitis Accelerated Libraries that uses the test harness on the VEK280. Each example tests an AIE graph on a hardware board.

All these examples include a standard Makefile which supports the following actions:

  • Building the example for SW emulation (VCK190 only) and HW, respectively:

    make package TARGET=sw_emu
    make package TARGET=hw
    
  • Simulating the example in x86sim, AIEsim, and SW emulation (VCK190 only), respectively:

    make run TARGET=x86sim
    make run TARGET=aiesim
    make run TARGET=sw_emu
    
  • Cleaning all files:

    make cleanall
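
As a rough illustration of the actions above, the following sketch runs every simulation target for a board in sequence and stops at the first failure. The helper names sim_targets and run_all_sims, the lowercase board arguments, and the SIM_MAKE override are assumptions made for this example; they are not part of the repository's Makefile.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with the test harness.

# sim_targets BOARD: print the simulation targets available for BOARD.
# SW emulation is listed for the VCK190 only, as stated above.
sim_targets() {
    case "$1" in
        vck190) echo "x86sim aiesim sw_emu" ;;
        vek280) echo "x86sim aiesim" ;;
        *) echo "error: unknown board '$1'" >&2; return 1 ;;
    esac
}

# run_all_sims BOARD: run `make run TARGET=<t>` for each target, in order,
# and return non-zero as soon as one of them fails.
# SIM_MAKE is an assumed override so the loop can be tried without Vitis.
run_all_sims() {
    targets=$(sim_targets "$1") || return 1
    for t in $targets; do
        echo "== run TARGET=$t =="
        ${SIM_MAKE:-make} run TARGET="$t" || return 1
    done
}
```

From inside an example folder, `run_all_sims vck190` would issue the three `make run` commands listed above. Setting `SIM_MAKE=echo` first prints the arguments instead of invoking make.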
    

Running an Example

This section describes how to run the examples/vck190/super-sampling-rate-fir/SingleKernel example. The same steps apply to all other examples included in the repository.

  1. If not already done, install the AIE Test Harness:

    git clone https://github.com/Xilinx/AI-Engine-Test-Harness.git
    
  2. Set up your environment to use the 2023.2 versions of Vitis, XRT, and the prebuilt Embedded SW Image for Versal. The XILINX_VITIS, XILINX_XRT, and SDKTARGETSYSROOT environment variables must be properly defined.

  3. Set up your environment to use the test harness, and navigate to the desired example folder:

    cd AI-Engine-Test-Harness
    source setup.sh
    cd examples/vck190/super-sampling-rate-fir/SingleKernel
    
  4. Run the example in SW emulation mode:

    make cleanall run TARGET=sw_emu
    
  5. Package the example to run on the VCK190 board:

    make cleanall package TARGET=hw
    
  6. Flash the sd_card.img file located in the package.hw.<shell name> folder to an SD card. On Windows, use the Balena Etcher tool to flash the card. On Linux, use the dd command.

  7. Using a client such as PuTTY, connect to the VCK190 board using the correct USB Serial Port. Use speed 115200.

  8. Insert the SD card into the card reader on the VCK190 board, and turn on the power switch. NOTE: Make sure that the SW1 DIP switch on the VCK190 board is set to [1110] to boot from the SD card.

  9. After the boot sequence completes, log in using username: petalinux and password: petalinux

  10. Run the application as shown below, using petalinux as the root password. The test application transfers data to the AIE graph and reports the number of clock cycles needed to send or receive data for each PLIO. Upon successful completion of the test, the run script finishes with a “TEST PASSED” message:

    sudo su
    cd /run/media/mmcblk0p1/
    source ./run_script.sh
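
Two of the host-side steps above, checking the environment (step 2) and flashing the SD card on Linux (step 6), can be sketched as a pair of shell helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, each environment variable is assumed to point to an existing directory, and the target device (for example /dev/sdX) is a placeholder that must be replaced with the actual SD card device.

```shell
#!/bin/sh
# Hypothetical preflight helpers, not part of the test harness repository.

# require_dir NAME VALUE: succeed only if VALUE is a non-empty path
# to an existing directory.
require_dir() {
    if [ -z "$2" ]; then
        echo "error: $1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "error: $1=$2 is not a directory" >&2
        return 1
    fi
}

# check_harness_env: verify the three variables named in step 2 and
# report every problem before returning.
check_harness_env() {
    status=0
    require_dir XILINX_VITIS "${XILINX_VITIS:-}" || status=1
    require_dir XILINX_XRT "${XILINX_XRT:-}" || status=1
    require_dir SDKTARGETSYSROOT "${SDKTARGETSYSROOT:-}" || status=1
    return "$status"
}

# flash_sd_image IMAGE DEVICE: write IMAGE to the block device DEVICE with dd.
# WARNING: writing to the wrong device destroys its contents.
flash_sd_image() {
    if [ ! -f "$1" ]; then
        echo "error: image '$1' not found" >&2
        return 1
    fi
    if [ ! -b "$2" ]; then
        echo "error: '$2' is not a block device" >&2
        return 1
    fi
    # bs, conv=fsync and status=progress are GNU dd options.
    sudo dd if="$1" of="$2" bs=4M conv=fsync status=progress
}
```

A typical use would be to call `check_harness_env` before `source setup.sh`, and `flash_sd_image <path to sd_card.img> /dev/sdX` after packaging, with /dev/sdX replaced by the device reported by a tool such as lsblk.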
    

Understanding the Example

This section describes the source code changes made to run the examples/vck190/super-sampling-rate-fir/SingleKernel example with the AIE Test Harness.

For additional details on the steps described below, refer to the documentation about Using the Test Harness and about the Software APIs.

Test Harness PLIOs

This example uses 1 input PLIO and 1 output PLIO. The width of each PLIO is set to 128 bits, as required by the test harness. We chose PLIO_01_TO_AIE to send data to the AI Engine and PLIO_02_FROM_AIE to receive data from the AI Engine. As explained in the section about Placement of PLIOs, the PLIO names indicate the generic name and direction of each PLIO. All valid PLIO names can be found in include/test_harness_port_name.hpp.

TopGraph() {
    input_plio plin = input_plio::create("PLIO_01_TO_AIE", plio_128_bits, "data/PhaseIn_0.txt", 250);
    output_plio plout = output_plio::create("PLIO_02_FROM_AIE", plio_128_bits, "data/Output_0.txt", 250);
    connect<>(plin.out[0], G1.in);
    connect<>(G1.out, plout.in[0]);
}

Unused PLIOs

Because this example does not use all of the PLIOs implemented in the test harness, an additional dummy graph is needed to occupy the unused PLIOs. For this purpose, the occupyUnusedPLIO helper class is instantiated in the examples/vck190/super-sampling-rate-fir/SingleKernel/aie/graph.cpp file. The template parameters indicate the number of used input and output PLIOs, and the constructor parameters indicate the names of the used input and output PLIOs.

static std::vector<std::string> cust_in = {"PLIO_01_TO_AIE"};
static std::vector<std::string> cust_out = {"PLIO_02_FROM_AIE"};
TopGraph G;
vck190_test_harness::occupyUnusedPLIO<1, 1, 36> dummyGraph(cust_in, cust_out);

SW Application

A SW application running on the embedded ARM core of the Versal device is necessary to run the test on HW with the AIE test harness. The source code for this application is provided in the examples/vck190/super-sampling-rate-fir/SingleKernel/ps/host.cpp file. The application must be developed using the test harness software APIs.

test_harness_mgr<36, 16, 4096> mgr(0, xclbin_path, {"G"});
std::vector<test_harness_args> args;
args.push_back({channel_index(PLIO_01_TO_AIE), in_sz, 4, 0, (char*)in_data[0]});
args.push_back({channel_index(PLIO_02_FROM_AIE), out_sz, 4, 0, (char*)out_data[0]});
mgr.runAIEGraph(0, 4);
mgr.runTestHarness(args);
mgr.waitForRes(10000);
bool is_valid = mgr.result_valid;

The application needs one instance of the test_harness_mgr class. The name of the PL kernel that is pre-compiled in the XSA, the testing mode, and the device type are derived automatically from the encoded name of xclbin_path. The argument {"G"} is a vector of strings containing the name of the graph instantiated in the graph.cpp file. If an incorrect argument is provided, the application reports an error at runtime.

Then, a vector of test_harness_args is created to configure the DMA channels associated with each PLIO used by the AIE graph. As described in the Test Harness PLIOs section, the input of the graph is mapped to PLIO_01_TO_AIE, and the output is mapped to PLIO_02_FROM_AIE. The channel_index member of the test_harness_args descriptors must be set accordingly in the SW application. The replay count of each test_harness_args descriptor is set to 4. This programs the test harness to issue the input data 4 times and to expect 4x the output data.

After the test is configured, the graph is started with the test_harness_mgr::runAIEGraph() API. We know that each iteration of graph G consumes in_sz bytes of data and emits out_sz bytes of data. Since we programmed the test harness to replay the data transfer 4 times, the graph will be run for 4 iterations to ensure that the data sizes match between the test harness and the AIE graph.

The test harness is started with the test_harness_mgr::runTestHarness() API. Starting the harness AFTER the graph is necessary to ensure accurate measurement of DMA channel latencies.

Finally, the test_harness_mgr::waitForRes() API is used to wait for the data transfer to finish, and the validity of the result is obtained from the boolean value of test_harness_mgr::result_valid, as in the code excerpt above.