Creating a new Application
Introduction
This section describes how to create applications for accelerators on the DFX example design, and how to install and run new accelerator firmware on the target. Typically, an accelerator takes input data, processes it, and produces output data. The output can be consumed by the user or passed to another accelerator for further processing.
On the DFX example design, input and output data are stored in DDR. An IP with virtual AXI Stream channel support moves data between DDR and the accelerators. The virtual channels are realized on a single AXI Stream bus by using different TID values. A total of 8 TID values, 0 to 7, are supported: data packets use TID 0, and control packets use TID values 1 through 7. The application can therefore differentiate between control and data packets based on the TID.
The data movement APIs use UIO drivers to map custom IPs such as the data movement IP. The APIs are described in the Data Movement APIs section.
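The TID convention above can be illustrated with a small helper that classifies a packet by its TID. This is a minimal sketch: the names `PacketKind`, `kMaxTid`, and `classifyTid` are hypothetical and are not part of the data mover API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the data mover API.
enum class PacketKind { Data, Control };

// TID values 0 to 7 are supported on the AXI Stream bus.
constexpr std::uint8_t kMaxTid = 7;

// TID 0 carries data packets; TIDs 1 through 7 carry control packets.
inline PacketKind classifyTid(std::uint8_t tid)
{
    assert(tid <= kMaxTid && "TID must be in the range 0 to 7");
    return tid == 0 ? PacketKind::Data : PacketKind::Control;
}
```

An application could use such a helper to choose the TID argument passed to DataToAccel.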
Pre-Requisites and Assumptions
Accelerators are configured through streams with TIDs ranging from 1 to 7 on the data mover.
Data is provided to accelerators with TID = 0 on the data mover.
A streaming accelerator functions by reading data from its input stream and writing data to its output stream. The data mover API DataToAccel reads data from DDR and provides it to the accelerator's input stream. The data mover API DataFromAccel reads data from the accelerator's output stream and writes it to DDR. When data must be read from DDR for processing and written back to DDR afterwards, both APIs, DataToAccel and DataFromAccel, need to be called. Calling only DataToAccel stalls the pipeline after some time because the accelerator's output data is never read.
In the two-slot design, the user can load a different accelerator in each slot and run the applications in parallel. Each application can allocate at most 256 MB of DDR.
For buffer allocation using XRT, the pre-requisite is that zocl must be loaded. The dtsi file should have a zyxclmm_drm entry which ensures that zocl will be loaded when the device tree overlay is applied.
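The 256 MB per-application limit can be checked before a buffer is allocated. The following is a minimal sketch, assuming the limit means 256 × 1024 × 1024 bytes; `kMaxAppBufferBytes` and `fitsAppLimit` are hypothetical names, not part of XRT or the data mover API.

```cpp
#include <cstdint>

// Assumption: "256MB" is interpreted as 256 * 1024 * 1024 bytes.
constexpr std::uint64_t kMaxAppBufferBytes = 256ULL * 1024 * 1024;

// Returns true when a requested buffer size is non-zero and within the limit.
constexpr bool fitsAppLimit(std::uint64_t sizeInBytes)
{
    return sizeInBytes > 0 && sizeInBytes <= kMaxAppBufferBytes;
}

// The 64 MB buffer used by the example application fits within the limit.
static_assert(fitsAppLimit(0x4000000), "64 MB is within the 256 MB limit");
```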
Data Movement APIs
All the APIs take a common argument, the slot number, which denotes the reconfigurable partition where the accelerator is loaded. The RMs have a built-in AXIS data mover that handles data movement between the AXIS accelerators and DDR. The data mover reads data from a source address in DDR and provides it as an input stream to the accelerator, and it reads data from the accelerator's output stream and writes it to a destination address in DDR.
InitializeMapRMs - Initialize and Map the data mover IPs using UIO drivers.
InitializeMapRMs(slot); // Possible slot values 0 or 1 based on which slot is being Initialized and Mapped.
DataToAccel - DDR to accelerator data movement
Code example:
#define DKB_OFFSET_MEM 0x0
#define EB_OFFSET_MEM 0x100
#define KEYBUFF_SIZE 0x2
#define BUFF_SIZE 0x10
#define TID_0 0x0
#define TID_1 0x1
DataToAccel(slot,DKB_OFFSET_MEM,KEYBUFF_SIZE,TID_1); //Control packet on TID 1
DataToAccel(slot,EB_OFFSET_MEM,BUFF_SIZE,TID_0); //Data packet on TID 0
DataToAccelDone - Check whether the previous DataToAccel call has finished. This call blocks until the last DataToAccel operation completes.
Code example:
int status = DataToAccelDone(slot);
DataFromAccel - Data movement from accelerator to DDR.
Code example:
#define RESULT_OFFSET_MEM 0x300
#define BUFF_SIZE 0x10
DataFromAccel(slot, RESULT_OFFSET_MEM, BUFF_SIZE);
DataFromAccelDone - Check whether the previous DataFromAccel call has finished. This call blocks until the last DataFromAccel operation completes.
Code example:
int status = DataFromAccelDone(slot);
FinaliseUnmapRMs - Unmap the data mover IPs that were mapped by the call to InitializeMapRMs(int slot).
Code example:
FinaliseUnmapRMs(slot);
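The call order for one input transfer followed by one output transfer can be sketched without hardware by using a stand-in for the data mover. Everything here is illustrative: `FakeDataMover` and `roundTrip` are hypothetical, the fake's return values are invented, and treating a nonzero Done status as success is an assumption taken from the `if(status)` checks in the example application.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical test double: records calls instead of touching hardware.
// The real functions are provided by the siha library (lib/siha.c).
struct FakeDataMover {
    std::vector<std::string> calls;
    int DataToAccel(int, std::uint64_t, std::uint64_t, std::uint8_t) { calls.push_back("DataToAccel"); return 0; }
    int DataToAccelDone(int) { calls.push_back("DataToAccelDone"); return 1; }
    int DataFromAccel(int, std::uint64_t, std::uint64_t) { calls.push_back("DataFromAccel"); return 0; }
    int DataFromAccelDone(int) { calls.push_back("DataFromAccelDone"); return 1; }
};

// Sends one input buffer on TID 0 and then drains one output buffer.
// Returns true when both Done calls report a nonzero status (assumed success).
template <typename Mover>
bool roundTrip(Mover &m, int slot, std::uint64_t inOffset, std::uint64_t inSize,
               std::uint64_t outOffset, std::uint64_t outSize)
{
    assert(slot == 0 || slot == 1);
    m.DataToAccel(slot, inOffset, inSize, 0 /* TID 0: data */);
    if (!m.DataToAccelDone(slot)) return false;
    // Without this read, the accelerator output is never drained and the pipe stalls.
    m.DataFromAccel(slot, outOffset, outSize);
    return m.DataFromAccelDone(slot) != 0;
}
```

Writing the sequence as a template keeps it independent of the real library, so the ordering can be exercised on a host machine before running on the target.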
Buffer Allocation for the Application
Every application needs buffers for providing input data to the accelerator and for storing the output data from the accelerator. XRT native APIs are used for buffer allocation and deallocation.
Code example:
//Allocate XRT buffer to be used for input and output of the application
auto device = xrt::device(0);
auto bufferObject = xrt::bo(device, SIZE_IN_BYTES, 0);
uint32_t *vptr = (uint32_t *)bufferObject.map<int*>(); //Use vptr to work with the allocated xrt buffer
mapBuffer(bufferObject);
XRT takes care of deallocating the buffer when the application exits.
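Because vptr is a uint32_t pointer, arithmetic on it counts 32-bit words, while the offsets passed to the data mover in the example application appear to be byte offsets and the sizes appear to count 16-byte units. This reading is inferred from the example's constants and comments rather than stated by the API. A minimal sketch of the conversions, with hypothetical helper names:

```cpp
#include <cstdint>

// Assumptions inferred from the example application's constants and comments:
// host pointer offsets count 32-bit words, data mover offsets count bytes,
// and data mover sizes count 16-byte (128-bit) units.
constexpr std::uint64_t wordsToBytes(std::uint64_t words)
{
    return words * sizeof(std::uint32_t);
}

constexpr std::uint64_t unitsToBytes(std::uint64_t units)
{
    return units * 16;
}

// Word offsets used with vptr map to the byte offsets given to the data mover.
static_assert(wordsToBytes(8) == 0x20, "B_OFFSET matches B_OFFSET_MEM");
static_assert(wordsToBytes(64) == 0x100, "OPERATION_OFFSET matches OPERATION_OFFSET_MEM");
static_assert(wordsToBytes(128) == 0x200, "RESULT_OFFSET matches RESULT_OFFSET_MEM");
static_assert(unitsToBytes(2) == 32, "INPUT_SIZE of 2 covers 32 bytes");
```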
Application Example Template
//Includes
//Functions
int main(int argc,char *argv[])
{
//Set slot from argv[1]
//Initialize and Memory map RMs
//Allocate memory Buffer using XRT
//Application access to the allocated buffer
//Accelerator access to allocated buffer differentiated by TIDs
//Application Logic
//Unmap RMs
}
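The "Set slot from argv[1]" step can be made stricter than a plain atoi call, which returns 0 for non-numeric text and would therefore accept an argument such as "abc" as slot 0. The following is a minimal sketch using std::strtol; `parseSlot` is a hypothetical helper and is not part of the siha library.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a slot number from a command-line argument.
// Returns 0 or 1 on success, and -1 if the text does not parse as the
// integer 0 or 1 (for example "abc", "1x", "2", or an empty string).
inline int parseSlot(const char *text)
{
    assert(text != nullptr);
    errno = 0;
    char *end = nullptr;
    const long value = std::strtol(text, &end, 10);
    // Reject range errors, empty input, and trailing characters.
    if (errno != 0 || end == text || *end != '\0') return -1;
    if (value != 0 && value != 1) return -1;
    return static_cast<int>(value);
}
```

In main, the result could be checked with `if (slot == -1)` before calling InitializeMapRMs.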
Application Example for Accelerator doing Addition/Subtraction
/*
* Copyright (C) 2022, Advanced Micro Devices, Inc. All rights reserved.
* SPDX-License-Identifier: MIT
*/
#include <cstring> //std::memcpy
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
// XRT includes
#include "experimental/xrt_bo.h"
#include "experimental/xrt_device.h"
#include "experimental/xrt_kernel.h"
#define SIZE_IN_BYTES 0x4000000 //64MB
#define A_OFFSET 0 //A Offset in 32-bit words (byte offset 0x0)
#define B_OFFSET 8 //B Offset in 32-bit words (byte offset 0x20)
#define OPERATION_OFFSET 64 //Operation Offset in 32-bit words (byte offset 0x100)
#define RESULT_OFFSET 128 //Result Offset in 32-bit words (byte offset 0x200)
#define A_OFFSET_MEM 0x0 //A Mem Offset in bytes
#define B_OFFSET_MEM 0x20 //B Mem Offset in bytes
#define OPERATION_OFFSET_MEM 0x100 //Operation Mem Offset in bytes
#define RESULT_OFFSET_MEM 0x200 //Result Buffer Mem Offset in bytes
#define INPUT_SIZE 0x2 //Size of Input Data in 16-byte units
#define OPERATION_SIZE 0x1 //Size of Operation in 16-byte units
#define TID_0 0x0 //TID 0
#define TID_1 0x1 //TID 1
#define TID_2 0x2 //TID 2
int InitializeMapRMs(int slot);
int StartAccel(int slot);
int FinaliseUnmapRMs(int slot);
void mapBuffer(xrt::bo boa);
int DataToAccel(int slot, uint64_t data, uint64_t size, uint8_t tid);
int DataFromAccel(int slot, uint64_t data, uint64_t size);
int DataToAccelDone(int slot);
int DataFromAccelDone(int slot);
// A Input Buffer
uint32_t A[] = {
0x55555555, 0x44444444, 0x33333333, 0x22222222,
0x55555555, 0x44444444, 0x33333333, 0x22222222
};
// B Input Buffer
uint32_t B[] = {
0x11111111, 0x11111111, 0x11111111, 0x11111111,
0x11111111, 0x11111111, 0x11111111, 0x11111111
};
// Operation
uint32_t OperationAdd[] = {
0x00000001, 0x00000000, 0x00000000, 0x00000000
};
uint32_t OperationSub[] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000
};
int main(int argc, char *argv[])
{
//Default slot Set to 0 unless passed as an argument
int slot =0;
if(argc>1)
{
//Updating slot number provided as command line argument
slot = atoi (argv[1]);
if (slot != 1 && slot != 0)
{
printf("- Invalid slot number provided %s. Valid values : 0 or 1\n",argv[1]);
return 0;
}
}
//Initialize and memory map RMs
if(InitializeMapRMs(slot) == -1)
{
printf("- Check the slot number where the accelerator is loaded and run the test on the specific slot.\n");
return 0;
}
//Allocate XRT buffer to be used for input and output of the application
auto device = xrt::device(0);
auto bufferObject = xrt::bo(device, SIZE_IN_BYTES, 0);
uint32_t *vptr = (uint32_t *)bufferObject.map<int*>();
mapBuffer(bufferObject);
// Write A of Size 32 bytes (4bytes x 8 )
std::memcpy(vptr+A_OFFSET, &A, sizeof(A));
// Write B of Size 32 bytes (4bytes x 8 )
std::memcpy(vptr+B_OFFSET, &B, sizeof(B));
// Write Add Operation
std::memcpy(vptr+OPERATION_OFFSET, &OperationAdd, sizeof(OperationAdd));
// Write Sub Operation
//std::memcpy(vptr+OPERATION_OFFSET, &OperationSub, sizeof(OperationSub));
//Initialize RM
StartAccel(slot);
//Program A to Accelerator - Size 32 bytes (16bytes x 2 ) at Offset 0 (0x0)
DataToAccel(slot,A_OFFSET_MEM,INPUT_SIZE,TID_0);
int status = DataToAccelDone(slot);
//Program Operation to Accelerator - Size 16 bytes (16bytes x 1 ) at Offset 64 (0x100)
if(status)
{
DataToAccel(slot,OPERATION_OFFSET_MEM,OPERATION_SIZE,TID_1);
status = DataToAccelDone(slot);
}
//Program B to Accelerator - Size 32 bytes (16bytes x 2 ) at Offset 8 (0x20)
if(status)
{
DataToAccel(slot,B_OFFSET_MEM,INPUT_SIZE,TID_2);
status = DataToAccelDone(slot);
}
if(status)
{
//DataFromAccel - Size 32 bytes (16bytes x 2 ) to Offset 128 (0x200)
DataFromAccel(slot, RESULT_OFFSET_MEM, INPUT_SIZE);
status = DataFromAccelDone(slot);
//Report success only if the result transfer completed
if(status)
{
printf("\t Success: Selected Operation Done.\n");
}
}
if(status)
{
FinaliseUnmapRMs(slot);
}
return 0;
}
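The example above does not read the result back. A host-side reference model can be compared against the words at vptr + RESULT_OFFSET once DataFromAccelDone returns. This is a sketch under a stated assumption: the accelerator is taken to add or subtract element-wise on 32-bit words, which the source does not specify. `referenceResult` and `matchesReference` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference model. Assumption: the accelerator operates element-wise
// on 32-bit words with wrap-around; the lane width is not stated in the source.
inline std::vector<std::uint32_t> referenceResult(const std::vector<std::uint32_t> &a,
                                                  const std::vector<std::uint32_t> &b,
                                                  bool add)
{
    assert(a.size() == b.size());
    std::vector<std::uint32_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = add ? a[i] + b[i] : a[i] - b[i];
    return out;
}

// Compares accelerator output (for example, vptr + RESULT_OFFSET) with the
// reference. The caller must ensure result points to at least expected.size() words.
inline bool matchesReference(const std::uint32_t *result,
                             const std::vector<std::uint32_t> &expected)
{
    for (std::size_t i = 0; i < expected.size(); ++i)
        if (result[i] != expected[i]) return false;
    return true;
}
```

With the A and B buffers from the example, no carries or borrows cross word boundaries, so the expected values are the same for any lane width that is a multiple of 32 bits.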
Building apps for new accelerators
The applications should be built on the target.
Copy the above application example from the localhost to the target as main.c.
Run the steps below to build the application on the target.
sudo apt install uuid-dev libdfx-dev libdfx-mgr-dev #Installing required libraries
sudo git clone --branch xlnx_rel_v2022.1 --recursive https://github.com/Xilinx/kria-dfx-apps.git #Cloning Application Git Repo that has siha APIs
cd kria-dfx-apps/src
sudo mkdir ADD #Create directory to copy/create application source code
sudo scp user@localhost:<path to directory containing main.c on localhost>/main.c ./ADD #Copy application source code
cd ..
INC='-I/usr/include/xrt -I/usr/include/dfx-mgr' #Setting up Include files
LNK='-luuid -lxrt_coreutil -lxrt++ -ldfx-mgr' #Setting up Library dependencies
sudo g++ -Wall -g -std=c++1y $INC lib/siha.c src/ADD/main.c $LNK -o testADD #Compile application
Run new accelerator RM application on the target
This section describes how to install new custom RM firmware on the target and run its corresponding application.
Prerequisites
Clone the kria-apps-firmware repository on the target.
git clone --branch xlnx_rel_v2022.1 --recursive https://github.com/Xilinx/kria-apps-firmware.git
Steps to install firmware for new RMs
Copy the new RM firmware directory created here to the directory kria-apps-firmware/k26-dfx/2rp on the target.
Navigate to the directory kria-apps-firmware/k26-dfx/2rp
Run Makefile to install new accelerator RM firmware on the target.
cd kria-apps-firmware/k26-dfx/2rp/
sudo apt install bootgen-xlnx
sudo make install
Load new accelerator RM on the target
Check whether free slots are available on the target using xmutil listapps.
sudo xmutil listapps
If all slots on the target are full, free up a slot.
sudo xmutil unloadapp
Load the new accelerator RM on the target.
sudo xmutil loadapp NewRMName
Run applications for new RM
On the target, navigate to the directory where the new RM application is present.
cd ~/kria-dfx-apps
Run the newly built RM application, providing the slot number where the RM was loaded.
sudo ./NewRMApplication SlotNumber
License
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright© 2021 Xilinx