Overview¶

The Vitis vision library has been designed to work in the Vitis development environment, and provides a software interface for computer vision functions accelerated on an FPGA device. Vitis vision library functions are mostly similar in functionality to their OpenCV equivalent. Any deviations, if present, are documented.

Basic Features¶

All Vitis vision library functions follow a common format. The following properties hold true for all the functions.

All the functions are designed as templates, and all arguments that are images must be provided as xf::cv::Mat.
All functions are defined in the xf::cv namespace.
Some of the major template arguments are:
- Maximum size of the image to be processed
- Datatype defining the properties of each pixel
- Number of pixels to be processed per clock cycle
- Other compile-time arguments relevant to the functionality.

The Vitis vision library contains enumerated datatypes that enable you to configure xf::cv::Mat. For more details on xf::cv::Mat, see the xf::cv::Mat Image Container Class.

Vitis Vision Kernel on Vitis¶

The Vitis vision library is designed to be used with the Vitis development environment.

The OpenCL host code is written in the testbench file, whereas the calls to Vitis Vision functions are made from the accel file. The image containers for Vitis vision library functions are xf::cv::Mat objects. For more information, see the xf::cv::Mat Image Container Class.

Vitis Vision Library Contents¶

The following table lists the contents of the Vitis vision library.

Vitis Vision Library Contents¶
Folder	Details
L1/examples	Contains the sample testbench code to facilitate running unit tests on Vitis/Vivado HLS. The examples/ folder has folders with algorithm names. Each algorithm folder contains testbench, accel, config, Makefile, Json file and a ‘build’ folder.
L1/include/common	Contains the common library infrastructure headers, such as types specific to the library.
L1/include/core	Contains the core library functionality headers, such as the `math` functions.
L1/include/features	Contains the feature extraction kernel function definitions. For example, `Harris`.
L1/include/imgproc	Contains all the kernel function definitions related to image processing.
L1/include/video	Contains all the kernel function definitions related to video processing functions. For example, Optical flow.
L1/include/dnn	Contains all the kernel function definitions related to deep learning preprocessing.
L1/tests	Contains all test folders to run simulations, synthesis and export RTL. The tests folder contains folders with algorithm names. Each algorithm folder further contains configuration folders, which have makefile and tcl files to run tests.
L1/examples/build	Contains the xf_config_params.h file, which has configurable macros and variables related to the particular example.
L2/examples	Contains the sample testbench code to facilitate running unit tests on Vitis. The examples/ folder contains folders with algorithm names. Each algorithm folder contains testbench, accel, config, Makefile, Json file and a ‘build’ folder.
L2/tests	Contains all test folders to run software emulation, hardware emulation, and hardware builds. The tests folder contains folders with algorithm names. Each algorithm folder further contains configuration folders, which have makefile and tcl files to run tests.
L2/examples/build	Contains the xf_config_params.h file, which has configurable macros and variables related to the particular example.
L3/examples	Contains the sample testbench code to build pipeline functions on Vitis. The examples/ folder contains folders with algorithm names. Each algorithm folder contains testbench, accel, config, Makefile, Json file and a ‘build’ folder.
L3/tests	Contains all test folders to run software emulation, hardware emulation, and hardware builds. The tests folder contains folders with algorithm names. Each algorithm folder contains configuration folders, each of which has a makefile to run tests.
L3/examples/build	Contains the xf_config_params.h file, which has configurable macros and variables related to the particular example.
L3/benchmarks	Contains benchmark examples to compare the software implementation versus the FPGA implementation using the Vitis vision library.
ext	Contains the utility functions related to OpenCL host code.

Getting Started with Vitis Vision¶

This section describes the methodology to create a kernel, the corresponding host code, and a suitable makefile to compile a Vitis Vision kernel for any of the supported platforms in Vitis. The subsequent sections also explain the methodology to verify the kernel in various emulation modes and on the hardware.

Prerequisites¶

A valid installation of Vitis™ 2021.1 or a later version, and the corresponding licenses.
Install the Vitis Vision libraries, if you intend to use libraries compiled differently than what is provided in Vitis.
Install the card for which the platform is supported in Vitis 2021.1 or later versions.
If targeting an embedded platform, set up the evaluation board.
Xilinx® Runtime (XRT) must be installed. XRT provides the software interface to Xilinx FPGAs.
Install/compile OpenCV libraries (with compatible libjpeg.so). The appropriate version (X86/aarch32/aarch64) of the compiler must be used based on the available processor for the target board.
libOpenCL.so must be installed if not present along with the platform.

Note

All Vitis Vision functions were tested against OpenCV version 4.4.0.

Vitis Design Methodology¶

There are three critical components in making a kernel work on a platform using Vitis™:

Host code with OpenCL constructs
Wrappers around HLS Kernel(s)
Makefile to compile the kernel for emulation or running on hardware.

Host Code with OpenCL¶

The host code is compiled for, and runs on, the host machine. It provides the data and control signals to the attached FPGA hardware. The host code is written using OpenCL constructs and provides capabilities for setting up and running a kernel on the FPGA. The following functions are executed using the host code:

Loading the kernel binary on the FPGA – xcl::import_binary_file() loads the bitstream and programs the FPGA to enable required processing of data.
Setting up memory buffers for data transfer – Data needs to be sent to and read from the DDR memory on the hardware. cl::Buffer objects are created to allocate the required memory for transferring data to and from the hardware.
Transfer data to and from the hardware – enqueueWriteBuffer() and enqueueReadBuffer() are used to transfer the data to and from the hardware at the required time.
Execute kernel on the FPGA – There are functions to execute kernels on the FPGA. There can be single kernel execution or multiple kernel execution, which can be asynchronous or synchronous with each other. A commonly used command is enqueueTask().
Profiling the performance of kernel execution – The host code in OpenCL also enables measurement of the execution time of a kernel on the FPGA. The function used in our examples for profiling is getProfilingInfo().

Wrappers around HLS Kernel(s)¶

All Vitis Vision kernels are provided with C++ function templates (located at <Github repo>/include) with image containers as objects of the xf::cv::Mat class. In addition, these kernels work either in stream based mode (where the complete image is read continuously) or in memory mapped mode (where image data is accessed in blocks).

The Vitis flow (OpenCL) requires kernel interfaces to be memory pointers with widths in power(s) of 2. Glue logic is therefore required for converting memory pointers to the xf::cv::Mat class data type and vice-versa when interacting with Vitis Vision kernel(s). Wrapper(s) are built over the kernel(s) with this glue logic. The examples below provide a methodology to handle the different kernel types (stream and memory mapped); the Vitis Vision kernels are located at <Github repo>/include.

Stream Based Kernels¶

To facilitate the conversion of a pointer to xf::cv::Mat and vice versa, two adapter functions are included as part of Vitis Vision: xf::cv::Array2xfMat() and xf::cv::xfMat2Array(). The xf::cv::Mat objects must be invoked as streams using an HLS pragma with a minimum depth of 2. This results in a top-level (or wrapper) function for the kernel as shown below:

extern "C" {
void func_top (ap_uint *gmem_in, ap_uint *gmem_out, ...) {
    xf::cv::Mat<…> in_mat(…), out_mat(…);
#pragma HLS dataflow
    // pointer -> xf::cv::Mat, kernel, xf::cv::Mat -> pointer
    xf::cv::Array2xfMat<…> (gmem_in, in_mat);
    xf::cv::Vitis-Vision-func<…> (in_mat, out_mat…);
    xf::cv::xfMat2Array<…> (out_mat, gmem_out);
}
}

The above illustration assumes that the data in xf::cv::Mat is being streamed in and streamed out. You can also chain multiple functions into a pipeline instead of using just one Vitis Vision function.
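
As a rough mental model of the wrapper above, the following host-only C++ sketch mimics the three stages in plain software. It is not Vitis Vision code: std::queue stands in for the stream inside xf::cv::Mat, and arrayToStream, invertStream, streamToArray, and funcTopModel are hypothetical names chosen for illustration. The stages here run one after another, whereas under the HLS dataflow pragma they would overlap in hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Host-only model: std::queue stands in for the FIFO stream inside xf::cv::Mat.
using PixelStream = std::queue<std::uint8_t>;

// Stand-in for Array2xfMat: read memory in raster order and push into the stream.
void arrayToStream(const std::vector<std::uint8_t>& mem, PixelStream& s) {
    for (std::uint8_t px : mem) s.push(px);
}

// Stand-in for a streaming kernel: each pixel is consumed exactly once, in order.
// The operation (pixel inversion) is arbitrary and only for illustration.
void invertStream(PixelStream& in, PixelStream& out) {
    while (!in.empty()) {
        out.push(static_cast<std::uint8_t>(255 - in.front()));
        in.pop();
    }
}

// Stand-in for xfMat2Array: drain the stream back into memory.
std::vector<std::uint8_t> streamToArray(PixelStream& s) {
    std::vector<std::uint8_t> mem;
    while (!s.empty()) {
        mem.push_back(s.front());
        s.pop();
    }
    return mem;
}

// Model of the top-level wrapper: memory -> stream -> kernel -> stream -> memory.
std::vector<std::uint8_t> funcTopModel(const std::vector<std::uint8_t>& gmem_in) {
    PixelStream in_mat, out_mat;
    arrayToStream(gmem_in, in_mat);
    invertStream(in_mat, out_mat);
    return streamToArray(out_mat);
}
```

Because every stage only pushes to or pops from a FIFO, a stage never revisits a pixel, which is the property that lets the hardware stages overlap.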

For stream based kernels with inputs of different sizes, multiple instances of the adapter functions are necessary, as shown below:

extern "C" {
void func_top (ap_uint *gmem_in1, ap_uint *gmem_in2, ap_uint *gmem_in3, ap_uint *gmem_out, ...) {
    xf::cv::Mat<...,HEIGHT,WIDTH,…> in_mat1(…), out_mat(…);
    xf::cv::Mat<...,HEIGHT/4,WIDTH,…>  in_mat2(…), in_mat3(…);
#pragma HLS dataflow
    // one adapter instance per distinct input size
    xf::cv::accel_utils obj_a, obj_b;
    obj_a.Array2xfMat<…,HEIGHT,WIDTH,…> (gmem_in1, in_mat1);
    obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in2, in_mat2);
    obj_b.Array2xfMat<…,HEIGHT/4,WIDTH,…> (gmem_in3, in_mat3);
    xf::cv::Vitis-Vision-func(in_mat1, in_mat2, in_mat3, out_mat…);
    xf::cv::xfMat2Array<…> (out_mat, gmem_out);
}
}

For the stream based implementations, the data must be fetched from the input AXI and must be pushed to xfMat as required by the xfcv kernels for that particular configuration. Likewise, the same operations must be performed for the output of the xfcv kernel. To perform this, two utility functions are provided, xf::cv::Array2xfMat() and xf::cv::xfMat2Array().

Array2xfMat¶

This function converts the input array to xf::cv::Mat. The Vitis Vision kernel requires the input to be of type xf::cv::Mat. This function reads from the array pointer and writes into xf::cv::Mat based on the particular configuration (bit-depth, channels, pixel-parallelism) with which the xf::cv::Mat was created. Array2xfMat supports line stride. Line stride is the number of pixels that must be added to the address of the first pixel of a row in order to access the first pixel of the next row.
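
To illustrate line stride, here is a minimal host-side sketch in plain C++. It is not the library implementation: it assumes single-channel 8-bit pixels and a stride expressed in pixels, and pixelIndex and packRows are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of pixel (row, col) in a buffer whose rows start `stride` pixels apart.
std::size_t pixelIndex(std::size_t row, std::size_t col, std::size_t stride) {
    return row * stride + col;
}

// Copy a rows x cols image out of a strided buffer into a tightly packed one,
// skipping the (stride - cols) padding pixels at the end of each row.
std::vector<std::uint8_t> packRows(const std::vector<std::uint8_t>& src,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t stride) {
    std::vector<std::uint8_t> dst;
    dst.reserve(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst.push_back(src[pixelIndex(r, c, stride)]);
        }
    }
    return dst;
}
```

When the stride equals the image width, the padding is zero and the strided and packed layouts coincide, which matches the default stride value listed in the parameter table.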

//Without Line stride support
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void Array2xfMat(ap_uint< PTR_WIDTH > *srcPtr, xf::cv::Mat<MAT_T,ROWS,COLS,NPC>& dstMat)

//With Line stride support
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void Array2xfMat(ap_uint< PTR_WIDTH > *srcPtr, xf::cv::Mat<MAT_T,ROWS,COLS,NPC>& dstMat, int stride)

Table. Array2xfMat Parameter Description¶
Parameter	Description
PTR_WIDTH	Data width of the input pointer. The value must be a power of 2, from 8 to 512.
MAT_T	Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and XF_8UC4
ROWS	Maximum height of image
COLS	Maximum width of image
NPC	Number of pixels computed in parallel. Example XF_NPPC1, XF_NPPC8
srcPtr	Input pointer. Type of the pointer based on the PTR_WIDTH.
dstMat	Output image of type xf::cv::Mat
stride	Line stride. Default value is dstMat.cols

xfMat2Array¶

This function converts the input xf::cv::Mat to an output array. The output of a Vitis Vision kernel function is an xf::cv::Mat, which must be converted to an output pointer. xfMat2Array supports line stride. Line stride is the number of pixels that must be added to the address of the first pixel of a row in order to access the first pixel of the next row.

//Without Line stride support
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC, int FILLZERO = 1>
void xfMat2Array(xf::cv::Mat<MAT_T,ROWS,COLS,NPC>& srcMat, ap_uint< PTR_WIDTH > *dstPtr)

//With Line stride support
template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC, int FILLZERO = 1>
void xfMat2Array(xf::cv::Mat<MAT_T,ROWS,COLS,NPC>& srcMat, ap_uint< PTR_WIDTH > *dstPtr, int stride)

Table. xfMat2Array Parameter Description¶
Parameter	Description
PTR_WIDTH	Data width of the output pointer. The value must be a power of 2, from 8 to 512.
MAT_T	Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and XF_8UC4
ROWS	Maximum height of image
COLS	Maximum width of image
NPC	Number of pixels computed in parallel. Example XF_NPPC1, XF_NPPC8
FILLZERO	Line padding Flag. Use when line stride support is needed. Default value is 1
dstPtr	Output pointer. Type of the pointer based on the PTR_WIDTH.
srcMat	Input image of type xf::cv::Mat
stride	Line stride. Default value is srcMat.cols

Interface pointer widths¶

The minimum and maximum pointer widths for different configurations are shown in the following table:

Table. Minimum and maximum pointer widths for different Mat types¶
MAT type	Parallelism	Min PTR_WIDTH	Max PTR_WIDTH
XF_8UC1	XF_NPPC1	8	512
XF_16UC1	XF_NPPC1	16	512
XF_8UC1	XF_NPPC8	64	512
XF_16UC1	XF_NPPC8	128	512
XF_8UC3	XF_NPPC1	32	512
XF_8UC3	XF_NPPC8	256	512
XF_8UC4	XF_NPPC8	256	512
XF_8UC3	XF_NPPC16	512	512
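
The minimum widths in the table are consistent with a simple rule: multiply the bits per channel, the number of channels, and the pixel parallelism, then round up to the next power of two (with a floor of 8). The sketch below encodes that rule as an assumption inferred from the table rows; minPtrWidth is a hypothetical helper, not a library function.

```cpp
#include <cassert>

// Smallest power-of-two width (at least 8 bits) that can hold one packed word of
// `nppc` pixels, each with `channels` channels of `bitsPerChannel` bits.
// This rule is inferred from the table above, not taken from the library.
int minPtrWidth(int bitsPerChannel, int channels, int nppc) {
    const int bits = bitsPerChannel * channels * nppc;
    int width = 8;
    while (width < bits) {
        width *= 2;
    }
    return width;
}
```

For example, XF_8UC3 with XF_NPPC1 needs 24 bits per packed word, which rounds up to the 32 listed in the table. Results above 512 would exceed the maximum width the table allows.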

Kernel-to-Kernel streaming¶

There are two utility functions available in Vitis Vision, axiStrm2xfMat and xfMat2axiStrm to support streaming of data between two kernels. For more details on kernel-to-kernel streaming, refer to the “Streaming Data Transfers Between the Kernels” section of [UG1393](https://www.xilinx.com/support/documentation/sw_manuals/xilinx2021_1/ug1393-vitis-application-acceleration.pdf) document.

axiStrm2xfMat¶

axiStrm2xfMat is used by the consumer kernel to support streaming data transfer between two kernels. The consumer kernel receives data from the producer kernel through a kernel streaming interface, which is defined by hls::stream with the ap_axiu< PTR_WIDTH, 0, 0, 0> data type. axiStrm2xfMat reads from the AXI stream and writes into xf::cv::Mat based on the particular configuration (bit-depth, channels, pixel-parallelism) with which the xf::cv::Mat was created.

template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void axiStrm2xfMat(hls::stream<ap_axiu<PTR_WIDTH, 0, 0, 0> >& srcPtr, xf::cv::Mat<MAT_T, ROWS, COLS, NPC>& dstMat)

Table. Parameter description of axiStrm2xfMat function¶
Parameter	Description
PTR_WIDTH	Data width of the input pointer. The value must be a power of 2, from 8 to 512.
MAT_T	Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and XF_8UC4
ROWS	Maximum height of image
COLS	Maximum width of image
NPC	Number of pixels computed in parallel. Example XF_NPPC1, XF_NPPC8
srcPtr	Input image of type hls::stream<ap_axiu<PTR_WIDTH, 0, 0, 0> >
dstMat	Output image of type xf::cv::Mat

xfMat2axiStrm¶

xfMat2axiStrm is used by the producer kernel to support streaming data transfer between two kernels. This function converts the input xf::cv::Mat to an AXI stream based on the particular configuration (bit-depth, channels, pixel-parallelism).

template <int PTR_WIDTH, int MAT_T, int ROWS, int COLS, int NPC>
void xfMat2axiStrm(xf::cv::Mat<MAT_T, ROWS, COLS, NPC>& srcMat, hls::stream<ap_axiu<PTR_WIDTH, 0, 0, 0> >& dstPtr)

Table. Parameter description of xfMat2axiStrm function¶
Parameter	Description
PTR_WIDTH	Data width of the output stream. The value must be a power of 2, from 8 to 512.
MAT_T	Input Mat type. Example XF_8UC1, XF_16UC1, XF_8UC3 and XF_8UC4
ROWS	Maximum height of image
COLS	Maximum width of image
NPC	Number of pixels computed in parallel. Example XF_NPPC1, XF_NPPC8
srcMat	Input image of type xf::cv::Mat
dstPtr	Output image of type hls::stream<ap_axiu<PTR_WIDTH, 0, 0, 0> >

Memory Mapped Kernels¶

In memory mapped kernels such as crop, Mean-shift tracking, and bounding box, the input is read from a particular block of memory, based on the requirements of the algorithm. The streaming interfaces require the image to be read in raster scan order, which is not the case for the memory mapped kernels. The methodology to handle this case is as follows:

extern "C" {
void func_top (ap_uint *gmem_in, ap_uint *gmem_out, ...) {
    // the Mat objects wrap the global-memory pointers directly
    xf::cv::Mat<…> in_mat(…,gmem_in), out_mat(…,gmem_out);
    xf::cv::kernel<…> (in_mat, out_mat…);
}
}

The gmem pointers must be mapped to the xf::cv::Mat objects during object creation, and then the memory mapped kernels are called with these mats at the interface. The pointer size must be the same as the size required for the xf::cv::Vitis-Vision-func, unlike the streaming method, where any larger pointer size (up to 512 bits) is allowed.
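
To show why a kernel such as crop does not fit the raster-scan streaming model, the following plain C++ sketch reads only a rectangular block out of an image held in memory. It is a host-side illustration under stated assumptions (single-channel 8-bit pixels, row-major layout, a region that lies inside the image); Roi and cropBlock are hypothetical names, not the library's crop API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rectangular region of interest, in pixels (hypothetical helper type).
struct Roi {
    std::size_t x;       // left column
    std::size_t y;       // top row
    std::size_t width;   // region width
    std::size_t height;  // region height
};

// Read only the pixels inside `roi` from a row-major image that is `imgCols` wide.
// The accesses jump between rows instead of walking the whole image in raster
// order, which is why the image must be addressable memory rather than a stream.
// The caller must ensure the region lies inside the image.
std::vector<std::uint8_t> cropBlock(const std::vector<std::uint8_t>& img,
                                    std::size_t imgCols, const Roi& roi) {
    std::vector<std::uint8_t> out;
    out.reserve(roi.width * roi.height);
    for (std::size_t r = 0; r < roi.height; ++r) {
        for (std::size_t c = 0; c < roi.width; ++c) {
            out.push_back(img[(roi.y + r) * imgCols + roi.x + c]);
        }
    }
    return out;
}
```

On a 4x4 image whose pixels hold the values 0 to 15, the region starting at (1, 1) with size 2x2 touches only four of the sixteen pixels.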

Makefile¶

Example makefiles are provided in the examples and tests sections of the GitHub repository.

Design example Using Library on Vitis¶

The following is a multi-kernel example, where different kernels run sequentially in a pipeline to form an application. This example performs Canny edge detection, where two kernels are involved: Canny and edge tracing. The Canny function takes a gray-scale image as input and provides the edge information in three states (weak edge (1), strong edge (3), and background (0)), which is fed into edge tracing, which filters out the weak edges. The former works in a streaming based implementation and the latter in a memory mapped manner.

Host code¶

The following is the host code for the Canny edge detection example. The host code sets up the OpenCL platform with the FPGA for processing the required data. In the case of the Vitis Vision example, the data is an image. Reading and writing of images are enabled using calls to functions from Vitis Vision.

// setting up device and platform
    std::vector<cl::Device> devices = xcl::get_xil_devices();
    cl::Device device = devices[0];
    cl::Context context(device);
    cl::CommandQueue q(context, device,CL_QUEUE_PROFILING_ENABLE);
    std::string device_name = device.getInfo<CL_DEVICE_NAME>();

    // Kernel 1: Canny
    std::string binaryFile=xcl::find_binary_file(device_name,"krnl_canny");
    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);
    cl::Kernel krnl(program,"canny_accel");

    // creating necessary cl buffers for input and output
    cl::Buffer imageToDevice(context, CL_MEM_READ_ONLY,(height*width));
    cl::Buffer imageFromDevice(context, CL_MEM_WRITE_ONLY,(height*width/4));


    // Set the kernel arguments
    krnl.setArg(0, imageToDevice);
    krnl.setArg(1, imageFromDevice);
    krnl.setArg(2, height);
    krnl.setArg(3, width);
    krnl.setArg(4, low_threshold);
    krnl.setArg(5, high_threshold);

    // write the input image data from host to device memory
    q.enqueueWriteBuffer(imageToDevice, CL_TRUE, 0,(height*(width)),img_gray.data);
    // Profiling Objects
    cl_ulong start= 0;
    cl_ulong end = 0;
    double diff_prof = 0.0f;
    cl::Event event_sp;

    // Launch the kernel
    q.enqueueTask(krnl,NULL,&event_sp);
    clWaitForEvents(1, (const cl_event*) &event_sp);

    // profiling
    event_sp.getProfilingInfo(CL_PROFILING_COMMAND_START,&start);
    event_sp.getProfilingInfo(CL_PROFILING_COMMAND_END,&end);
    diff_prof = end-start;
    std::cout<<(diff_prof/1000000)<<"ms"<<std::endl;

    // Kernel 2: edge tracing
    cl::Kernel krnl2(program,"edgetracing_accel");

    cl::Buffer imageFromDeviceedge(context, CL_MEM_WRITE_ONLY,(height*width));

    // Set the kernel arguments
    krnl2.setArg(0, imageFromDevice);
    krnl2.setArg(1, imageFromDeviceedge);
    krnl2.setArg(2, height);
    krnl2.setArg(3, width);

    // Profiling Objects
    cl_ulong startedge= 0;
    cl_ulong endedge = 0;
    double diff_prof_edge = 0.0f;
    cl::Event event_sp_edge;

    // Launch the kernel
    q.enqueueTask(krnl2,NULL,&event_sp_edge);
    clWaitForEvents(1, (const cl_event*) &event_sp_edge);

    // profiling
    event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_START,&startedge);
    event_sp_edge.getProfilingInfo(CL_PROFILING_COMMAND_END,&endedge);
    diff_prof_edge = endedge-startedge;
    std::cout<<(diff_prof_edge/1000000)<<"ms"<<std::endl;


    //Copying Device result data to Host memory
    q.enqueueReadBuffer(imageFromDeviceedge, CL_TRUE, 0,(height*width),out_img_edge.data);
    q.finish();

Top level kernel¶

Below is the top-level/wrapper function with all necessary glue logic.

// streaming based kernel
#include "xf_canny_config.h"

extern "C" {
void canny_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols,int low_threshold,int high_threshold)
{
#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=img_inp  bundle=control
#pragma HLS INTERFACE s_axilite port=img_out  bundle=control

#pragma HLS INTERFACE s_axilite port=rows     bundle=control
#pragma HLS INTERFACE s_axilite port=cols     bundle=control
#pragma HLS INTERFACE s_axilite port=low_threshold     bundle=control
#pragma HLS INTERFACE s_axilite port=high_threshold     bundle=control
#pragma HLS INTERFACE s_axilite port=return   bundle=control

    xf::cv::Mat<XF_8UC1, HEIGHT, WIDTH, INTYPE> in_mat(rows,cols);

    xf::cv::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> dst_mat(rows,cols);

    #pragma HLS DATAFLOW

    xf::cv::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC1,HEIGHT,WIDTH,INTYPE>(img_inp,in_mat);
    xf::cv::Canny<FILTER_WIDTH,NORM_TYPE,XF_8UC1,XF_2UC1,HEIGHT, WIDTH,INTYPE,XF_NPPC32,XF_USE_URAM>(in_mat,dst_mat,low_threshold,high_threshold);
    xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH,XF_2UC1,HEIGHT,WIDTH,XF_NPPC32>(dst_mat,img_out);


}
}
// memory mapped kernel
#include "xf_canny_config.h"
extern "C" {
void edgetracing_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows, int cols)
{
#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem4
#pragma HLS INTERFACE s_axilite port=img_inp  bundle=control
#pragma HLS INTERFACE s_axilite port=img_out  bundle=control

#pragma HLS INTERFACE s_axilite port=rows     bundle=control
#pragma HLS INTERFACE s_axilite port=cols     bundle=control
#pragma HLS INTERFACE s_axilite port=return   bundle=control


    xf::cv::Mat<XF_2UC1, HEIGHT, WIDTH, XF_NPPC32> _dst1(rows,cols,img_inp);
    xf::cv::Mat<XF_8UC1, HEIGHT, WIDTH, XF_NPPC8> _dst2(rows,cols,img_out);
    xf::cv::EdgeTracing<XF_2UC1,XF_8UC1,HEIGHT, WIDTH, XF_NPPC32,XF_NPPC8,XF_USE_URAM>(_dst1,_dst2);

}
}

Evaluating the Functionality¶

You can build the kernels and test the functionality through software emulation, hardware emulation, and by running directly on supported hardware with the FPGA. Use the following commands to set up the basic environment:

$ cd <path to the folder where makefile is present>
$ source <path to the Vitis installation folder>/Vitis/<version number>/settings64.sh
$ export DEVICE=<path-to-platform-directory>/<platform>.xpfm

For PCIe devices, set the following:

$ source <path to Xilinx_xrt>/setup.sh

$ export OPENCV_INCLUDE=< path-to-opencv-include-folder >

$ export OPENCV_LIB=< path-to-opencv-lib-folder >

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:< path-to-opencv-lib-folder >

For embedded devices, set the following:

Download the platform and common-image from the Xilinx Download Center. Run the sdk.sh script from the common-image directory to install sysroot using the command:
$ ./sdk.sh -y -d ./ -p

Unzip the rootfs file:
$ gunzip ./rootfs.ext4.gz

$ export SYSROOT=< path-to-platform-sysroot >

$ export EDGE_COMMON_SW=< path-to-rootfs-and-Image-files >

$ export PERL=<path-to-perl-installation-location> #For example, "export PERL=/usr/bin/perl". Please make sure that Expect.pm package is available in your Perl installation.

Software Emulation¶

Software emulation is equivalent to running a C-simulation of the kernel. Compilation time is minimal, so software emulation is recommended as the first step in testing the kernel. The steps to build and run software emulation are as follows:

For PCIe devices:

$ make host xclbin TARGET=sw_emu

$ make run TARGET=sw_emu

For embedded devices:

$ make host xclbin TARGET=sw_emu HOST_ARCH=< aarch32 | aarch64 >

$ make run TARGET=sw_emu HOST_ARCH=< aarch32 | aarch64 >

Hardware Emulation¶

Hardware emulation runs the test on the RTL generated by synthesizing the C/C++ code. Because the simulation is done on RTL, it takes longer to complete than software emulation. The steps to build and run hardware emulation are as follows:

For PCIe devices:

$ make host xclbin TARGET=hw_emu

$ make run TARGET=hw_emu

For embedded devices:

$ make host xclbin TARGET=hw_emu HOST_ARCH=< aarch32 | aarch64 >

$ make run TARGET=hw_emu HOST_ARCH=< aarch32 | aarch64 >

Testing on the Hardware¶

To test on the hardware, the kernel must be compiled into a bitstream (building for hardware). This takes some time, since the C/C++ code must be converted to RTL and run through synthesis and implementation before a bitstream is created. As a prerequisite, the drivers must be installed for the XSA for which the example was built. The steps to build the kernel and run it on hardware are as follows:

For PCIe devices:

$ make host xclbin TARGET=hw

$ make run TARGET=hw

For embedded devices:

$ make host xclbin TARGET=hw HOST_ARCH=< aarch32 | aarch64 >

$ make run TARGET=hw HOST_ARCH=< aarch32 | aarch64 >

Note1. For non-DFX platforms, BOOT.BIN has to be manually copied from < build-directory >/< xclbin-folder >/sd_card/ to the top level sd_card folder.

Note2. For hw run on embedded devices, copy the contents of the generated sd_card folder under the package_hw directory to an SD card and run the following commands on the board:

source /opt/xilinx/xrt/setup.sh

cd /mnt

export XCL_BINDIR=< xclbin-folder-present-in-the-sd_card > #For example, "export XCL_BINDIR=xclbin_zcu102_base_hw"

./< executable > < arguments >

Using the Vitis vision Library¶

This section describes using the Vitis vision library in the Vitis development environment.

Note: The instructions in this section assume that you have downloaded and installed all the required packages.

The include folder contains all the necessary components to build a Computer Vision or Image Processing pipeline using the library. The folders common and core contain the infrastructure that the library functions need for basic functions, Mat class, and macros. The library functions are categorized into four folders, features, video, dnn, and imgproc, based on the operation they perform. The names of the folders are self-explanatory.

To work with the library functions, you need to include the path to the include folder in the Vitis project. After you add the include folder’s path to the compiler’s include search path, you can include the relevant header files for the library functions you will be working with. For example, if you would like to work with Harris Corner Detector and Bilateral Filter, you must use the following lines in the host code:

#include "features/xf_harris.hpp" //for Harris Corner Detector
#include "imgproc/xf_bilateral_filter.hpp" //for Bilateral Filter
#include "video/xf_kalmanfilter.hpp" //for Kalman Filter

After the headers are included, you can work with the library functions as described in the Vitis vision Library API Reference using the examples in the examples folder as reference.

The following table gives the name of the header file, including the folder name, which contains the library function.

Table : Vitis Vision Library¶
Function Name	File Path in the include folder
xf::cv::accumulate	imgproc/xf_accumulate_image.hpp
xf::cv::accumulateSquare	imgproc/xf_accumulate_squared.hpp
xf::cv::accumulateWeighted	imgproc/xf_accumulate_weighted.hpp
xf::cv::absdiff, xf::cv::add, xf::cv::subtract, xf::cv::bitwise_and, xf::cv::bitwise_or, xf::cv::bitwise_not, xf::cv::bitwise_xor, xf::cv::multiply, xf::cv::Max, xf::cv::Min, xf::cv::compare, xf::cv::zero, xf::cv::addS, xf::cv::SubS, xf::cv::SubRS, xf::cv::compareS, xf::cv::MaxS, xf::cv::MinS, xf::cv::set	core/xf_arithm.hpp
xf::cv::addWeighted	imgproc/xf_add_weighted.hpp
xf::cv::autowhitebalance	imgproc/xf_autowhitebalance.hpp
xf::cv::autoexposurecorrection	imgproc/xf_aec.hpp
xf::cv::bilateralFilter	imgproc/xf_bilateral_filter.hpp
xf::cv::blackLevelCorrection	imgproc/xf_black_level.hpp
xf::cv::bfmatcher	imgproc/xf_bfmatcher.hpp
xf::cv::boxFilter	imgproc/xf_box_filter.hpp
xf::cv::boundingbox	imgproc/xf_boundingbox.hpp
xf::cv::badpixelcorrection	imgproc/xf_bpc.hpp
xf::cv::Canny	imgproc/xf_canny.hpp
xf::cv::colorcorrectionmatrix	imgproc/xf_colorcorrectionmatrix.hpp
xf::cv::Colordetect	imgproc/xf_colorthresholding.hpp, imgproc/xf_bgr2hsv.hpp, imgproc/xf_erosion.hpp, imgproc/xf_dilation.hpp
xf::cv::merge	imgproc/xf_channel_combine.hpp
xf::cv::extractChannel	imgproc/xf_channel_extract.hpp
xf::cv::ccaCustom	imgproc/xf_cca_custom.hpp
xf::cv::clahe	imgproc/xf_clahe.hpp
xf::cv::convertTo	imgproc/xf_convert_bitdepth.hpp
xf::cv::crop	imgproc/xf_crop.hpp
xf::cv::distanceTransform	imgproc/xf_distancetransform.hpp
xf::cv::nv122iyuv, xf::cv::nv122rgba, xf::cv::nv122yuv4, xf::cv::nv212iyuv, xf::cv::nv212rgba, xf::cv::nv212yuv4, xf::cv::rgba2yuv4, xf::cv::rgba2iyuv, xf::cv::rgba2nv12, xf::cv::rgba2nv21, xf::cv::uyvy2iyuv, xf::cv::uyvy2nv12, xf::cv::uyvy2rgba, xf::cv::yuyv2iyuv, xf::cv::yuyv2nv12, xf::cv::yuyv2rgba, xf::cv::rgb2iyuv, xf::cv::rgb2nv12, xf::cv::rgb2nv21, xf::cv::rgb2yuv4, xf::cv::rgb2uyvy, xf::cv::rgb2yuyv, xf::cv::rgb2bgr, xf::cv::bgr2uyvy, xf::cv::bgr2yuyv, xf::cv::bgr2rgb, xf::cv::bgr2nv12, xf::cv::bgr2nv21, xf::cv::iyuv2nv12, xf::cv::iyuv2rgba, xf::cv::iyuv2rgb, xf::cv::iyuv2yuv4, xf::cv::nv122uyvy, xf::cv::nv122yuyv, xf::cv::nv122nv21, xf::cv::nv212rgb, xf::cv::nv212bgr, xf::cv::nv212uyvy, xf::cv::nv212yuyv, xf::cv::nv212nv12, xf::cv::uyvy2rgb, xf::cv::uyvy2bgr, xf::cv::uyvy2yuyv, xf::cv::yuyv2rgb, xf::cv::yuyv2bgr, xf::cv::yuyv2uyvy, xf::cv::rgb2gray, xf::cv::bgr2gray, xf::cv::gray2rgb, xf::cv::gray2bgr, xf::cv::rgb2xyz, xf::cv::bgr2xyz…	imgproc/xf_cvt_color.hpp
xf::cv::densePyrOpticalFlow	video/xf_pyr_dense_optical_flow.hpp
xf::cv::DenseNonPyrLKOpticalFlow	video/xf_dense_npyr_optical_flow.hpp
xf::cv::dilate	imgproc/xf_dilation.hpp
xf::cv::demosaicing	imgproc/xf_demosaicing.hpp
xf::cv::erode	imgproc/xf_erosion.hpp
xf::cv::fast	features/xf_fast.hpp
xf::cv::filter2D	imgproc/xf_custom_convolution.hpp
xf::cv::flip	features/xf_flip.hpp
xf::cv::GaussianBlur	imgproc/xf_gaussian_filter.hpp
xf::cv::gaincontrol	imgproc/xf_gaincontrol.hpp
xf::cv::gammacorrection	imgproc/xf_gammacorrection
xf::cv::cornerHarris	features/xf_harris.hpp
xf::cv::calcHist	imgproc/xf_histogram.hpp
xf::cv::equalizeHist	imgproc/xf_hist_equalize.hpp
xf::cv::extractExposureFrames	imgproc/xf_extract_eframes.hpp
xf::cv::HDRMerge_bayer	imgproc/xf_hdrmerge.hpp
xf::cv::HOGDescriptor	imgproc/xf_hog_descriptor.hpp
xf::cv::Houghlines	imgproc/xf_houghlines.hpp
xf::cv::inRange	imgproc/xf_inrange.hpp
xf::cv::integralImage	imgproc/xf_integral_image.hpp
xf::cv::KalmanFilter	video/xf_kalmanfilter.hpp
xf::cv::Lscdistancebased	imgproc/xf_lensshadingcorrection.hpp
xf::cv::LTM::process	imgproc/xf_ltm.hpp
xf::cv::LUT	imgproc/xf_lut.hpp
xf::cv::magnitude	core/xf_magnitude.hpp
xf::cv::MeanShift	imgproc/xf_mean_shift.hpp
xf::cv::meanStdDev	core/xf_mean_stddev.hpp
xf::cv::medianBlur	imgproc/xf_median_blur.hpp
xf::cv::minMaxLoc	core/xf_min_max_loc.hpp
xf::cv::modefilter	imgproc/xf_modefilter.hpp
xf::cv::OtsuThreshold	imgproc/xf_otsuthreshold.hpp
xf::cv::phase	core/xf_phase.hpp
xf::cv::preProcess	dnn/xf_pre_process.hpp
xf::cv::paintmask	imgproc/xf_paintmask.hpp
xf::cv::pyrDown	imgproc/xf_pyr_down.hpp
xf::cv::pyrUp	imgproc/xf_pyr_up.hpp
xf::cv::xf_QuatizationDithering	imgproc/xf_quantizationdithering.hpp
xf::cv::reduce	imgproc/xf_reduce.hpp
xf::cv::remap	imgproc/xf_remap.hpp
xf::cv::resize	imgproc/xf_resize.hpp
xf::cv::convertScaleAbs	imgproc/xf_convertscaleabs.hpp
xf::cv::Scharr	imgproc/xf_scharr.hpp
xf::cv::SemiGlobalBM	imgproc/xf_sgbm.hpp
xf::cv::Sobel	imgproc/xf_sobel.hpp
xf::cv::StereoPipeline	imgproc/xf_stereo_pipeline.hpp
xf::cv::sum	imgproc/xf_sum.hpp
xf::cv::StereoBM	imgproc/xf_stereoBM.hpp
xf::cv::SVM	imgproc/xf_svm.hpp
xf::cv::lut3d	imgproc/xf_3dlut.hpp
xf::cv::Threshold	imgproc/xf_threshold.hpp
xf::cv::warpTransform	imgproc/xf_warp_transform.hpp

Changing the Hardware Kernel Configuration¶

To modify the configuration of any function, update the following file:

<path to vitis vision git folder>/vision/L1/examples/<function>/build/xf_config_params.h .

Using the Vitis vision Library Functions on Hardware¶

The following table lists the Vitis vision library functions and the command to run the respective examples on hardware. It is assumed that your design is completely built and the board has booted up correctly.

Table : Using the Vitis vision Library Function on Hardware¶
Example	Function Name	Usage on Hardware
accumulate	xf::cv::accumulate	./<executable name>.elf <path to input image 1> <path to input image 2>
accumulatesquared	xf::cv::accumulateSquare	./<executable name>.elf <path to input image 1> <path to input image 2>
accumulateweighted	xf::cv::accumulateWeighted	./<executable name>.elf <path to input image 1> <path to input image 2>
addS	xf::cv::addS	./<executable name>.elf <path to input image>
arithm	xf::cv::absdiff, xf::cv::subtract, xf::cv::bitwise_and, xf::cv::bitwise_or, xf::cv::bitwise_not, xf::cv::bitwise_xor	./<executable name>.elf <path to input image 1> <path to input image 2>
addweighted	xf::cv::addWeighted	./<executable name>.elf <path to input image 1> <path to input image 2>
Autoexposure correction	xf::cv::autoexposurecorrection	./<executable name>.elf <path to input image>
Autowhite balance	xf::cv::autowhitebalance	./<executable name>.elf <path to input image>
Bilateralfilter	xf::cv::bilateralFilter	./<executable name>.elf <path to input image>
BlackLevel Correction	xf::cv::blackLevelCorrection	./<executable name>.elf <path to input image>
BruteForce	xf::cv::bfmatcher	./<executable name>.elf <path to input image>
Boxfilter	xf::cv::boxFilter	./<executable name>.elf <path to input image>
Badpixelcorrection	xf::cv::badpixelcorrection	./<executable name>.elf <path to input image>
Boundingbox	xf::cv::boundingbox	./<executable name>.elf <path to input image> <No of ROI’s>
Canny	xf::cv::Canny	./<executable name>.elf <path to input image>
ccaCustom	xf::cv::ccaCustom	./<executable name>.elf <path to input image>
channelcombine	xf::cv::merge	./<executable name>.elf <path to input image 1> <path to input image 2> <path to input image 3> <path to input image 4>
Channelextract	xf::cv::extractChannel	./<executable name>.elf <path to input image>
CLAHE	xf::cv::clahe	./<executable name>.elf <path to input image>
Colordetect	xf::cv::bgr2hsv, xf::cv::colorthresholding, xf::cv::erode, xf::cv::dilate	./<executable name>.elf <path to input image>
color correction matrix	xf::cv::colorcorrectionmatrix	./<executable name>.elf <path to input image>
compare	xf::cv::compare	./<executable name>.elf <path to input image 1> <path to input image 2>
compareS	xf::cv::compareS	./<executable name>.elf <path to input image>
Convertbitdepth	xf::cv::convertTo	./<executable name>.elf <path to input image>
convertScaleAbs	xf::cv::convertScaleAbs	./<executable name>.elf <path to input image>
Cornertracker	xf::cv::cornerTracker	./exe <input video> <no. of frames> <Harris Threshold> <No. of frames after which Harris Corners are Reset>
crop	xf::cv::crop	./<executable name>.elf <path to input image>
Customconv	xf::cv::filter2D	./<executable name>.elf <path to input image>
cvtcolor IYUV2NV12	xf::cv::iyuv2nv12	./<executable name>.elf <path to input image 1> <path to input image 2> <path to input image 3>
cvtcolor IYUV2RGBA	xf::cv::iyuv2rgba	./<executable name>.elf <path to input image 1> <path to input image 2> <path to input image 3>
cvtcolor IYUV2YUV4	xf::cv::iyuv2yuv4	./<executable name>.elf <path to input image 1> <path to input image 2> <path to input image 3> <path to input image 4> <path to input image 5> <path to input image 6>
cvtcolor NV122IYUV	xf::cv::nv122iyuv	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor NV122RGBA	xf::cv::nv122rgba	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor NV122YUV4	xf::cv::nv122yuv4	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor NV212IYUV	xf::cv::nv212iyuv	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor NV212RGBA	xf::cv::nv212rgba	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor NV212YUV4	xf::cv::nv212yuv4	./<executable name>.elf <path to input image 1> <path to input image 2>
cvtcolor RGBA2YUV4	xf::cv::rgba2yuv4	./<executable name>.elf <path to input image>
cvtcolor RGBA2IYUV	xf::cv::rgba2iyuv	./<executable name>.elf <path to input image>
cvtcolor RGBA2NV12	xf::cv::rgba2nv12	./<executable name>.elf <path to input image>
cvtcolor RGBA2NV21	xf::cv::rgba2nv21	./<executable name>.elf <path to input image>
cvtcolor UYVY2IYUV	xf::cv::uyvy2iyuv	./<executable name>.elf <path to input image>
cvtcolor UYVY2NV12	xf::cv::uyvy2nv12	./<executable name>.elf <path to input image>
cvtcolor UYVY2RGBA	xf::cv::uyvy2rgba	./<executable name>.elf <path to input image>
cvtcolor YUYV2IYUV	xf::cv::yuyv2iyuv	./<executable name>.elf <path to input image>
cvtcolor YUYV2NV12	xf::cv::yuyv2nv12	./<executable name>.elf <path to input image>
cvtcolor YUYV2RGBA	xf::cv::yuyv2rgba	./<executable name>.elf <path to input image>
Demosaicing	xf::cv::demosaicing	./<executable name>.elf <path to input image>
Difference of Gaussian	xf::cv::GaussianBlur, xf::cv::duplicateMat, and xf::cv::subtract	./<exe-name>.elf <path to input image>
Dilation	xf::cv::dilate	./<executable name>.elf <path to input image>
Distance Transform	xf::cv::distanceTransform	./<executable name>.elf <path to input image>
Erosion	xf::cv::erode	./<executable name>.elf <path to input image>
FAST	xf::cv::fast	./<executable name>.elf <path to input image>
Flip	xf::cv::flip	./<executable name>.elf <path to input image>
Gaussianfilter	xf::cv::GaussianBlur	./<executable name>.elf <path to input image>
Gaincontrol	xf::cv::gaincontrol	./<executable name>.elf <path to input image>
Gammacorrection	xf::cv::gammacorrection	./<executable name>.elf <path to input image>
Harris	xf::cv::cornerHarris	./<executable name>.elf <path to input image>
Histogram	xf::cv::calcHist	./<executable name>.elf <path to input image>
Histequalize	xf::cv::equalizeHist	./<executable name>.elf <path to input image>
Hog	xf::cv::HOGDescriptor	./<executable name>.elf <path to input image>
Houghlines	xf::cv::HoughLines	./<executable name>.elf <path to input image>
inRange	xf::cv::inRange	./<executable name>.elf <path to input image>
Integralimg	xf::cv::integralImage	./<executable name>.elf <path to input image>
Laplacian Filter	xf::cv::filter2D	./<executable name>.elf <path to input image>
Lkdensepyrof	xf::cv::densePyrOpticalFlow	./<executable name>.elf <path to input image 1> <path to input image 2>
Lknpyroflow	xf::cv::DenseNonPyrLKOpticalFlow	./<executable name>.elf <path to input image 1> <path to input image 2>
lensshading correction	xf::cv::Lscdistancebased	./<executable name>.elf <path to input image>
Lut	xf::cv::LUT	./<executable name>.elf <path to input image>
Local tone mapping	xf::cv::LTM::process	./<executable name>.elf <path to input image>
Kalman Filter	xf::cv::KalmanFilter	./<executable name>.elf
Magnitude	xf::cv::magnitude	./<executable name>.elf <path to input image>
Max	xf::cv::Max	./<executable name>.elf <path to input image 1> <path to input image 2>
MaxS	xf::cv::MaxS	./<executable name>.elf <path to input image>
meanshifttracking	xf::cv::MeanShift	./<executable name>.elf <path to input video/input image files> <Number of objects to track>
meanstddev	xf::cv::meanStdDev	./<executable name>.elf <path to input image>
medianblur	xf::cv::medianBlur	./<executable name>.elf <path to input image>
Min	xf::cv::Min	./<executable name>.elf <path to input image 1> <path to input image 2>
MinS	xf::cv::MinS	./<executable name>.elf <path to input image>
Minmaxloc	xf::cv::minMaxLoc	./<executable name>.elf <path to input image>
Mode filter	xf::cv::modefilter	./<executable name>.elf <path to input image>
otsuthreshold	xf::cv::OtsuThreshold	./<executable name>.elf <path to input image>
paintmask	xf::cv::paintmask	./<executable name>.elf <path to input image>
Phase	xf::cv::phase	./<executable name>.elf <path to input image>
Pyrdown	xf::cv::pyrDown	./<executable name>.elf <path to input image>
Pyrup	xf::cv::pyrUp	./<executable name>.elf <path to input image>
Quantization Dithering	xf::cv::xf_QuatizationDithering	./<executable name>.elf <path to input image>
reduce	xf::cv::reduce	./<executable name>.elf <path to input image>
remap	xf::cv::remap	./<executable name>.elf <path to input image> <path to mapx data> <path to mapy data>
Resize	xf::cv::resize	./<executable name>.elf <path to input image>
scharrfilter	xf::cv::Scharr	./<executable name>.elf <path to input image>
set	xf::cv::set	./<executable name>.elf <path to input image>
SemiGlobalBM	xf::cv::SemiGlobalBM	./<executable name>.elf <path to left image> <path to right image>
sobelfilter	xf::cv::Sobel	./<executable name>.elf <path to input image>
stereopipeline	xf::cv::StereoPipeline	./<executable name>.elf <path to left image> <path to right image>
stereolbm	xf::cv::StereoBM	./<executable name>.elf <path to left image> <path to right image>
subRS	xf::cv::SubRS	./<executable name>.elf <path to input image>
subS	xf::cv::SubS	./<executable name>.elf <path to input image>
sum	xf::cv::sum	./<executable name>.elf <path to input image 1> <path to input image 2>
Svm	xf::cv::SVM	./<executable name>.elf
threshold	xf::cv::Threshold	./<executable name>.elf <path to input image>
3dlut	xf::cv::lut3d	./<executable name>.elf <path to input image>
warptransform	xf::cv::warpTransform	./<executable name>.elf <path to input image>
zero	xf::cv::zero	./<executable name>.elf <path to input image>

Getting Started with HLS¶

The Vitis vision library can be used to build applications in Vivado® HLS as well as Vitis HLS. This section describes how to run a single library component through the Vivado HLS or Vitis HLS 2020.2 flow, which includes C simulation, C synthesis, C/RTL co-simulation, and exporting the RTL as an IP.

All the functions under the L1 folder of the Vitis vision library can be built through the Vitis HLS flow in the following two modes:

Tcl Script Mode
GUI Mode

Each configuration of every function in L1 is provided with a Tcl script, which can be run through the available Makefile.

Open a terminal and run the following commands to set up the environment and build:

source < path-to-Vitis-installation-directory >/settings64.sh

source < path-to-XRT-installation-directory >/setup.sh

export DEVICE=< path-to-platform-directory >/< platform >.xpfm

export OPENCV_INCLUDE=< path-to-opencv-include-folder >

export OPENCV_LIB=< path-to-opencv-lib-folder >

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:< path-to-opencv-lib-folder >

make run CSIM=1 CSYNTH=1 COSIM=1 VIVADO_IMPL=1

Use the following steps to run HLS in standalone mode through the GUI:

Open a terminal and update the LD_LIBRARY_PATH to point to the OpenCV lib folder.
From the same terminal, open Vivado® HLS or Vitis HLS in GUI mode and create a new project.
Specify the name of the project. For example - Dilation.
Click Browse to enter a workspace folder used to store your projects.
Click Next.
Under the source files section, add the accel.cpp file, which can be found in the examples folder. Also enter the top function name (dilation_accel in this example).
Click Next.
Under the test bench section add tb.cpp.
Click Next.
Set the clock period to the required value (10 ns in this example).
Select the suitable part. For example, xczu9eg-ffvb1156-2-i.
Click Finish.
Right click on the created project and select Project Settings.
In the opened tab, select Simulation.
Files added under the Test Bench section will be displayed. Select a file and click Edit CFLAGS.
Enter -I<path-to-L1-include-directory> -std=c++0x -I<path-to-opencv-include-folder>.

Note: When using Vivado HLS on the Windows operating system, provide the -std=c++0x flag only for C-Sim and Co-Sim. Do not include the flag when performing synthesis.
In the Linker Flags section, enter the path to the OpenCV libraries followed by the libraries themselves: -L<path-to-opencv-lib-folder> -lopencv_core -lopencv_imgcodecs -lopencv_imgproc
Select Synthesis and repeat the above step for all the displayed files. Do not add the OpenCV include path here.
Click OK.
Run the C Simulation, select Clean Build and specify the required input arguments.
Click OK.
All generated output files and images are placed in solution1->csim->build.
Run C synthesis.
Run co-simulation by specifying the proper input arguments.
The status of co-simulation can be observed on the console.

There are a few limitations in performing co-simulation of the Vitis vision functions:

Functions with multiple accelerators are not supported.
The compiler and simulator are the HLS defaults (gcc, xsim).
Because HLS does not support multi-kernel integration, the current flow does not support it either. Hence, the Pyramidal Optical Flow and Canny Edge Detection functions and examples are not supported in this flow.
The maximum image size (HEIGHT and WIDTH) set in the config.h file must be equal to the actual input image size.
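
The last limitation can be checked in the testbench before the accelerator is called. The following is a minimal sketch, assuming the configuration header exposes the maximum size through the HEIGHT and WIDTH macros named above. The values 1080 and 1920 and the helper name matches_configured_size are hypothetical and are not part of the library.

```cpp
#include <cassert>

// Stand-ins for the HEIGHT and WIDTH values normally set in the config file.
// The numbers below are placeholders chosen for illustration only.
#ifndef HEIGHT
#define HEIGHT 1080
#endif
#ifndef WIDTH
#define WIDTH 1920
#endif

// Hypothetical helper: returns true only when the input image has exactly
// the configured size, which is what co-simulation expects.
inline bool matches_configured_size(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    return rows == HEIGHT && cols == WIDTH;
}
```

A testbench could call this helper on the dimensions of the loaded image and exit early with a message when it returns false, rather than starting a co-simulation that cannot succeed.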

AXI Video Interface Functions¶

Vitis vision provides functions that convert an xf::cv::Mat to the Xilinx® video streaming interface and vice versa. xf::cv::AXIvideo2xfMat() and xf::cv::xfMat2AXIvideo() act as video interfaces to the IPs of the Vitis vision functions in the Vivado® IP integrator. cvMat2AXIvideoxf<NPC> and AXIvideo2cvMatxf<NPC> are used on the host side.

An example function, ‘axiconv’, depicting the usage of these functions is provided in the L1/examples directory.

Table. AXI Video Interface Functions¶
Video Library Function	Description
AXIvideo2xfMat	Converts data from an AXI4 video stream representation to xf::cv::Mat format.
xfMat2AXIvideo	Converts data stored as xf::cv::Mat format to an AXI4 video stream.
cvMat2AXIvideoxf	Converts data stored as cv::Mat format to an AXI4 video stream.
AXIvideo2cvMatxf	Converts data from an AXI4 video stream representation to cv::Mat format.

AXIvideo2xfMat¶

The AXIvideo2xfMat function receives a sequence of images using the AXI4 Streaming Video protocol and produces an xf::cv::Mat representation.

API Syntax

template<int W,int T,int ROWS, int COLS,int NPC>
int AXIvideo2xfMat(hls::stream< ap_axiu<W,1,1,1> >& AXI_video_strm, xf::cv::Mat<T,ROWS, COLS, NPC>& img)

Parameter Descriptions

The following table describes the template and the function parameters.

Table. AXIvideo2xfMat Function Parameter Description¶
Parameter	Description
W	Data width of AXI4-Stream. Recommended value is pixel depth.
T	Pixel type of the image. 1 channel (XF_8UC1). Data width of pixel must be no greater than W.
ROWS	Maximum height of input image.
COLS	Maximum width of input image.
NPC	Number of pixels to be processed per cycle. Possible options are XF_NPPC1 and XF_NPPC8 for 1-pixel and 8-pixel operations respectively.
AXI_video_strm	HLS stream of ap_axiu (axi protocol) type.
img	Output image.

This function returns an error code with the ERROR_IO_EOL_EARLY (1) or ERROR_IO_EOL_LATE (2) bit set to indicate an unexpected line length, which it detects from the TLAST input.

For more information about the AXI interface, see UG761.
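
The way TLAST relates to the two error bits can be illustrated with a plain C++ model. This is a sketch under stated assumptions, not the library implementation: the Beat structure and the unpack_frame function are hypothetical, the two constants are redefined locally with the values given above, and start-of-frame handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Error bits, redefined locally with the values stated in the text.
constexpr int ERROR_IO_EOL_EARLY = 1;
constexpr int ERROR_IO_EOL_LATE = 2;

// Hypothetical model of one AXI4-Stream transfer: a pixel plus its TLAST flag.
struct Beat {
    std::uint8_t data;
    bool last;
};

// Fills a rows x cols image from a sequence of beats and reports line-length errors.
inline int unpack_frame(const std::vector<Beat>& strm, int rows, int cols,
                        std::vector<std::uint8_t>& img) {
    assert(rows > 0 && cols > 0);
    img.assign(static_cast<std::size_t>(rows) * cols, 0);
    int res = 0;
    std::size_t pos = 0;
    for (int r = 0; r < rows; ++r) {
        bool eol = false;
        for (int c = 0; c < cols; ++c) {
            // After an early TLAST (or when the stream runs dry) the rest of the row stays 0.
            if (eol || pos >= strm.size()) continue;
            const Beat b = strm[pos++];
            img[static_cast<std::size_t>(r) * cols + c] = b.data;
            eol = b.last;
            if (eol && c != cols - 1) res |= ERROR_IO_EOL_EARLY;  // TLAST before the last column
        }
        // The line is longer than expected: discard beats until TLAST arrives.
        while (!eol && pos < strm.size()) {
            eol = strm[pos++].last;
            res |= ERROR_IO_EOL_LATE;
        }
    }
    return res;
}
```

Because the result is a bit mask, a caller would test each condition with a bitwise AND rather than comparing against a single value.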

xfMat2AXIvideo¶

The xfMat2AXIvideo function receives an xf::cv::Mat representation of a sequence of images and encodes it using the AXI4 Streaming Video protocol.

API Syntax

template<int W, int T, int ROWS, int COLS,int NPC>
int xfMat2AXIvideo(xf::cv::Mat<T,ROWS, COLS,NPC>& img,hls::stream<ap_axiu<W,1,1,1> >& AXI_video_strm)

Parameter Descriptions

The following table describes the template and the function parameters.

Table. xfMat2AXIvideo Function Parameter Description¶
Parameter	Description
W	Data width of AXI4-Stream. Recommended value is pixel depth.
T	Pixel type of the image. 1 channel (XF_8UC1). Data width of pixel must be no greater than W.
ROWS	Maximum height of input image.
COLS	Maximum width of input image.
NPC	Number of pixels to be processed per cycle. Possible options are XF_NPPC1 and XF_NPPC8 for 1-pixel and 8-pixel operations respectively.
AXI_video_strm	HLS stream of ap_axiu (axi protocol) type.
img	Input image.

This function returns the value 0.

Note: The NPC value must be the same across all the functions in a data flow. A mismatch results in a compilation error in HLS.
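
One way to see why an NPC mismatch surfaces at compile time is that NPC is a template argument of the image container, as in the signatures above. The sketch below uses a simplified stand-in container and a hypothetical stage function. It illustrates the mechanism only and is not the library's actual check.

```cpp
#include <cassert>
#include <type_traits>

// Simplified stand-in for an image container whose type depends on NPC.
template <int ROWS, int COLS, int NPC>
struct Image {
    static constexpr int npc = NPC;
};

// Hypothetical processing stage: input and output share one NPC template argument.
// Passing containers with different NPC values fails template argument deduction,
// so the mismatch is rejected by the compiler.
template <int ROWS, int COLS, int NPC>
int stage(const Image<ROWS, COLS, NPC>&, Image<ROWS, COLS, NPC>&) {
    return NPC;
}

// Containers that differ only in NPC are distinct, unrelated types.
static_assert(!std::is_same<Image<128, 128, 1>, Image<128, 128, 8> >::value,
              "NPC is part of the container type");
```

Calling stage with an Image<128, 128, 1> input and an Image<128, 128, 8> output would not compile, which mirrors the behavior described in the note.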

cvMat2AXIvideoxf¶

The cvMat2AXIvideoxf function receives an image as a cv::Mat representation and produces an AXI4 streaming video of the image.

API Syntax

template<int NPC,int W>
void cvMat2AXIvideoxf(cv::Mat& cv_mat, hls::stream<ap_axiu<W,1,1,1> >& AXI_video_strm)

Parameter Descriptions

The following table describes the template and the function parameters.

Table. cvMat2AXIvideoxf Function Parameter Description¶
Parameter	Description
W	Data width of AXI4-Stream. Recommended value is pixel depth.
NPC	Number of pixels to be processed per cycle. Possible options are XF_NPPC1 and XF_NPPC8 for 1-pixel and 8-pixel operations respectively.
AXI_video_strm	HLS stream of ap_axiu (axi protocol) type.
cv_mat	Input image.

AXIvideo2cvMatxf¶

The AXIvideo2cvMatxf function receives an image as an AXI4 streaming video and produces a cv::Mat representation of the image.

API Syntax

template<int NPC,int W>
void AXIvideo2cvMatxf(hls::stream<ap_axiu<W,1,1,1> >& AXI_video_strm, cv::Mat& cv_mat)

Parameter Descriptions

The following table describes the template and the function parameters.

Table. AXIvideo2cvMatxf Function Parameter Description¶
Parameter	Description
W	Data width of AXI4-Stream. Recommended value is pixel depth.
NPC	Number of pixels to be processed per cycle. Possible options are XF_NPPC1 and XF_NPPC8 for 1-pixel and 8-pixel operations respectively.
AXI_video_strm	HLS stream of ap_axiu (axi protocol) type.
cv_mat	Output image.

Migrating HLS Video Library to Vitis vision¶

The HLS video library has been deprecated. All the functions and most of the infrastructure available in the HLS video library are now available in Vitis vision, with changed names and some modifications. The HLS video library functions ported to Vitis vision also support the build flow.

This section provides details on using the C++ video processing functions and the infrastructure present in the HLS video library.

Infrastructure Functions and Classes¶

All the functions imported from the HLS video library now take xf::cv::Mat (in sync with the Vitis vision library) to represent image data instead of hls::Mat. The main difference between the two is that hls::Mat uses hls::stream to store the data, whereas xf::cv::Mat uses a pointer. Therefore, hls::Mat cannot be replaced directly with xf::cv::Mat when migrating.

The following table summarizes the differences between the member functions of hls::Mat and xf::cv::Mat.

Table : Infrastructure Functions and Classes¶
Member Function	hls::Mat (HLS Video lib)	xf::cv::Mat (Vitis vision lib)
channels()	Returns the number of channels	Returns the number of channels
type()	Returns the enum value of pixel type	Returns the enum value of pixel type
depth()	Returns the enum value of pixel type	Returns the depth of pixel including channels
read()	Readout a value and return it as a scalar from stream	Readout a value from a given location and return it as a packed (for multi-pixel/clock) value.
operator >>	Similar to read()	Not available in Vitis vision
operator <<	Similar to write()	Not available in Vitis vision
write()	Write a scalar value into the stream	Writes a packed (for multi-pixel/clock) value into the given location.
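
The practical consequence of stream storage versus pointer storage is the access pattern that read() and write() allow. The following sketch contrasts the two with simplified stand-in classes. StreamImage and IndexedImage are hypothetical names and do not reproduce either library's container.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Stream-style container: values are consumed in arrival order, each exactly once.
template <typename T>
class StreamImage {
  public:
    void write(const T& v) { fifo_.push(v); }
    T read() {
        assert(!fifo_.empty());
        T v = fifo_.front();
        fifo_.pop();
        return v;
    }

  private:
    std::queue<T> fifo_;
};

// Pointer-style container: values live in a buffer and are addressed by location,
// so they can be written in any order and read more than once.
template <typename T>
class IndexedImage {
  public:
    explicit IndexedImage(std::size_t n) : data_(n) {}
    void write(std::size_t index, const T& v) { data_.at(index) = v; }
    T read(std::size_t index) const { return data_.at(index); }

  private:
    std::vector<T> data_;
};
```

Code written against the stream style cannot simply swap in the indexed style, because every read and write has to be given an explicit location when migrating.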

The infrastructure files hls_video_core.hpp, hls_video_mem.hpp, and hls_video_types.hpp from the HLS Video Library are now xf_video_core.hpp, xf_video_mem.hpp, and xf_video_types.hpp in the Vitis vision library, and hls_video_imgbase.hpp is deprecated. The code inside these files is unchanged, except that it is now under the xf::cv namespace.

Classes¶

Memory Window Buffer: hls::window is now xf::cv::window. There is no change in the implementation, except for the namespace. It is located in the xf_video_mem.hpp file.
Memory Line Buffer: hls::LineBuffer is now xf::cv::LineBuffer. There is no difference between the two, except that xf::cv::LineBuffer has extra template arguments for inferring different types of RAM structures for the storage it uses. The default storage type is “RAM_S2P_BRAM” with RESHAPE_FACTOR=1. A complete description can be found in the xf::cv::LineBuffer section. It is located in the xf_video_mem.hpp file.

Functions¶

OpenCV interface functions: These functions convert image data in OpenCV Mat format to and from HLS AXI types. The HLS Video Library had 14 interface functions, of which two are available in the Vitis vision library: cvMat2AXIvideo and AXIvideo2cvMat, located in the “xf_axi.h” file. The rest are deprecated.
AXI4-Stream I/O Functions: The I/O functions that convert hls::Mat to and from the AXI4-Stream compatible data type (hls::stream) are hls::AXIvideo2Mat and hls::Mat2AXIvideo. These functions are now deprecated, and two new functions, xf::cv::AXIvideo2xfMat and xf::cv::xfMat2AXIvideo, have been added to convert xf::cv::Mat to and from AXI4-Stream. To use these functions, include the header file “xf_infra.hpp”.

xf::cv::window¶

A template class that represents a 2D window buffer. Its three template parameters specify the number of rows and columns in the window buffer and the pixel data type.

Class definition¶

template<int ROWS, int COLS, typename T>
class Window {
public:
    Window();
    /* Window main APIs */
    void shift_pixels_left();
    void shift_pixels_right();
    void shift_pixels_up();
    void shift_pixels_down();
    void insert_pixel(T value, int row, int col);
    void insert_row(T value[COLS], int row);
    void insert_top_row(T value[COLS]);
    void insert_bottom_row(T value[COLS]);
    void insert_col(T value[ROWS], int col);
    void insert_left_col(T value[ROWS]);
    void insert_right_col(T value[ROWS]);
    T& getval(int row, int col);
    T& operator ()(int row, int col);
    T val[ROWS][COLS];
#ifdef __DEBUG__
    void restore_val();
    void window_print();
    T val_t[ROWS][COLS];
#endif
};

Parameter Descriptions¶

The following table lists the xf::cv::Window class members and their descriptions.

Table : Window Function Parameter Descriptions¶
Parameter	Description
val	2-D array to hold the contents of the buffer.

Member Function Description¶

Table : Member Function Description¶
Function	Description
shift_pixels_left()	Shifts the window contents left by one column, leaving the right column (col = COLS-1) free for inserting new data.
shift_pixels_right()	Shifts the window contents right by one column, leaving the left column (col = 0) free for inserting new data.
shift_pixels_up()	Shifts the window contents up by one row, leaving the bottom row (row = ROWS-1) free for inserting new data.
shift_pixels_down()	Shifts the window contents down by one row, leaving the top row (row = 0) free for inserting new data.
insert_pixel(T value, int row, int col)	Insert a new element value at location (row, column) of the window.
insert_row(T value[COLS], int row)	Inserts a set of values in any row of the window.
insert_top_row(T value[COLS])	Inserts a set of values in the top row = 0 of the window.
insert_bottom_row(T value[COLS])	Inserts a set of values in the bottom row = ROWS-1 of the window.
insert_col(T value[ROWS], int col)	Inserts a set of values in any column of the window.
insert_left_col(T value[ROWS])	Inserts a set of values in left column = 0 of the window.
insert_right_col(T value[ROWS])	Inserts a set of values in right column = COLS-1 of the window.
T& getval(int row, int col)	Returns the data value in the window at position (row,column).
T& operator ()(int row, int col)	Returns the data value in the window at position (row,column).
restore_val()	Restore the contents of window buffer to another array.
window_print()	Print all the data present in window buffer onto console.

Template Parameter Description¶

Table : Template Parameter Description¶
Parameter	Description
ROWS	Number of rows in the window buffer.
COLS	Number of columns in the window buffer.
T	Data type of pixel in the window buffer.

Sample code for window buffer declaration

Window<K_ROWS, K_COLS, unsigned char> kernel;
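
The shift and insert members are typically used together to slide the window across an image. The sketch below models only three members with a simplified stand-in class, WindowModel, so that it compiles without the library. It assumes that shift_pixels_left moves data toward column 0 and that the new column is then written at COLS-1, the right column named in the insert_right_col description. The helper last_window_sum is hypothetical.

```cpp
#include <cassert>

// Simplified stand-in for the window buffer; only three members are modeled.
template <int ROWS, int COLS, typename T>
class WindowModel {
  public:
    T val[ROWS][COLS] = {};

    // Move every column one step toward column 0; column COLS-1 becomes free.
    void shift_pixels_left() {
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < COLS - 1; ++c) val[r][c] = val[r][c + 1];
    }

    // Write a new column into the right edge (column COLS-1).
    void insert_right_col(const T value[ROWS]) {
        for (int r = 0; r < ROWS; ++r) val[r][COLS - 1] = value[r];
    }

    T& getval(int row, int col) {
        assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
        return val[row][col];
    }
};

// Slide a 1x3 window across one image row and return the sum of the final window.
inline int last_window_sum(const unsigned char* row, int width) {
    WindowModel<1, 3, unsigned char> win;
    for (int x = 0; x < width; ++x) {
        unsigned char col[1] = {row[x]};
        win.shift_pixels_left();
        win.insert_right_col(col);
    }
    return win.getval(0, 0) + win.getval(0, 1) + win.getval(0, 2);
}
```

After each shift and insert pair, the window holds the most recent COLS pixels of the row, which is the neighborhood a filter kernel would read at that position.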

xf::cv::LineBuffer¶

A template class that represents a 2D line buffer. Its template parameters specify the number of rows and columns in the line buffer, the pixel data type, and, optionally, the storage type and the reshape factor.

Class definition¶

template<int ROWS, int COLS, typename T, XF_ramtype_e MEM_TYPE=RAM_S2P_BRAM, int RESHAPE_FACTOR=1>
class LineBuffer {
public:
    LineBuffer();
    /* LineBuffer main APIs */
    void shift_pixels_up(int col);
    void shift_pixels_down(int col);
    void insert_bottom_row(T value, int col);
    void insert_top_row(T value, int col);
    void get_col(T value[ROWS], int col);
    T& getval(int row, int col);
    T& operator ()(int row, int col);

    /* Back compatible APIs */
    void shift_up(int col);
    void shift_down(int col);
    void insert_bottom(T value, int col);
    void insert_top(T value, int col);
    T val[ROWS][COLS];
#ifdef __DEBUG__
    void restore_val();
    void linebuffer_print(int col);
    T val_t[ROWS][COLS];
#endif
};

Parameter Descriptions¶

The following table lists the xf::cv::LineBuffer class members and their descriptions.

Table : Line Buffer Function Parameter Descriptions¶
Parameter	Description
val	2-D array to hold the contents of the line buffer.

Member Functions Description¶

Table : Member Functions Description¶
Function	Description
shift_pixels_up(int col)	Shifts the line buffer contents up; new values are placed in the bottom row (row = ROWS-1).
shift_pixels_down(int col)	Shifts the line buffer contents down; new values are placed in the top row (row = 0).
insert_bottom_row(T value, int col)	Inserts a new value in the bottom row (row = ROWS-1) of the line buffer.
insert_top_row(T value, int col)	Inserts a new value in the top row (row = 0) of the line buffer.
get_col(T value[ROWS], int col)	Gets the values of one column of the line buffer.
T& getval(int row, int col)	Returns the data value in the line buffer at position (row, column).
T& operator ()(int row, int col)	Returns the data value in the line buffer at position (row, column).

Template Parameter Description¶

Table : Template Parameter Description¶
Parameter	Description
ROWS	Number of rows in line buffer.
COLS	Number of columns in line buffer.
T	Data type of pixel in line buffer.
MEM_TYPE	Type of storage element. It takes one of the following enumerated values: RAM_1P_BRAM, RAM_1P_URAM, RAM_2P_BRAM, RAM_2P_URAM, RAM_S2P_BRAM, RAM_S2P_URAM, RAM_T2P_BRAM, RAM_T2P_URAM.
RESHAPE_FACTOR	Specifies the factor by which the array is divided (reshaped).

Sample code for line buffer declaration:

LineBuffer<3, 1920, ap_uint<24>, RAM_S2P_URAM, 1> buff; // T must be a type; ap_uint<24> holds one 3-channel, 8-bit pixel
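
The per-column shift and insert members are usually called once for every incoming pixel, so that the buffer always holds the most recent ROWS image rows. The sketch below shows that pattern with a simplified stand-in class, LineBufferModel, which omits the MEM_TYPE and RESHAPE_FACTOR arguments. The helper last_three_rows is hypothetical, and the row conventions follow the member function table above.

```cpp
#include <cassert>

// Simplified stand-in for the line buffer; storage options are not modeled.
template <int ROWS, int COLS, typename T>
class LineBufferModel {
  public:
    T val[ROWS][COLS] = {};

    // Move the given column up by one row; row ROWS-1 becomes free.
    void shift_pixels_up(int col) {
        assert(col >= 0 && col < COLS);
        for (int r = 0; r < ROWS - 1; ++r) val[r][col] = val[r + 1][col];
    }

    // Write a new pixel into the bottom row (row ROWS-1) of the given column.
    void insert_bottom_row(T value, int col) {
        assert(col >= 0 && col < COLS);
        val[ROWS - 1][col] = value;
    }

    // Copy one column, top to bottom, into value[].
    void get_col(T value[ROWS], int col) {
        assert(col >= 0 && col < COLS);
        for (int r = 0; r < ROWS; ++r) value[r] = val[r][col];
    }
};

// Feed `rows` image rows of width W through a 3-row buffer, then read back column `col`.
template <int W>
void last_three_rows(const unsigned char* img, int rows, int col, unsigned char out[3]) {
    LineBufferModel<3, W, unsigned char> buf;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < W; ++c) {
            buf.shift_pixels_up(c);
            buf.insert_bottom_row(img[r * W + c], c);
        }
    }
    buf.get_col(out, col);
}
```

A column read with get_col is the usual source for the column inserted into a window buffer, which is how the two classes are commonly combined in a filter.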

Video Processing Functions¶

The following table summarizes the video processing functions ported from HLS Video Library into Vitis vision Library along with the API modifications.

Table : Video Processing Functions¶
Functions	HLS Video Library API	Vitis vision Library API
addS	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void AddS(Mat<ROWS, COLS, SRC_T>&src,Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)	template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1> void addS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
AddWeighted	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T, typename P_T> void AddWeighted(Mat<ROWS, COLS, SRC1_T>& src1,P_T alpha,Mat<ROWS, COLS, SRC2_T>& src2,P_T beta, P_T gamma,Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T,int DST_T, int ROWS, int COLS, int NPC = 1> void addWeighted(xf::Mat<SRC_T, ROWS, COLS, NPC> & src1,float alpha, xf::Mat<SRC_T, ROWS, COLS, NPC> & src2,float beta, float gama, xf::Mat<DST_T, ROWS, COLS, NPC> & dst)
Cmp	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void Cmp(Mat<ROWS, COLS, SRC1_T>& src1,Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst,int cmp_op)	template<int CMP_OP, int SRC_T, int ROWS, int COLS, int NPC =1> void compare(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
CmpS	template<int ROWS, int COLS, int SRC_T, typename P_T, int DST_T> void CmpS(Mat<ROWS, COLS, SRC_T>& src, P_T value, Mat<ROWS, COLS, DST_T>& dst, int cmp_op)	template<int CMP_OP, int SRC_T, int ROWS, int COLS, int NPC =1> void compare(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Max	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void Max(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)	template<int SRC_T, int ROWS, int COLS, int NPC =1> void Max(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
MaxS	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void MaxS(Mat<ROWS, COLS, SRC_T>& src, _T value, Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T, int ROWS, int COLS, int NPC =1> void max(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Min	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void Min(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T, int ROWS, int COLS, int NPC =1> void Min(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
MinS	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void MinS(Mat<ROWS, COLS, SRC_T>& src, _T value,Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T, int ROWS, int COLS, int NPC =1> void min(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
PaintMask	template<int SRC_T,int MASK_T,int ROWS,int COLS> void PaintMask( Mat<ROWS,COLS,SRC_T> &_src, Mat<ROWS,COLS,MASK_T>&_mask, Mat<ROWS,COLS,SRC_T>&_dst,Scalar<HLS_MAT_CN(SRC_T),HLS_TNAME(SRC_T)> _color)	template< int SRC_T,int MASK_T, int ROWS, int COLS,int NPC=1> void paintmask(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat, xf::Mat<MASK_T, ROWS, COLS, NPC> & in_mask, xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst_mat, unsigned char _color[XF_CHANNELS(SRC_T,NPC)])
Reduce	template<typename INTER_SUM_T, int ROWS, int COLS, int SRC_T, int DST_ROWS, int DST_COLS, int DST_T> void Reduce( Mat<ROWS, COLS, SRC_T> &src, Mat<DST_ROWS, DST_COLS, DST_T> &dst, int dim, int op=HLS_REDUCE_SUM)	template< int REDUCE_OP, int SRC_T,int DST_T, int ROWS, int COLS,int ONE_D_HEIGHT, int ONE_D_WIDTH, int NPC=1> void reduce(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat, xf::Mat<DST_T, ONE_D_HEIGHT, ONE_D_WIDTH, 1> & _dst_mat, unsigned char dim)
Zero	template<int ROWS, int COLS, int SRC_T, int DST_T> void Zero(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T, int ROWS, int COLS, int NPC =1> void zero(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Sum	template<typename DST_T, int ROWS, int COLS, int SRC_T> Scalar<HLS_MAT_CN(SRC_T), DST_T> Sum( Mat<ROWS, COLS, SRC_T>& src)	template< int SRC_T, int ROWS, int COLS, int NPC = 1> void sum(xf::Mat<SRC_T, ROWS, COLS, NPC> & src1, double sum[XF_CHANNELS(SRC_T,NPC)] )
SubS	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void SubS(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)	template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1> void SubS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
SubRS	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void SubRS(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)	template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC =1> void SubRS(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Set	template<int ROWS, int COLS, int SRC_T, typename _T, int DST_T> void Set(Mat<ROWS, COLS, SRC_T>& src, Scalar<HLS_MAT_CN(SRC_T), _T> scl, Mat<ROWS, COLS, DST_T>& dst)	template< int SRC_T, int ROWS, int COLS, int NPC =1> void set(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, unsigned char _scl[XF_CHANNELS(SRC_T,NPC)],xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
Absdiff	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void AbsDiff( Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)	template<int SRC_T, int ROWS, int COLS, int NPC =1> void absdiff(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1,xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
And	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void And( Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)	template<int SRC_T, int ROWS, int COLS, int NPC = 1> void bitwise_and(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1, xf::Mat<SRC_T, ROWS, COLS, NPC> & _src2, xf::Mat<SRC_T, ROWS, COLS, NPC> &_dst)
Dilate	template<int Shape_type,int ITERATIONS,int SRC_T, int DST_T, typename KN_T,int IMG_HEIGHT,int IMG_WIDTH,int K_HEIGHT,int K_WIDTH> void Dilate(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T>&_src,Mat<IMG_HEIGHT, IMG_WIDTH, DST_T>&_dst,Window<K_HEIGHT,K_WIDTH,KN_T>&_kernel)	template<int BORDER_TYPE, int TYPE, int ROWS, int COLS,int K_SHAPE,int K_ROWS,int K_COLS, int ITERATIONS, int NPC=1> void dilate (xf::Mat<TYPE, ROWS, COLS, NPC> & _src, xf::Mat<TYPE, ROWS, COLS, NPC> & _dst,unsigned char _kernel[K_ROWS*K_COLS])
Duplicate	template<int ROWS, int COLS, int SRC_T, int DST_T> void Duplicate(Mat<ROWS, COLS, SRC_T>& src,Mat<ROWS, COLS, DST_T>& dst1,Mat<ROWS, COLS, DST_T>& dst2)	template<int SRC_T, int ROWS, int COLS,int NPC> void duplicateMat(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst1,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst2)
EqualizeHist	template<int SRC_T, int DST_T,int ROW, int COL> void EqualizeHist(Mat<ROW, COL, SRC_T>&_src,Mat<ROW, COL, DST_T>&_dst)	template<int SRC_T, int ROWS, int COLS, int NPC = 1> void equalizeHist(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src,xf::Mat<SRC_T, ROWS, COLS, NPC> & _src1,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst)
erode	template<int Shape_type,int ITERATIONS,int SRC_T, int DST_T, typename KN_T,int IMG_HEIGHT,int IMG_WIDTH,int K_HEIGHT,int K_WIDTH> void Erode(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T>&_src,Mat<IMG_HEIGHT,IMG_WIDTH,DST_T>&_dst,Window<K_HEIGHT,K_WIDTH,KN_T>&_kernel)	template<int BORDER_TYPE, int TYPE, int ROWS, int COLS,int K_SHAPE,int K_ROWS,int K_COLS, int ITERATIONS, int NPC=1> void erode (xf::Mat<TYPE, ROWS, COLS, NPC> & _src, xf::Mat<TYPE, ROWS, COLS, NPC> & _dst,unsigned char _kernel[K_ROWS*K_COLS])
FASTX	template<int SRC_T,int ROWS,int COLS> void FASTX(Mat<ROWS,COLS,SRC_T> &_src, Mat<ROWS,COLS,HLS_8UC1>&_mask,HLS_TNAME(SRC_T)_threshold,bool _nomax_supression)	template<int NMS,int SRC_T,int ROWS, int COLS,int NPC=1> void fast(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst_mat,unsigned char _threshold)
Filter2D	template<int SRC_T, int DST_T, typename KN_T, typename POINT_T, int IMG_HEIGHT,int IMG_WIDTH,int K_HEIGHT,int K_WIDTH> void Filter2D(Mat<IMG_HEIGHT, IMG_WIDTH, SRC_T> &_src,Mat<IMG_HEIGHT, IMG_WIDTH, DST_T> &_dst,Window<K_HEIGHT,K_WIDTH,KN_T>&_kernel,Point_<POINT_T>anchor)	template<int BORDER_TYPE,int FILTER_WIDTH,int FILTER_HEIGHT, int SRC_T,int DST_T, int ROWS, int COLS,int NPC> void filter2D(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_mat,short int filter[FILTER_HEIGHT*FILTER_WIDTH],unsigned char _shift)
GaussianBlur	template<int KH,int KW,typename BORDERMODE,int SRC_T,int DST_T,int ROWS,int COLS> void GaussianBlur(Mat<ROWS, COLS, SRC_T> &_src, Mat<ROWS, COLS, DST_T> &_dst,double sigmaX=0,double sigmaY=0)	template<int FILTER_SIZE, int BORDER_TYPE, int SRC_T, int ROWS, int COLS,int NPC = 1> void GaussianBlur(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst, float sigma)
Harris	template<int blockSize,int Ksize,typename KT,int SRC_T,int DST_T,int ROWS,int COLS> void Harris(Mat<ROWS, COLS, SRC_T> &_src,Mat<ROWS, COLS, DST_T>&_dst,KT k,int threshold)	template<int FILTERSIZE,int BLOCKWIDTH, int NMSRADIUS,int SRC_T,int ROWS, int COLS,int NPC=1,bool USE_URAM=false> void cornerHarris(xf::Mat<SRC_T, ROWS, COLS, NPC> & src,xf::Mat<SRC_T, ROWS, COLS, NPC> & dst,uint16_t threshold, uint16_t k)
CornerHarris	template<int blockSize,int Ksize,typename KT,int SRC_T,int DST_T,int ROWS,int COLS> void CornerHarris( Mat<ROWS, COLS, SRC_T>&_src,Mat<ROWS, COLS, DST_T>&_dst,KT k)	template<int FILTERSIZE,int BLOCKWIDTH, int NMSRADIUS,int SRC_T,int ROWS, int COLS,int NPC=1,bool USE_URAM=false> void cornerHarris(xf::Mat<SRC_T, ROWS, COLS, NPC> & src,xf::Mat<SRC_T, ROWS, COLS, NPC> & dst,uint16_t threshold, uint16_t k)
HoughLines2	template<unsigned int theta,unsigned int rho,typename AT,typename RT,int SRC_T,int ROW,int COL,unsigned int linesMax> void HoughLines2(Mat<ROW,COL,SRC_T> &_src, Polar_<AT,RT> (&_lines)[linesMax],unsigned int threshold)	template<unsigned int RHO,unsigned int THETA,int MAXLINES,int DIAG,int MINTHETA,int MAXTHETA,int SRC_T, int ROWS, int COLS,int NPC> void HoughLines(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,float outputrho[MAXLINES],float outputtheta[MAXLINES],short threshold,short linesmax)
Integral	template<int SRC_T, int DST_T, int ROWS,int COLS> void Integral(Mat<ROWS, COLS, SRC_T>&_src, Mat<ROWS+1, COLS+1, DST_T>&_sum )	template<int SRC_TYPE,int DST_TYPE, int ROWS, int COLS, int NPC> void integral(xf::Mat<SRC_TYPE, ROWS, COLS, NPC> & _src_mat, xf::Mat<DST_TYPE, ROWS, COLS, NPC> & _dst_mat)
Merge	template<int ROWS, int COLS, int SRC_T, int DST_T> void Merge( Mat<ROWS, COLS, SRC_T>& src0, Mat<ROWS, COLS, SRC_T>& src1, Mat<ROWS, COLS, SRC_T>& src2, Mat<ROWS, COLS, SRC_T>& src3, Mat<ROWS, COLS, DST_T>& dst)	template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1> void merge(xf::Mat<SRC_T, ROWS, COLS, NPC> &_src1, xf::Mat<SRC_T, ROWS, COLS, NPC> &_src2, xf::Mat<SRC_T, ROWS, COLS, NPC> &_src3, xf::Mat<SRC_T, ROWS, COLS, NPC> &_src4, xf::Mat<DST_T, ROWS, COLS, NPC> &_dst)
MinMaxLoc	template<int ROWS, int COLS, int SRC_T, typename P_T> void MinMaxLoc(Mat<ROWS, COLS, SRC_T>& src, P_T* min_val,P_T* max_val,Point& min_loc, Point& max_loc)	template<int SRC_T,int ROWS,int COLS,int NPC=0> void minMaxLoc(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src,int32_t min_value, int32_t max_value,uint16_t _minlocx, uint16_t _minlocy, uint16_t _maxlocx, uint16_t _maxlocy )
Mul	template<int ROWS, int COLS, int SRC1_T, int SRC2_T, int DST_T> void Mul(Mat<ROWS, COLS, SRC1_T>& src1, Mat<ROWS, COLS, SRC2_T>& src2, Mat<ROWS, COLS, DST_T>& dst)	template<int POLICY_TYPE, int SRC_T, int ROWS, int COLS, int NPC = 1> void multiply(xf::Mat<SRC_T, ROWS, COLS, NPC> & src1, xf::Mat<SRC_T, ROWS, COLS, NPC> & src2, xf::Mat<SRC_T, ROWS, COLS, NPC> & dst,float scale)
Not	template<int ROWS, int COLS, int SRC_T, int DST_T> void Not(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst)	template<int SRC_T, int ROWS, int COLS, int NPC = 1> void bitwise_not(xf::Mat<SRC_T, ROWS, COLS, NPC> & src, xf::Mat<SRC_T, ROWS, COLS, NPC> & dst)
Range	template<int ROWS, int COLS, int SRC_T, int DST_T, typename P_T> void Range(Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst, P_T start,P_T end)	template<int SRC_T, int ROWS, int COLS,int NPC=1> void inRange(xf::Mat<SRC_T, ROWS, COLS, NPC> & src,unsigned char lower_thresh,unsigned char upper_thresh,xf::Mat<SRC_T, ROWS, COLS, NPC> & dst)
Resize	template<int SRC_T, int ROWS,int COLS,int DROWS,int DCOLS> void Resize ( Mat<ROWS, COLS, SRC_T> &_src, Mat<DROWS, DCOLS, SRC_T> &_dst, int interpolation=HLS_INTER_LINEAR )	template<int INTERPOLATION_TYPE, int TYPE, int SRC_ROWS, int SRC_COLS, int DST_ROWS, int DST_COLS, int NPC, int MAX_DOWN_SCALE> void resize (xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC> & _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC> & _dst)
sobel	template<int XORDER, int YORDER, int SIZE, int SRC_T, int DST_T, int ROWS,int COLS,int DROWS,int DCOLS> void Sobel (Mat<ROWS, COLS, SRC_T> &_src,Mat<DROWS, DCOLS, DST_T> &_dst)	template<int BORDER_TYPE,int FILTER_TYPE, int SRC_T,int DST_T, int ROWS, int COLS,int NPC=1,bool USE_URAM = false> void Sobel(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_matx,xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_maty)
split	template<int ROWS, int COLS, int SRC_T, int DST_T> void Split( Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst0, Mat<ROWS, COLS, DST_T>& dst1, Mat<ROWS, COLS, DST_T>& dst2, Mat<ROWS, COLS, DST_T>& dst3)	template<int SRC_T, int DST_T, int ROWS, int COLS, int NPC=1> void extractChannel(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat, xf::Mat<DST_T, ROWS, COLS, NPC> & _dst_mat, uint16_t _channel)
Threshold	template<int ROWS, int COLS, int SRC_T, int DST_T> void Threshold( Mat<ROWS, COLS, SRC_T>& src, Mat<ROWS, COLS, DST_T>& dst, HLS_TNAME(SRC_T) thresh, HLS_TNAME(DST_T) maxval, int thresh_type)	template<int THRESHOLD_TYPE, int SRC_T, int ROWS, int COLS,int NPC=1> void Threshold(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src_mat,xf::Mat<SRC_T, ROWS, COLS, NPC> & _dst_mat,short int thresh,short int maxval )
Scale	template<int ROWS, int COLS, int SRC_T, int DST_T, typename P_T> void Scale(Mat<ROWS, COLS, SRC_T>& src,Mat<ROWS, COLS, DST_T>& dst, P_T scale=1.0,P_T shift=0.0)	template< int SRC_T,int DST_T, int ROWS, int COLS, int NPC = 1> void scale(xf::Mat<SRC_T, ROWS, COLS, NPC> & src1, xf::Mat<DST_T, ROWS, COLS, NPC> & dst,float scale, float shift)
InitUndistortRectifyMapInverse	template<typename CMT, typename DT, typename ICMT, int ROWS, int COLS, int MAP1_T, int MAP2_T, int N> void InitUndistortRectifyMapInverse ( Window<3,3, CMT> cameraMatrix,DT(&distCoeffs)[N],Window<3,3, ICMT> ir, Mat<ROWS, COLS, MAP1_T> &map1,Mat<ROWS, COLS, MAP2_T> &map2,int noRotation=false)	template< int CM_SIZE, int DC_SIZE, int MAP_T, int ROWS, int COLS, int NPC > void InitUndistortRectifyMapInverse ( ap_fixed<32,12> cameraMatrix, ap_fixed<32,12> distCoeffs, ap_fixed<32,12> *ir, xf::Mat<MAP_T, ROWS, COLS, NPC> &_mapx_mat,xf::Mat<MAP_T, ROWS, COLS, NPC> &_mapy_mat,int _cm_size, int _dc_size)
Avg, mean, AvgStddev	template<typename DST_T, int ROWS, int COLS, int SRC_T> DST_T Mean(Mat<ROWS, COLS, SRC_T>& src)	template<int SRC_T,int ROWS, int COLS,int NPC=1>void meanStdDev(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src,unsigned short* _mean,unsigned short* _stddev)
CvtColor	template<typename CONVERSION,int SRC_T, int DST_T,int ROWS,int COLS> void CvtColor(Mat<ROWS, COLS, SRC_T> &_src, Mat<ROWS, COLS, DST_T> &_dst)	Color Conversion

Note: All the functions except Reduce can process N pixels per clock, where N is a power of 2.

Design Examples Using Vitis Vision Library¶

Every hardware function in the library has its own example, available on GitHub. This section describes image processing functions and pipelines implemented using a combination of various functions in Vitis vision. They illustrate how to best implement various functionalities using the capabilities of both the processor and the programmable logic. These examples also illustrate different ways to implement complex dataflow paths. The following examples are described in this section:

Iterative Pyramidal Dense Optical Flow
Corner Tracking Using Optical Flow
Color Detection
Difference of Gaussian Filter
Stereo Vision Pipeline
Blob From Image
Letterbox
Image Sensor Processing pipeline
Image Sensor Processing pipeline - 2020.2 version
Image Sensor Processing pipeline - 2021.1 version
Image Sensor Processing pipeline with HDR

Iterative Pyramidal Dense Optical Flow¶

The Dense Pyramidal Optical Flow example uses the xf::cv::pyrDown and xf::cv::densePyrOpticalFlow hardware functions from the Vitis vision library, to create an image pyramid, iterate over it and compute the Optical Flow between two input images. The example uses xf::cv::pyrDown function to compute the image pyramids of the two input images. The two image pyramids are processed by xf::cv::densePyrOpticalFlow function, starting from the smallest image size going up to the largest image size. The output flow vectors of each iteration are fed back to the hardware kernel as input to the hardware function. The output of the last iteration on the largest image size is treated as the output of the dense pyramidal optical flow example.

The Iterative Pyramidal Dense Optical Flow is computed in a nested for loop that runs for (iterations × pyramid levels) passes in total. The main loop starts from the smallest image size and iterates up to the largest image size. Before the loop iterates in one pyramid level, it sets the current pyramid level's height and width in the curr_height and current_width variables. In the nested loop, the next_height variable is set to the previous image height if scaling up is necessary, that is, in the first iteration of a pyramid level. Because divisions are costly in hardware and this one-time division can be done elsewhere, the scale factor is computed on the host and passed as an argument to the hardware kernel. After each pyramid level, in the first iteration, the scale-up flag is set to let the hardware function know that the input flow vectors need to be scaled up to the next higher image size. Scaling up is done using bilinear interpolation in the hardware kernel.

After all the input data is prepared, and the flags are set, the host processor calls the hardware function. Please note that the host function swaps the flow vector inputs and outputs to the hardware function to iteratively solve the optimization problem.
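The loop structure described above can be sketched on the host side as follows. This is only an illustration of the schedule, not the example's actual host code: the `FlowCall` record, the `scheduleFlowCalls` function, the buffer indices, and the direction of the scale ratio are all assumptions made for this sketch, and no hardware kernel is called.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record of one kernel invocation (illustration only).
struct FlowCall {
    int level;        // pyramid level processed (0 = largest image)
    int curr_height;  // height of the current pyramid level
    int next_height;  // height of the level that produced the incoming flow
    bool scale_up;    // true when the incoming flow vectors must be upscaled
    float scale;      // host-computed ratio (assumed: next_height / curr_height)
    int in_buf;       // index of the flow buffer the kernel reads
    int out_buf;      // index of the flow buffer the kernel writes
};

// heights[0] is the largest image; heights.back() is the smallest.
std::vector<FlowCall> scheduleFlowCalls(const std::vector<int>& heights, int iterations) {
    assert(!heights.empty() && iterations > 0);
    std::vector<FlowCall> calls;
    const int levels = static_cast<int>(heights.size());
    int in_buf = 0;
    int out_buf = 1;
    // Main loop: from the smallest image size up to the largest.
    for (int level = levels - 1; level >= 0; --level) {
        const int curr_height = heights[level];
        for (int it = 0; it < iterations; ++it) {
            // Scale up only in the first iteration of a level, and only when
            // a smaller level has already produced flow vectors.
            const bool scale_up = (it == 0) && (level != levels - 1);
            const int next_height = scale_up ? heights[level + 1] : curr_height;
            // The division happens once here on the host, not in hardware.
            const float scale =
                static_cast<float>(next_height) / static_cast<float>(curr_height);
            calls.push_back(
                FlowCall{level, curr_height, next_height, scale_up, scale, in_buf, out_buf});
            // The output of this call becomes the input of the next call.
            std::swap(in_buf, out_buf);
        }
    }
    return calls;
}
```

With three pyramid levels and two iterations per level, the sketch produces six calls, and only the first call of each larger level carries the scale-up flag.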

Corner Tracking Using Optical Flow¶

This example illustrates how to detect and track the characteristic feature points in a set of successive frames of video. A Harris corner detector is used as the feature detector, and a modified version of Lucas Kanade optical flow is used for tracking. The core part of the algorithm takes the current and next frames as inputs and outputs the list of tracked corners. The current image is the first frame in the set; corner detection is then performed on it to find the features to track. The number of frames in which the points need to be tracked is also provided as an input.

The corner tracking example uses five hardware functions from the Vitis vision library: xf::cv::cornerHarris, xf::cv::cornersImgToList, xf::cv::cornerUpdate, xf::cv::pyrDown, and xf::cv::densePyrOpticalFlow.

The function xf::cv::cornerUpdate has been added to ensure that the dense flow vectors output by the xf::cv::densePyrOpticalFlow function are sparsely picked and stored in a new memory location as a sparse array. This ensures that the next function in the pipeline does not have to search through memory with random accesses. The function takes corners from the Harris corner detector and dense optical flow vectors from the dense pyramidal optical flow function, and outputs the updated corner locations, tracking the input corners using the dense flow vectors and thereby imitating sparse optical flow behavior. This hardware function runs at 300 MHz for 10,000 corners on a 720p image, adding very little latency to the pipeline.

cornerUpdate()¶

API Syntax

template <unsigned int MAXCORNERSNO, unsigned int TYPE, unsigned int ROWS, unsigned int COLS, unsigned int NPC>
void cornerUpdate(ap_uint<64> *list_fix, unsigned int *list, uint32_t nCorners, xf::cv::Mat<TYPE,ROWS,COLS,NPC> &flow_vectors, ap_uint<1> harris_flag)

Parameter Descriptions

The following table describes the template and the function parameters.

Table: CornerUpdate Function Parameter Descriptions¶
Parameter	Description
MAXCORNERSNO	Maximum number of corners that the function needs to work on
TYPE	Input Pixel Type. Only 8-bit, unsigned, 1 channel is supported (XF_8UC1)
ROWS	Maximum height of input and output image (Must be multiple of 8)
COLS	Maximum width of input and output image (Must be multiple of 8)
NPC	Number of pixels to be processed per cycle. This function supports only XF_NPPC1 or 1-pixel per cycle operations.
list_fix	A list of packed fixed point coordinates of the corner locations in 16, 5 (16 integer bits and 5 fractional bits) format. Bits from 20 to 0 represent the column number, while the bits 41 to 21 represent the row number. The remaining bits are used for a flag, which is set when the tracked corner is valid.
list	A list of packed positive short integer coordinates of the corner locations in unsigned short format. Bits from 15 to 0 represent the column number, while the bits 31 to 16 represent the row number. This list is same as the list output by Harris Corner Detector.
nCorners	Number of corners to track
flow_vectors	Packed flow vectors as in xf::cv::densePyrOpticalFlow function
harris_flag	If set to 1, the function takes input corners from list. If set to 0, the function takes input corners from list_fix.

The example code works on an input video, which is read and processed using the Vitis vision library.
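The two packed corner formats in the table above can be illustrated with a small software sketch. The helper names are hypothetical, the coordinates are assumed to be non-negative, and the position of the valid flag (bit 42 here) is an assumption: the table only states that the bits above bit 41 are used for the flag.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFracBits = 5;    // 16.5 format: 5 fractional bits
constexpr int kFieldBits = 21;  // 16 integer + 5 fractional bits per coordinate
constexpr std::uint64_t kFieldMask = (1ULL << kFieldBits) - 1;
constexpr int kValidBit = 42;   // ASSUMPTION: exact flag position is not documented

// Pack a (row, col) pair into the list_fix layout:
// bits 20..0 hold the column, bits 41..21 hold the row.
std::uint64_t packFixed(double row, double col, bool valid) {
    assert(row >= 0.0 && col >= 0.0);
    const std::uint64_t r =
        static_cast<std::uint64_t>(row * (1 << kFracBits)) & kFieldMask;
    const std::uint64_t c =
        static_cast<std::uint64_t>(col * (1 << kFracBits)) & kFieldMask;
    return (static_cast<std::uint64_t>(valid) << kValidBit) | (r << kFieldBits) | c;
}

double fixedRow(std::uint64_t w) {
    return static_cast<double>((w >> kFieldBits) & kFieldMask) / (1 << kFracBits);
}

double fixedCol(std::uint64_t w) {
    return static_cast<double>(w & kFieldMask) / (1 << kFracBits);
}

bool isValid(std::uint64_t w) { return ((w >> kValidBit) & 1ULL) != 0; }

// Convert one entry of the integer list (bits 15..0 column, bits 31..16 row)
// into the fixed point layout, as would be needed when harris_flag is 1.
std::uint64_t listToFixed(std::uint32_t packed) {
    const std::uint32_t row = packed >> 16;
    const std::uint32_t col = packed & 0xFFFFu;
    return packFixed(static_cast<double>(row), static_cast<double>(col), true);
}
```

The five fractional bits give a resolution of 1/32 of a pixel, which is what lets the tracked corner positions drift by sub-pixel amounts between frames.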

cornersImgToList()¶

API Syntax

template <unsigned int MAXCORNERSNO, unsigned int TYPE, unsigned int ROWS, unsigned int COLS, unsigned int NPC>
void cornersImgToList(xf::cv::Mat<TYPE,ROWS,COLS,NPC> &_src, unsigned int list[MAXCORNERSNO], unsigned int *ncorners)

Parameter Descriptions

The following table describes the function parameters.

Table: CornerImgToList Function Parameter Descriptions¶
Parameter	Description
_src	The output image of the Harris corner detector. The size of this xf::cv::Mat object is the size of the input image to the Harris corner detector. The value of each pixel is 255 if a corner is present in the location, 0 otherwise.
list	A 32-bit memory allocation of size MAXCORNERSNO, used to store the corners detected by the Harris detector
ncorners	Total number of corners detected by Harris, that is, the number of corners in the list
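The behavior described in the table can be approximated in software as follows. This is a reference sketch, not the hardware implementation: the function name `cornersImgToListRef` is hypothetical, and the packing (row in bits 31 to 16, column in bits 15 to 0) is assumed to match the `list` format described for cornerUpdate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scan a corner image (255 = corner, 0 = no corner) in raster order and
// collect packed corner coordinates. Returns the number of corners stored.
unsigned int cornersImgToListRef(const std::vector<std::uint8_t>& img,
                                 int rows,
                                 int cols,
                                 std::vector<std::uint32_t>& list,
                                 unsigned int max_corners) {
    assert(img.size() == static_cast<std::size_t>(rows) * cols);
    list.clear();
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (img[r * cols + c] == 255 && list.size() < max_corners) {
                // Row in the upper 16 bits, column in the lower 16 bits.
                list.push_back((static_cast<std::uint32_t>(r) << 16) |
                               static_cast<std::uint32_t>(c));
            }
        }
    }
    return static_cast<unsigned int>(list.size());
}
```

Corners beyond `max_corners` are silently dropped in this sketch; how the hardware function handles overflow is not specified here.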

Image Processing¶

The following steps describe the image processing procedure in the hardware pipeline:

xf::cv::cornerHarris is called to start processing the first input image.
The output of xf::cv::cornerHarris is fed to xf::cv::cornersImgToList. This function takes in an image with corners (marked as 255, and 0 elsewhere) and converts them to a list of corners.
xf::cv::pyrDown creates the two image pyramids and Dense Optical Flow is computed using the two image pyramids as described in the Iterative Pyramidal Dense Optical Flow example.
xf::cv::densePyrOpticalFlow is called with the two image pyramids as inputs.
xf::cv::cornerUpdate is called to track the corner locations in the second image. If harris_flag is enabled, cornerUpdate tracks the corners in the list output by xf::cv::cornersImgToList; otherwise it tracks the previously tracked corners.

The HarrisImg() function takes a flag called harris_flag, which is set during the first frame or when the corners need to be redetected. The xf::cv::cornerUpdate function outputs the updated corners to the same memory location as the output corners list of xf::cv::cornersImgToList. This means that when harris_flag is unset, the corners input to xf::cv::cornerUpdate are the corners tracked in the previous cycle, that is, the corners in the first frame of the current input frames.

After the Dense Optical Flow is computed, if harris_flag is set, the number of corners that xf::cv::cornerHarris has detected and xf::cv::cornersImgToList has updated is copied to the num_corners variable. Two corner lists are kept: list, which holds the detected corners, and listfixed, which holds the tracked corners. If harris_flag is set, xf::cv::cornerUpdate tracks the corners in the 'list' memory location; otherwise it tracks the corners in the 'listfixed' memory location.

Color Detection¶

The Color Detection algorithm is used for object tracking and object detection based on the color of the object. Color-based methods are very useful for object detection and segmentation when the object and the background differ significantly in color.

The Color Detection example uses four hardware functions from the Vitis vision library. They are:

xf::cv::BGR2HSV
xf::cv::colorthresholding
xf::cv::erode
xf::cv::dilate

In the Color Detection example, the original BGR image is converted into the HSV color space, because HSV is well suited to color-based image segmentation. A thresholding operation is then applied to the HSV image based on the H (hue), S (saturation), and V (value) values, returning either 255 or 0 for each pixel. After thresholding, the erode and dilate functions are applied to reduce unnecessary white patches (noise) in the image. The example uses two hardware instances each of erode and dilate: erode followed by dilate (morphological opening), and then dilate followed by erode (morphological closing).

The following example demonstrates the Color Detection algorithm.

    void color_detect(ap_uint<PTR_IN_WIDTH>* img_in,
              unsigned char* low_thresh,
              unsigned char* high_thresh,
              unsigned char* process_shape,
              ap_uint<PTR_OUT_WIDTH>* img_out,
              int rows,
              int cols) {

#pragma HLS INTERFACE m_axi      port=img_in        offset=slave  bundle=gmem0

#pragma HLS INTERFACE m_axi      port=low_thresh    offset=slave  bundle=gmem1
#pragma HLS INTERFACE s_axilite  port=low_thresh
#pragma HLS INTERFACE m_axi      port=high_thresh   offset=slave  bundle=gmem2
#pragma HLS INTERFACE s_axilite  port=high_thresh
#pragma HLS INTERFACE s_axilite  port=rows
#pragma HLS INTERFACE s_axilite  port=cols
#pragma HLS INTERFACE m_axi      port=process_shape offset=slave  bundle=gmem3
#pragma HLS INTERFACE s_axilite  port=process_shape
#pragma HLS INTERFACE m_axi      port=img_out       offset=slave  bundle=gmem4

#pragma HLS INTERFACE s_axilite  port=return

xf::cv::Mat<IN_TYPE, HEIGHT, WIDTH, NPC1> imgInput(rows, cols);
xf::cv::Mat<IN_TYPE, HEIGHT, WIDTH, NPC1> rgb2hsv(rows, cols);
xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPC1> imgHelper1(rows, cols);
xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPC1> imgHelper2(rows, cols);
xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPC1> imgHelper3(rows, cols);
xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPC1> imgHelper4(rows, cols);
xf::cv::Mat<OUT_TYPE, HEIGHT, WIDTH, NPC1> imgOutput(rows, cols);

// Copy the shape data:
unsigned char _kernel[FILTER_SIZE * FILTER_SIZE];
for (unsigned int i = 0; i < FILTER_SIZE * FILTER_SIZE; ++i) {

    #pragma HLS PIPELINE
    _kernel[i] = process_shape[i];
}

#pragma HLS DATAFLOW
// Retrieve xf::cv::Mat objects from img_in data:
xf::cv::Array2xfMat<PTR_IN_WIDTH, IN_TYPE, HEIGHT, WIDTH, NPC1>(img_in, imgInput);

// Convert BGR to HSV:
xf::cv::bgr2hsv<IN_TYPE, HEIGHT, WIDTH, NPC1>(imgInput, rgb2hsv);

// Do the color thresholding:
xf::cv::colorthresholding<IN_TYPE, OUT_TYPE, MAXCOLORS, HEIGHT, WIDTH, NPC1>(rgb2hsv, imgHelper1, low_thresh,
                                                                             high_thresh);

// Use erode and dilate to fully mark color areas:
xf::cv::erode<XF_BORDER_CONSTANT, OUT_TYPE, HEIGHT, WIDTH, XF_KERNEL_SHAPE, FILTER_SIZE, FILTER_SIZE, ITERATIONS,
              NPC1>(imgHelper1, imgHelper2, _kernel);
xf::cv::dilate<XF_BORDER_CONSTANT, OUT_TYPE, HEIGHT, WIDTH, XF_KERNEL_SHAPE, FILTER_SIZE, FILTER_SIZE, ITERATIONS,
               NPC1>(imgHelper2, imgHelper3, _kernel);
xf::cv::dilate<XF_BORDER_CONSTANT, OUT_TYPE, HEIGHT, WIDTH, XF_KERNEL_SHAPE, FILTER_SIZE, FILTER_SIZE, ITERATIONS,
               NPC1>(imgHelper3, imgHelper4, _kernel);
xf::cv::erode<XF_BORDER_CONSTANT, OUT_TYPE, HEIGHT, WIDTH, XF_KERNEL_SHAPE, FILTER_SIZE, FILTER_SIZE, ITERATIONS,
              NPC1>(imgHelper4, imgOutput, _kernel);

// Convert _dst xf::cv::Mat object to output array:
xf::cv::xfMat2Array<PTR_OUT_WIDTH, OUT_TYPE, HEIGHT, WIDTH, NPC1>(imgOutput, img_out);

return;

    } // End of kernel

In the given example, the source image is passed to the xf::cv::BGR2HSV function, and the output of that function is passed to the xf::cv::colorthresholding module. The thresholded image is then passed through the xf::cv::erode and xf::cv::dilate functions, and the final output image is returned.
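To make the data path easier to follow, the thresholding and morphology stages can be sketched in plain C++. This is a software reference under simplifying assumptions, not the library implementation: it uses a single color range rather than MAXCOLORS ranges, a fixed 3x3 rectangular kernel, and it simply ignores neighbors outside the image instead of modeling XF_BORDER_CONSTANT. All function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// Returns 255 when every HSV component lies inside [low, high], else 0.
std::uint8_t thresholdHsv(const std::array<std::uint8_t, 3>& hsv,
                          const std::array<std::uint8_t, 3>& low,
                          const std::array<std::uint8_t, 3>& high) {
    for (int ch = 0; ch < 3; ++ch) {
        if (hsv[ch] < low[ch] || hsv[ch] > high[ch]) return 0;
    }
    return 255;
}

// 3x3 erosion (minimum) or dilation (maximum) with a full rectangular kernel.
// ASSUMPTION: out-of-image neighbors are skipped.
Image morph3x3(const Image& src, int rows, int cols, bool is_erode) {
    Image dst(src.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            std::uint8_t acc = is_erode ? 255 : 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    const std::uint8_t v = src[rr * cols + cc];
                    acc = is_erode ? std::min(acc, v) : std::max(acc, v);
                }
            }
            dst[r * cols + c] = acc;
        }
    }
    return dst;
}

// Same stage order as the kernel: erode, dilate, dilate, erode.
Image cleanMask(const Image& mask, int rows, int cols) {
    const Image opened = morph3x3(morph3x3(mask, rows, cols, true), rows, cols, false);
    return morph3x3(morph3x3(opened, rows, cols, false), rows, cols, true);
}
```

In this sketch an isolated white pixel is removed by the first erosion, while a solid 3x3 patch away from the image border survives all four stages unchanged.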

Difference of Gaussian Filter¶

The Difference of Gaussian Filter example uses three hardware functions from the Vitis vision library. They are:

xf::cv::GaussianBlur
xf::cv::duplicateMat
xf::cv::subtract

The Difference of Gaussian Filter can be implemented by applying a Gaussian filter to the original source image and duplicating the blurred image into two images. The Gaussian blur function is applied again to one of the duplicates, whereas the other is kept as it is. Finally, the subtraction function is applied to the twice-blurred image and the untouched duplicate.

The following example demonstrates the Difference of Gaussian Filter example.

    void gaussiandiference(ap_uint<PTR_WIDTH>* img_in, float sigma, ap_uint<PTR_WIDTH>* img_out, int rows, int cols) {

#pragma HLS INTERFACE m_axi      port=img_in        offset=slave  bundle=gmem0
#pragma HLS INTERFACE m_axi      port=img_out       offset=slave  bundle=gmem1
#pragma HLS INTERFACE s_axilite  port=sigma
#pragma HLS INTERFACE s_axilite  port=rows
#pragma HLS INTERFACE s_axilite  port=cols
#pragma HLS INTERFACE s_axilite  port=return


xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgInput(rows, cols);
xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgin1(rows, cols);
xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgin2(rows, cols);
xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1, 15360> imgin3(rows, cols);
xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgin4(rows, cols);
xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC1> imgOutput(rows, cols);

#pragma HLS DATAFLOW

// Retrieve xf::cv::Mat objects from img_in data:
xf::cv::Array2xfMat<PTR_WIDTH, TYPE, HEIGHT, WIDTH, NPC1>(img_in, imgInput);

// Run xfOpenCV kernel:
xf::cv::GaussianBlur<FILTER_WIDTH, XF_BORDER_CONSTANT, TYPE, HEIGHT, WIDTH, NPC1>(imgInput, imgin1, sigma);
xf::cv::duplicateMat<TYPE, HEIGHT, WIDTH, NPC1, 15360>(imgin1, imgin2, imgin3);
xf::cv::GaussianBlur<FILTER_WIDTH, XF_BORDER_CONSTANT, TYPE, HEIGHT, WIDTH, NPC1>(imgin2, imgin4, sigma);
xf::cv::subtract<XF_CONVERT_POLICY_SATURATE, TYPE, HEIGHT, WIDTH, NPC1, 15360>(imgin3, imgin4, imgOutput);

// Convert output xf::cv::Mat object to output array:
xf::cv::xfMat2Array<PTR_WIDTH, TYPE, HEIGHT, WIDTH, NPC1>(imgOutput, img_out);

return;
    } // End of kernel

In the given example, the Gaussian blur function is applied to the source image imgInput, and the resultant image imgin1 is passed to xf::cv::duplicateMat. The images imgin2 and imgin3 are duplicates of the blurred image. Gaussian blur is applied again to imgin2 and the result is stored in imgin4. Finally, the subtraction is performed on imgin3 and imgin4. Note that imgin3 must wait until at least the first pixel of imgin4 has been generated.
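The same sequence of operations can be sketched on a single image row in plain C++. This is only a simplified model: it assumes a fixed 3-tap kernel (1, 2, 1)/4 instead of a sigma-driven filter of size FILTER_WIDTH, zero-valued borders, and saturation at 0 for the subtraction. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One-dimensional blur with the kernel (1, 2, 1) / 4 and zero borders.
std::vector<std::uint8_t> blur3(const std::vector<std::uint8_t>& src) {
    const int n = static_cast<int>(src.size());
    std::vector<std::uint8_t> dst(src.size());
    for (int i = 0; i < n; ++i) {
        const int left = (i > 0) ? src[i - 1] : 0;
        const int right = (i + 1 < n) ? src[i + 1] : 0;
        // "+ 2" rounds to nearest before the integer division by 4.
        dst[i] = static_cast<std::uint8_t>((left + 2 * src[i] + right + 2) / 4);
    }
    return dst;
}

// Blur once, duplicate, blur one copy again, then subtract with saturation:
// out = saturate(once_blurred - twice_blurred).
std::vector<std::uint8_t> differenceOfGaussian(const std::vector<std::uint8_t>& src) {
    const std::vector<std::uint8_t> once = blur3(src);    // plays the role of imgin1/imgin3
    const std::vector<std::uint8_t> twice = blur3(once);  // plays the role of imgin4
    std::vector<std::uint8_t> out(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        const int diff = static_cast<int>(once[i]) - static_cast<int>(twice[i]);
        out[i] = static_cast<std::uint8_t>(std::max(diff, 0));
    }
    return out;
}
```

In a streaming implementation the once-blurred copy has to be buffered while the second blur fills its window, which is the waiting behavior described above.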

Stereo Vision Pipeline¶

Disparity map generation is one of the first steps in creating a three dimensional map of the environment. The Vitis vision library has components to build an image processing pipeline to compute a disparity map given the camera parameters and inputs from a stereo camera setup.

The two main components involved in the pipeline are stereo rectification and disparity estimation using the local block matching method. While disparity estimation using local block matching is a discrete component in Vitis vision, the rectification block can be constructed using xf::cv::InitUndistortRectifyMapInverse() and xf::cv::remap(). The dataflow pipeline is shown below. The camera parameters are an additional input to the pipeline.

The following code is for the pipeline.

    void stereopipeline_accel(ap_uint<INPUT_PTR_WIDTH>* img_L,
                      ap_uint<INPUT_PTR_WIDTH>* img_R,
                      ap_uint<OUTPUT_PTR_WIDTH>* img_disp,
                      float* cameraMA_l,
                      float* cameraMA_r,
                      float* distC_l,
                      float* distC_r,
                      float* irA_l,
                      float* irA_r,
                      int* bm_state_arr,
                      int rows,
                      int cols) {

#pragma HLS INTERFACE m_axi     port=img_L  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_R  offset=slave bundle=gmem5
#pragma HLS INTERFACE m_axi     port=img_disp  offset=slave bundle=gmem6
#pragma HLS INTERFACE m_axi     port=cameraMA_l  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=cameraMA_r  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=distC_l  offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi     port=distC_r  offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi     port=irA_l  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=irA_r  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=bm_state_arr  offset=slave bundle=gmem4
#pragma HLS INTERFACE s_axilite port=rows
#pragma HLS INTERFACE s_axilite port=cols
#pragma HLS INTERFACE s_axilite port=return

ap_fixed<32, 12> cameraMA_l_fix[XF_CAMERA_MATRIX_SIZE], cameraMA_r_fix[XF_CAMERA_MATRIX_SIZE],
    distC_l_fix[XF_DIST_COEFF_SIZE], distC_r_fix[XF_DIST_COEFF_SIZE], irA_l_fix[XF_CAMERA_MATRIX_SIZE],
    irA_r_fix[XF_CAMERA_MATRIX_SIZE];

for (int i = 0; i < XF_CAMERA_MATRIX_SIZE; i++) {

    #pragma HLS PIPELINE II=1
    cameraMA_l_fix[i] = (ap_fixed<32, 12>)cameraMA_l[i];
    cameraMA_r_fix[i] = (ap_fixed<32, 12>)cameraMA_r[i];
    irA_l_fix[i] = (ap_fixed<32, 12>)irA_l[i];
    irA_r_fix[i] = (ap_fixed<32, 12>)irA_r[i];
}
for (int i = 0; i < XF_DIST_COEFF_SIZE; i++) {

    #pragma HLS PIPELINE II=1
    distC_l_fix[i] = (ap_fixed<32, 12>)distC_l[i];
    distC_r_fix[i] = (ap_fixed<32, 12>)distC_r[i];
}

xf::cv::xFSBMState<SAD_WINDOW_SIZE, NO_OF_DISPARITIES, PARALLEL_UNITS> bm_state;
bm_state.preFilterType = bm_state_arr[0];
bm_state.preFilterSize = bm_state_arr[1];
bm_state.preFilterCap = bm_state_arr[2];
bm_state.SADWindowSize = bm_state_arr[3];
bm_state.minDisparity = bm_state_arr[4];
bm_state.numberOfDisparities = bm_state_arr[5];
bm_state.textureThreshold = bm_state_arr[6];
bm_state.uniquenessRatio = bm_state_arr[7];
bm_state.ndisp_unit = bm_state_arr[8];
bm_state.sweepFactor = bm_state_arr[9];
bm_state.remainder = bm_state_arr[10];

int _cm_size = 9, _dc_size = 5;

xf::cv::Mat<XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mat_L(rows, cols);

xf::cv::Mat<XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mat_R(rows, cols);

xf::cv::Mat<XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mat_disp(rows, cols);

xf::cv::Mat<XF_32FC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mapxLMat(rows, cols);

xf::cv::Mat<XF_32FC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mapyLMat(rows, cols);

xf::cv::Mat<XF_32FC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mapxRMat(rows, cols);

xf::cv::Mat<XF_32FC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> mapyRMat(rows, cols);

xf::cv::Mat<XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> leftRemappedMat(rows, cols);

xf::cv::Mat<XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1> rightRemappedMat(rows, cols);


#pragma HLS DATAFLOW

xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1>(img_L, mat_L);
xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_8UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1>(img_R, mat_R);

xf::cv::InitUndistortRectifyMapInverse<XF_CAMERA_MATRIX_SIZE, XF_DIST_COEFF_SIZE, XF_32FC1, XF_HEIGHT, XF_WIDTH,
                                       XF_NPPC1>(cameraMA_l_fix, distC_l_fix, irA_l_fix, mapxLMat, mapyLMat,
                                                 _cm_size, _dc_size);
xf::cv::remap<XF_REMAP_BUFSIZE, XF_INTERPOLATION_BILINEAR, XF_8UC1, XF_32FC1, XF_8UC1, XF_HEIGHT, XF_WIDTH,
              XF_NPPC1, XF_USE_URAM>(mat_L, leftRemappedMat, mapxLMat, mapyLMat);

xf::cv::InitUndistortRectifyMapInverse<XF_CAMERA_MATRIX_SIZE, XF_DIST_COEFF_SIZE, XF_32FC1, XF_HEIGHT, XF_WIDTH,
                                       XF_NPPC1>(cameraMA_r_fix, distC_r_fix, irA_r_fix, mapxRMat, mapyRMat,
                                                 _cm_size, _dc_size);
xf::cv::remap<XF_REMAP_BUFSIZE, XF_INTERPOLATION_BILINEAR, XF_8UC1, XF_32FC1, XF_8UC1, XF_HEIGHT, XF_WIDTH,
              XF_NPPC1, XF_USE_URAM>(mat_R, rightRemappedMat, mapxRMat, mapyRMat);

xf::cv::StereoBM<SAD_WINDOW_SIZE, NO_OF_DISPARITIES, PARALLEL_UNITS, XF_8UC1, XF_16UC1, XF_HEIGHT, XF_WIDTH,
                 XF_NPPC1, XF_USE_URAM>(leftRemappedMat, rightRemappedMat, mat_disp, bm_state);

xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC1>(mat_disp, img_disp);
    }

Blob From Image¶

This example shows how various Vitis vision library functions can be used to accelerate the preprocessing of input images before feeding them to a Deep Neural Network (DNN) accelerator.

This specific application shows how pre-processing for Googlenet_v1 can be accelerated. The pre-processing involves resizing the input image to 224 x 224, followed by mean subtraction. The two main Vitis vision library functions used to build this pipeline are xf::cv::resize() and xf::cv::preProcess(), which operate in dataflow.
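As a software point of reference for the mean-subtraction step, the sketch below subtracts a per-channel mean from an interleaved 8-bit image. It is a plain C++ illustration and not the implementation of xf::cv::preProcess(); the helper name subtract_mean is hypothetical, and the mean values are supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Subtract a per-channel mean from an interleaved 8-bit image
// (pixel layout: c0 c1 c2 c0 c1 c2 ...). The result is kept as float.
// Hypothetical helper for illustration only, not part of the library.
std::vector<float> subtract_mean(const std::vector<std::uint8_t>& img,
                                 const std::vector<float>& mean) {
    const std::size_t channels = mean.size();
    assert(channels > 0 && img.size() % channels == 0);
    std::vector<float> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) {
        // i % channels selects the channel of the current sample.
        out[i] = static_cast<float>(img[i]) - mean[i % channels];
    }
    return out;
}
```

A host-side testbench could use such a reference to compare against the accelerator output, within whatever fixed-point tolerance the chosen output format implies.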

The following code shows the top level wrapper containing the xf::cv::resize() and xf::cv::preProcess() calls.

void pp_pipeline_accel(ap_uint<INPUT_PTR_WIDTH> *img_inp, ap_uint<OUTPUT_PTR_WIDTH> *img_out, int rows_in, int cols_in, int rows_out, int cols_out, float params[3*T_CHANNELS], int th1, int th2)
{
//HLS Interface pragmas
#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=params  offset=slave bundle=gmem3

#pragma HLS INTERFACE s_axilite port=rows_in     bundle=control
#pragma HLS INTERFACE s_axilite port=cols_in     bundle=control
#pragma HLS INTERFACE s_axilite port=rows_out     bundle=control
#pragma HLS INTERFACE s_axilite port=cols_out     bundle=control
#pragma HLS INTERFACE s_axilite port=th1     bundle=control
#pragma HLS INTERFACE s_axilite port=th2     bundle=control

#pragma HLS INTERFACE s_axilite port=return   bundle=control

xf::cv::Mat<XF_8UC3, HEIGHT, WIDTH, NPC1>   imgInput0(rows_in, cols_in);


xf::cv::Mat<TYPE, NEWHEIGHT, NEWWIDTH, NPC_T> out_mat(rows_out, cols_out);


    hls::stream<ap_uint<256> > resizeStrmout;
    int srcMat_cols_align_npc = ((out_mat.cols + (NPC_T - 1)) >> XF_BITSHIFT(NPC_T)) << XF_BITSHIFT(NPC_T);

    #pragma HLS DATAFLOW

    xf::cv::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC3,HEIGHT, WIDTH, NPC1>  (img_inp, imgInput0);
    xf::cv::resize<INTERPOLATION,TYPE,HEIGHT,WIDTH,NEWHEIGHT,NEWWIDTH,NPC_T,MAXDOWNSCALE> (imgInput0, out_mat);
    xf::cv::accel_utils obj;
    obj.xfMat2hlsStrm<INPUT_PTR_WIDTH, TYPE, NEWHEIGHT, NEWWIDTH, NPC_T, (NEWWIDTH*NEWHEIGHT/8)>(out_mat, resizeStrmout, srcMat_cols_align_npc);
    xf::cv::preProcess <INPUT_PTR_WIDTH, OUTPUT_PTR_WIDTH, T_CHANNELS, CPW, HEIGHT, WIDTH, NPC_TEST, PACK_MODE, X_WIDTH, ALPHA_WIDTH, BETA_WIDTH, GAMMA_WIDTH, OUT_WIDTH, X_IBITS, ALPHA_IBITS, BETA_IBITS, GAMMA_IBITS, OUT_IBITS, SIGNED_IN, OPMODE> (resizeStrmout, img_out, params, rows_out, cols_out, th1, th2);

}
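The expression used for srcMat_cols_align_npc in the wrapper above rounds the column count up to a multiple of the number of pixels per clock. The sketch below reproduces that arithmetic in plain C++, assuming that XF_BITSHIFT(NPC_T) evaluates to log2 of the pixel parallelism; the helper name align_cols_to_npc is hypothetical.

```cpp
#include <cassert>

// Round `cols` up to the next multiple of `npc`, where `npc` is a power of two.
// `shift` plays the role assumed for XF_BITSHIFT(NPC_T), i.e. log2(npc).
inline int align_cols_to_npc(int cols, int npc) {
    assert(cols >= 0);
    assert(npc > 0 && (npc & (npc - 1)) == 0); // power of two only
    int shift = 0;
    while ((1 << shift) < npc) {
        ++shift;
    }
    // Same form as the expression in the wrapper.
    return ((cols + (npc - 1)) >> shift) << shift;
}
```

For a 224-column output the value is unchanged for any of 1, 2, 4 or 8 pixels per clock, since 224 is a multiple of 8; a width such as 225 would be rounded up to 232 at 8 pixels per clock.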

This pipeline is integrated with the Deep Learning Processing Unit (DPU) as part of Vitis-AI-Library and achieved an 11 % speed-up compared to software pre-processing.

Overall Performance (Images/sec):
with software pre-processing : 125 images/sec
with hardware accelerated pre-processing : 140 images/sec
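The two throughput figures can be related with a little arithmetic. The sketch below computes both the relative gain in images per second and the relative reduction in time per image; the function names are hypothetical. With 125 and 140 images/sec, the gain in throughput is 12 % and the reduction in time per image is about 10.7 %. The source does not state which of these two measures the quoted speed-up refers to.

```cpp
#include <cassert>
#include <cmath>

// Relative throughput gain: (new - old) / old.
inline double throughput_gain(double old_ips, double new_ips) {
    assert(old_ips > 0.0);
    return (new_ips - old_ips) / old_ips;
}

// Relative reduction in time per image: 1 - old / new,
// because time per image is the inverse of images per second.
inline double time_per_image_reduction(double old_ips, double new_ips) {
    assert(new_ips > 0.0);
    return 1.0 - old_ips / new_ips;
}
```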

Letterbox¶

The Letterbox algorithm scales an input image to a desired output size while preserving the aspect ratio of the original image. If required, zeroes are padded after the resize to preserve the aspect ratio.

An application of letterbox is in the pre-processing block of machine learning pipelines used in image processing.

The following example demonstrates the Letterbox algorithm.

void letterbox_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                    ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                    int rows_in,
                    int cols_in,
                    int rows_out,
                    int cols_out,
                    int insert_pad_value) {

    #pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
    #pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2
    #pragma HLS INTERFACE s_axilite port=rows_in
    #pragma HLS INTERFACE s_axilite port=cols_in
    #pragma HLS INTERFACE s_axilite port=rows_out
    #pragma HLS INTERFACE s_axilite port=cols_out
    #pragma HLS INTERFACE s_axilite port=insert_pad_value
    #pragma HLS INTERFACE s_axilite port=return

    // Compute Resize output image size for Letterbox
    float scale_height = (float)rows_out/(float)rows_in;
    float scale_width = (float)cols_out/(float)cols_in;
    int rows_out_resize, cols_out_resize;
    if(scale_width<scale_height){
        cols_out_resize = cols_out;
        rows_out_resize = (int)((float)(rows_in*cols_out)/(float)cols_in);
    }
    else{
        cols_out_resize = (int)((float)(cols_in*rows_out)/(float)rows_in);
        rows_out_resize = rows_out;
    }

    xf::cv::Mat<TYPE, HEIGHT, WIDTH, NPC_T> imgInput0(rows_in, cols_in);
    xf::cv::Mat<TYPE, NEWHEIGHT, NEWWIDTH, NPC_T> out_mat_resize(rows_out_resize, cols_out_resize);
    xf::cv::Mat<TYPE, NEWHEIGHT, NEWWIDTH, NPC_T> out_mat(rows_out, cols_out);

    #pragma HLS DATAFLOW

    xf::cv::Array2xfMat<INPUT_PTR_WIDTH,XF_8UC3,HEIGHT, WIDTH, NPC_T>  (img_inp, imgInput0);
    xf::cv::resize<INTERPOLATION,TYPE,HEIGHT,WIDTH,NEWHEIGHT,NEWWIDTH,NPC_T,MAXDOWNSCALE> (imgInput0, out_mat_resize);
    xf::cv::insertBorder<TYPE, NEWHEIGHT, NEWWIDTH, NEWHEIGHT, NEWWIDTH, NPC_T>(out_mat_resize, out_mat, insert_pad_value);
    xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, TYPE, NEWHEIGHT, NEWWIDTH, NPC_T>(out_mat, img_out);
    return;
} // end kernel

The Letterbox example uses two hardware functions from the Vitis vision library. They are:

xf::cv::resize
xf::cv::insertBorder

In the given example, the source image is passed to the xf::cv::resize function. The output of that function is passed to the xf::cv::insertBorder module, and the final output image is returned.
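The resize dimensions computed at the top of the kernel can also be checked on the host. The sketch below repeats the same arithmetic as the listing and additionally reports how many rows and columns remain to be filled by the border stage. The struct and function names are hypothetical, and how the border stage distributes the padding is not covered here.

```cpp
#include <cassert>

struct LetterboxSize {
    int rows_resize; // height after the aspect-preserving resize
    int cols_resize; // width after the aspect-preserving resize
    int pad_rows;    // total rows left to fill with the pad value
    int pad_cols;    // total columns left to fill with the pad value
};

// Same arithmetic as the kernel listing, usable on the host for checking.
inline LetterboxSize letterbox_size(int rows_in, int cols_in, int rows_out, int cols_out) {
    assert(rows_in > 0 && cols_in > 0 && rows_out > 0 && cols_out > 0);
    const float scale_height = (float)rows_out / (float)rows_in;
    const float scale_width = (float)cols_out / (float)cols_in;
    LetterboxSize s;
    if (scale_width < scale_height) {
        // Width is the limiting dimension: fill it and scale the height.
        s.cols_resize = cols_out;
        s.rows_resize = (int)((float)(rows_in * cols_out) / (float)cols_in);
    } else {
        // Height is the limiting dimension: fill it and scale the width.
        s.cols_resize = (int)((float)(cols_in * rows_out) / (float)rows_in);
        s.rows_resize = rows_out;
    }
    s.pad_rows = rows_out - s.rows_resize;
    s.pad_cols = cols_out - s.cols_resize;
    return s;
}
```

For example, a 1080 x 1920 (rows x cols) input scaled into a 416 x 416 output is resized to 234 x 416, which leaves 182 rows to be padded.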

Insert Border API Syntax

template <
    int TYPE,
    int SRC_ROWS,
    int SRC_COLS,
    int DST_ROWS,
    int DST_COLS,
    int NPC
    >
void insertBorder (
    xf::cv::Mat <TYPE, SRC_ROWS, SRC_COLS, NPC>& _src,
    xf::cv::Mat <TYPE, DST_ROWS, DST_COLS, NPC>& _dst,
    int insert_pad_val
    )

Parameters:

TYPE	input and output type
SRC_ROWS	rows of the input image
SRC_COLS	cols of the input image
DST_ROWS	rows of the output image
DST_COLS	cols of the output image
NPC	number of pixels processed per cycle
_src	input image
_dst	output image
insert_pad_val	value inserted in the padded region
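To make the role of the parameters concrete, the following plain C++ sketch pads a small single-channel image to a larger size. It assumes that the source image is placed in the centre of the destination; the placement actually used by xf::cv::insertBorder is not described in this section, so treat that as an assumption. The name insert_border_ref is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy a src_rows x src_cols single-channel image into a dst_rows x dst_cols
// image and fill the remaining pixels with pad_val.
// ASSUMPTION: the source is centred in the destination.
std::vector<std::uint8_t> insert_border_ref(const std::vector<std::uint8_t>& src,
                                            int src_rows, int src_cols,
                                            int dst_rows, int dst_cols,
                                            std::uint8_t pad_val) {
    assert(src_rows >= 0 && src_cols >= 0);
    assert(src_rows <= dst_rows && src_cols <= dst_cols);
    assert(src.size() == static_cast<std::size_t>(src_rows) * src_cols);
    const int top = (dst_rows - src_rows) / 2;
    const int left = (dst_cols - src_cols) / 2;
    // Start from an image filled entirely with the pad value.
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(dst_rows) * dst_cols, pad_val);
    for (int r = 0; r < src_rows; ++r) {
        for (int c = 0; c < src_cols; ++c) {
            dst[static_cast<std::size_t>(r + top) * dst_cols + (c + left)] =
                src[static_cast<std::size_t>(r) * src_cols + c];
        }
    }
    return dst;
}
```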

Image Sensor Processing pipeline¶

Image Sensor Processing (ISP) is a pipeline of image processing functions that operate on the raw image from the sensor.

The current ISP includes the following four blocks:

BPC (Bad pixel correction): An image sensor may have a certain number of defective/bad pixels that result from manufacturing faults or from variations in pixel voltage levels based on temperature or exposure. The bad pixel correction module removes defective pixels.
Gain control: The gain control module improves the overall brightness of the image.
Demosaicing: The demosaic module reconstructs RGB pixels from the input Bayer image (RGGB, BGGR, GBRG, GRBG).
Auto white balance: The AWB module improves the color balance of the image by using image statistics.

The current design example demonstrates how to use ISP functions in a pipeline. You can include other modules (such as gamma correction, color conversion, or resize) based on your needs.
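The Bayer pattern name describes the 2 x 2 tile of color filters that repeats over the sensor, read row by row. The sketch below returns the color sampled at a given pixel for each of the four patterns. The enum and function names are hypothetical and are not the library's XF_BAYER_PATTERN values.

```cpp
#include <cassert>

// Hypothetical enum for illustration; the order matches the `tiles` table below.
enum class BayerPattern { RGGB, BGGR, GBRG, GRBG };

// Returns 'R', 'G' or 'B' for the color sampled at (row, col).
inline char bayer_channel(BayerPattern p, int row, int col) {
    // Each string is a 2x2 tile in row-major order:
    // index = (row % 2) * 2 + (col % 2).
    static const char tiles[4][5] = {"RGGB", "BGGR", "GBRG", "GRBG"};
    assert(row >= 0 && col >= 0);
    return tiles[static_cast<int>(p)][(row & 1) * 2 + (col & 1)];
}
```

Demosaicing then estimates the two missing colors at every pixel from neighbouring samples of those colors.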

The following example demonstrates the ISP pipeline.

void ISPPipeline_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp, ap_uint<OUTPUT_PTR_WIDTH>* img_out, int height, int width) {

#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=height
#pragma HLS INTERFACE s_axilite port=width
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS ARRAY_PARTITION variable=hist0 complete dim=1
#pragma HLS ARRAY_PARTITION variable=hist1 complete dim=1

if (!flag) {
        ISPpipeline(img_inp, img_out, height, width, hist0, hist1);
        flag = 1;
} else {
        ISPpipeline(img_inp, img_out, height, width, hist1, hist0);
        flag = 0;
}
}
void ISPpipeline(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                                 ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                                 int height,
                                 int width,
                                 uint32_t hist0[3][256],
                                 uint32_t hist1[3][256]) {
#pragma HLS INLINE OFF
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput1(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> bpc_out(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> gain_out(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> demosaic_out(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> impop(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> _dst(height, width);

#pragma HLS stream variable=bpc_out.data dim=1 depth=2
#pragma HLS stream variable=gain_out.data dim=1 depth=2
#pragma HLS stream variable=demosaic_out.data dim=1 depth=2
#pragma HLS stream variable=imgInput1.data dim=1 depth=2
#pragma HLS stream variable=impop.data dim=1 depth=2
#pragma HLS stream variable=_dst.data dim=1 depth=2

#pragma HLS DATAFLOW


        float inputMin = 0.0f;
        float inputMax = 255.0f;
        float outputMin = 0.0f;
        float outputMax = 255.0f;
        float p = 2.0f;

        xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(img_inp, imgInput1);
        xf::cv::badpixelcorrection<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0, 0>(imgInput1, bpc_out);
        xf::cv::gaincontrol<XF_BAYER_PATTERN, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(bpc_out, gain_out);
        xf::cv::demosaicing<XF_BAYER_PATTERN, XF_SRC_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(gain_out, demosaic_out);
        xf::cv::AWBhistogram<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, WB_TYPE>(
                demosaic_out, impop, hist0, p, inputMin, inputMax, outputMin, outputMax);
        xf::cv::AWBNormalization<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, WB_TYPE>(
                impop, _dst, hist1, p, inputMin, inputMax, outputMin, outputMax);
        xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(_dst, img_out);
}
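The wrapper above alternates hist0 and hist1 between calls: as the listing reads, the histogram stage fills one array for the current frame while the normalization stage uses the array that was filled during the previous call. The sketch below shows that ping-pong pattern in plain C++; the struct and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct PingPongHist {
    std::uint32_t hist[2][3][256] = {}; // two banks, in the role of hist0 and hist1
    int flag = 0;                       // selects the bank written this frame

    // Bank filled by the histogram stage for the current frame.
    std::uint32_t (*write_bank())[256] { return hist[flag]; }
    // Bank read by the normalization stage (filled during the previous frame).
    std::uint32_t (*read_bank())[256] { return hist[flag ^ 1]; }
    // Swap the roles once the frame is done, like toggling `flag` in the wrapper.
    void next_frame() { flag ^= 1; }
};
```

Because the two stages never touch the same bank within a frame, both can run concurrently in a dataflow region, at the cost of applying statistics that are one frame old.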

Image Sensor Processing pipeline - 2020.2 version¶

This ISP includes the following eight blocks:

Black level correction: Black level leads to the whitening of the image in dark regions and a perceived loss of overall contrast. The black level correction algorithm corrects the black and white levels of the overall image.
BPC (Bad pixel correction): An image sensor may have a certain number of defective/bad pixels that result from manufacturing faults or from variations in pixel voltage levels based on temperature or exposure. The bad pixel correction module removes defective pixels.
Gain control: The gain control module improves the overall brightness of the image.
Demosaicing: The demosaic module reconstructs RGB pixels from the input Bayer image (RGGB, BGGR, GBRG, GRBG).
Auto white balance: The AWB module improves the color balance of the image by using image statistics.
Color correction matrix: Corrects colors to suit the display or video system.
Quantization and dithering: Performs uniform quantization to reduce a higher bit depth to a lower bit depth.
Auto exposure correction: Automatically attempts to correct the exposure level of the captured image and also improves the contrast of the image.

The current design example demonstrates how to use ISP functions in a pipeline. You can include other modules (such as gamma correction, color conversion, or resize) based on your needs.

The following example demonstrates the ISP pipeline with the functions listed above.

void ISPPipeline_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp, ap_uint<OUTPUT_PTR_WIDTH>* img_out, int height, int width) {

#pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2

#pragma HLS ARRAY_PARTITION variable=hist0 complete dim=1
#pragma HLS ARRAY_PARTITION variable=hist1 complete dim=1

        if (!flag) {
                ISPpipeline(img_inp, img_out, height, width, hist0, hist1, histogram0, histogram1, igain_0, igain_1);
                flag = 1;

        } else {
                ISPpipeline(img_inp, img_out, height, width, hist1, hist0, histogram1, histogram0, igain_1, igain_0);
                flag = 0;
        }
}

void ISPpipeline(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                                ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                                unsigned short height,
                                unsigned short width,
                                uint32_t hist0[3][HIST_SIZE],
                                uint32_t hist1[3][HIST_SIZE],
                                uint32_t hist_aec1[1][256],
                                uint32_t hist_aec2[1][256],
                                int gain0[3], int gain1[3]) {

        #pragma HLS INLINE OFF

                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput1(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput2(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> bpc_out(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> gain_out(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> demosaic_out(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> impop(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> ltm_in(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> lsc_out(height, width);
                xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> _dst(height, width);
                xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> aecin(height, width);

                #pragma HLS stream variable=bpc_out.data dim=1 depth=2
                #pragma HLS stream variable=gain_out.data dim=1 depth=2
                #pragma HLS stream variable=demosaic_out.data dim=1 depth=2
                #pragma HLS stream variable=imgInput1.data dim=1 depth=2
                #pragma HLS stream variable=imgInput2.data dim=1 depth=2
                #pragma HLS stream variable=impop.data dim=1 depth=2
                #pragma HLS stream variable=_dst.data dim=1 depth=2
                #pragma HLS stream variable=ltm_in.data dim=1 depth=2
                #pragma HLS stream variable=lsc_out.data dim=1 depth=2
                #pragma HLS stream variable=aecin.data dim=1 depth=2

            #pragma HLS DATAFLOW

                float inputMin = 0.0f;
                float inputMax = (1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC))) - 1; // 65535.0f;
                float outputMin = 0.0f;
                float outputMax = (1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC))) - 1; // 65535.0f;
                float p = 0.2f;
                float thresh = 0.6f;

                float mul_fact = (inputMax / (inputMax - BLACK_LEVEL));

                xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(img_inp, imgInput1);
                xf::cv::blackLevelCorrection<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 16, 15, 1>(
                        imgInput1, imgInput2, BLACK_LEVEL, mul_fact);

                xf::cv::badpixelcorrection<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0, 0>(imgInput2, bpc_out);
                xf::cv::gaincontrol<XF_BAYER_PATTERN, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(bpc_out, gain_out);
                xf::cv::demosaicing<XF_BAYER_PATTERN, XF_SRC_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(gain_out, demosaic_out);

                if (WB_TYPE) {
                        xf::cv::AWBhistogram<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, WB_TYPE, HIST_SIZE>(
                                demosaic_out, impop, hist0, thresh, inputMin, inputMax, outputMin, outputMax);
                        xf::cv::AWBNormalization<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, WB_TYPE, HIST_SIZE>(
                                impop, ltm_in, hist1, thresh, inputMin, inputMax, outputMin, outputMax);
                } else {
                        xf::cv::AWBChannelGain<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(demosaic_out, impop, p, gain0);
                        xf::cv::AWBGainUpdate<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(impop, ltm_in, p, gain1);
                }

                xf::cv::colorcorrectionmatrix<XF_CCM_TYPE, XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(ltm_in, lsc_out);

                xf::cv::xf_QuatizationDithering<XF_DST_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, 256, 65536, XF_NPPC>(lsc_out, aecin);

                if (AEC_EN) {
                        xf::cv::autoexposurecorrection<XF_LTM_T, XF_LTM_T, SIN_CHANNEL_TYPE, XF_HEIGHT, XF_WIDTH, XF_NPPC>(
                                aecin, _dst, hist_aec1, hist_aec2);

                        xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(_dst, img_out);
                } else {
                        // Without auto exposure correction, write the quantized image directly,
                        // so that aecin is consumed exactly once in either configuration.
                        xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(aecin, img_out);
                }
        }
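The listing derives inputMax from the pixel bit depth and mul_fact from inputMax and BLACK_LEVEL. The sketch below shows how such a factor can be applied to a single pixel. The correction formula (subtract the black level, multiply, then clamp) is an assumption made for illustration and is not taken from the xf::cv::blackLevelCorrection source; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Largest value for a given pixel bit depth, e.g. 255 for 8 bits.
inline float max_pixel_value(int bit_depth) {
    assert(bit_depth > 0 && bit_depth <= 16);
    return static_cast<float>((1 << bit_depth) - 1);
}

// ASSUMED formula: out = (in - black_level) * mul_fact, clamped to [0, input_max].
inline std::uint16_t black_level_correct(std::uint16_t in, int black_level, int bit_depth) {
    const float input_max = max_pixel_value(bit_depth);
    assert(black_level >= 0 && black_level < input_max);
    // Same expression as mul_fact in the listing: stretches the remaining
    // range [black_level, input_max] back to [0, input_max].
    const float mul_fact = input_max / (input_max - black_level);
    float v = (static_cast<float>(in) - black_level) * mul_fact;
    if (v < 0.0f) v = 0.0f;
    if (v > input_max) v = input_max;
    return static_cast<std::uint16_t>(std::lround(v));
}
```

With an 8-bit depth and a black level of 32, an input of 32 maps to 0 and an input of 255 stays at 255, so the full output range is used again.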

Image Sensor Processing pipeline - 2021.1 version¶

This ISP includes the following blocks:

Black level correction: Black level leads to the whitening of the image in dark regions and a perceived loss of overall contrast. The black level correction algorithm corrects the black and white levels of the overall image.
BPC (Bad pixel correction): An image sensor may have a certain number of defective/bad pixels that result from manufacturing faults or from variations in pixel voltage levels based on temperature or exposure. The bad pixel correction module removes defective pixels.
Gain control: The gain control module improves the overall brightness of the image.
Demosaicing: The demosaic module reconstructs RGB pixels from the input Bayer image (RGGB, BGGR, GBRG, GRBG).
Auto white balance: The AWB module improves the color balance of the image by using image statistics.
Color correction matrix: Corrects colors to suit the display or video system.
Quantization and dithering: Performs uniform quantization to reduce a higher bit depth to a lower bit depth.
Gamma correction: Gamma correction improves the overall brightness of the image.
Color space conversion: Converts the RGB image to a YUV422 (YUYV) image for HDMI display. RGB2YUYV produces a Y value for every pixel and U and V values for alternate pixels.
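The YUYV layout described in the last item can be sketched in plain C++ as follows. Each output word carries the Y value of its pixel plus either U (even pixels) or V (odd pixels). The BT.601 full-range coefficients, the byte order inside the 16-bit word, and the choice to take chroma from the pixel itself are assumptions for illustration, not details taken from xf::cv::rgb2yuyv; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Round to nearest and clamp to the 8-bit range.
inline std::uint8_t clamp_u8(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 255.0f) v = 255.0f;
    return static_cast<std::uint8_t>(std::lround(v));
}

// Pack RGB pixels into 16-bit YUYV words: Y in the low byte, U (even pixel)
// or V (odd pixel) in the high byte.
// ASSUMPTIONS: BT.601 full-range coefficients and this byte order.
std::vector<std::uint16_t> rgb_to_yuyv(const std::vector<Rgb>& px) {
    std::vector<std::uint16_t> out(px.size());
    for (std::size_t i = 0; i < px.size(); ++i) {
        const float r = px[i].r, g = px[i].g, b = px[i].b;
        const std::uint8_t y = clamp_u8(0.299f * r + 0.587f * g + 0.114f * b);
        const std::uint8_t u = clamp_u8(-0.168736f * r - 0.331264f * g + 0.5f * b + 128.0f);
        const std::uint8_t v = clamp_u8(0.5f * r - 0.418688f * g - 0.081312f * b + 128.0f);
        const std::uint8_t chroma = (i % 2 == 0) ? u : v;
        out[i] = static_cast<std::uint16_t>(y | (chroma << 8));
    }
    return out;
}
```

The result uses 16 bits per pixel instead of 24, which is why the pipeline output container is a single-channel 16-bit type.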

The current design example demonstrates how to use ISP functions in a pipeline.

You can dynamically configure the following parameters of the pipeline.

Runtime parameters for the pipeline¶
Parameter	Description
rgain	To configure gain value for the red channel.
bgain	To configure gain value for the blue channel.
gamma_lut	Lookup table for gamma values. The first 256 values are for R, the next 256 values are for G, and the last 256 values are for B.
mode_reg	Flag to enable/disable the AWB algorithm
pawb	Percentage of pixels at the top and bottom of the range that are ignored while computing the min and max, to improve quality.
rows	The number of rows in the image or height of the image.
cols	The number of columns in the image or width of the image.
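A host program has to fill gamma_lut in the layout given in the table: 256 entries for R, then 256 for G, then 256 for B. The sketch below builds such a table in plain C++. The power-law curve out = 255 * (in/255)^(1/gamma) is a common choice and is an assumption here, as is the helper name make_gamma_lut; the curve used by the example's own host code is not shown in this section.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Build a 256*3 LUT: entries [0,255] for R, [256,511] for G, [512,767] for B.
// ASSUMED curve: out = 255 * (in/255)^(1/gamma), rounded to nearest.
std::vector<unsigned char> make_gamma_lut(float gamma_r, float gamma_g, float gamma_b) {
    assert(gamma_r > 0.0f && gamma_g > 0.0f && gamma_b > 0.0f);
    const float gammas[3] = {gamma_r, gamma_g, gamma_b};
    std::vector<unsigned char> lut(256 * 3);
    for (int ch = 0; ch < 3; ++ch) {
        for (int i = 0; i < 256; ++i) {
            const float v = 255.0f * std::pow(i / 255.0f, 1.0f / gammas[ch]);
            lut[ch * 256 + i] = static_cast<unsigned char>(std::lround(v));
        }
    }
    return lut;
}
```

The returned vector holds 768 bytes, and its data() pointer can be passed wherever an unsigned char array of size 256 * 3 is expected.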

You can also set the following compile-time parameters of the pipeline.

Compile-time parameters for the pipeline¶
Parameter	Description
XF_HEIGHT	Maximum height of the input and output image
XF_WIDTH	Maximum width of the input and output image (must be a multiple of NPC)
XF_BAYER_PATTERN	The Bayer format of the RAW input image. Supported formats are RGGB, BGGR, GBRG, and GRBG.
XF_SRC_T	Input pixel type. Supported pixel widths are 8, 10, 12, and 16.

The following example demonstrates the ISP pipeline with the functions listed above.

void ISPPipeline_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                int height,
                int width,
                uint16_t rgain,
                uint16_t bgain,
                unsigned char gamma_lut[256 * 3],
                unsigned char mode_reg,
                uint16_t pawb) {

                        #pragma HLS INTERFACE m_axi     port=img_inp  offset=slave bundle=gmem1
                        #pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem2

                        #pragma HLS ARRAY_PARTITION variable=hist0_awb complete dim=1
                        #pragma HLS ARRAY_PARTITION variable=hist1_awb complete dim=1

                        if (!flag) {
                                ISPpipeline(img_inp, img_out, height, width, hist0_awb, hist1_awb, igain_0, igain_1, rgain, bgain, gamma_lut,
                                                        mode_reg, pawb);
                                flag = 1;

                        } else {
                                ISPpipeline(img_inp, img_out, height, width, hist1_awb, hist0_awb, igain_1, igain_0, rgain, bgain, gamma_lut,
                                                        mode_reg, pawb);
                                flag = 0;
                        }
                }

void ISPpipeline(ap_uint<INPUT_PTR_WIDTH>* img_inp,
                        ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                        unsigned short height,
                        unsigned short width,
                        uint32_t hist0[3][HIST_SIZE],
                        uint32_t hist1[3][HIST_SIZE],
                        int gain0[3],
                        int gain1[3],
                        uint16_t rgain,
                        uint16_t bgain,
                        unsigned char gamma_lut[256 * 3],
                        unsigned char mode_reg,
                        uint16_t pawb) {

        #pragma HLS INLINE OFF

                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput1(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput2(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> bpc_out(height, width);
                xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> gain_out(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> demosaic_out(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> impop(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> ltm_in(height, width);
                xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> lsc_out(height, width);
                xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> _dst(height, width);
                xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> aecin(height, width);
                xf::cv::Mat<XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC> _imgOutput(height, width);


        #pragma HLS DATAFLOW

                const int Q_VAL = 1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC));
                float thresh = (float)pawb / 256;
                float inputMax = (1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC))) - 1; // 65535.0f;
                float mul_fact = (inputMax / (inputMax - BLACK_LEVEL));

                xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(img_inp, imgInput1);
                xf::cv::blackLevelCorrection<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 16, 15, 1>(imgInput1, imgInput2, BLACK_LEVEL,mul_fact);
                xf::cv::gaincontrol<XF_BAYER_PATTERN, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(imgInput2, gain_out, rgain, bgain);
                xf::cv::demosaicing<XF_BAYER_PATTERN, XF_SRC_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(gain_out, demosaic_out);
                function_awb<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(demosaic_out, ltm_in, hist0, hist1, gain0, gain1,height, width, mode_reg, thresh);
                xf::cv::colorcorrectionmatrix<XF_CCM_TYPE, XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(ltm_in, lsc_out);
                if (XF_DST_T == XF_8UC3) {
                        fifo_copy<XF_DST_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(lsc_out, aecin, height, width);
                } else {
                        xf::cv::xf_QuatizationDithering<XF_DST_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, 256, Q_VAL, XF_NPPC>(lsc_out, aecin);
                }
                xf::cv::gammacorrection<XF_LTM_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(aecin, _dst, gamma_lut);
                xf::cv::rgb2yuyv<XF_LTM_T, XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC>(_dst, _imgOutput);
                xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC>(_imgOutput, img_out);
        }

Image Sensor Processing pipeline with HDR¶

This ISP adds an HDR function to the 2021.1 pipeline and omits the color space conversion. It takes two exposure frames as inputs (a short exposure frame and a long exposure frame) and, after HDR fusion, produces a merged HDR frame. The HDR output goes through the ISP 2021.1 pipeline, which returns the output RGB image.

HDRMerge: The HDRMerge module generates a high dynamic range image from a set of frames with different exposures. Image sensors usually have a limited dynamic range, and it is difficult to get an HDR image from a single capture. Frames are collected from the sensor with different exposure times, and HDRMerge generates the HDR frame from those exposure frames.
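The idea of merging two exposures can be illustrated with a generic weighted average. The sketch below is not the algorithm of xf::cv::Hdrmerge_bayer, which takes its weights from the wr_hls table. Here a simple hat-shaped weight is assumed, so that pixels near the ends of the range (underexposed or saturated) contribute little. The function names and the exposure_ratio parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hat-shaped weight: highest for mid-range values, lowest near 0 and max_val.
// ASSUMED weighting, chosen only for illustration.
inline float hat_weight(std::uint16_t v, std::uint16_t max_val) {
    assert(max_val > 0);
    const float x = static_cast<float>(v) / max_val;   // normalized to 0..1
    const float w = 1.0f - std::fabs(2.0f * x - 1.0f); // 0 at the ends, 1 in the middle
    return std::max(w, 1e-3f);                         // keep the total weight non-zero
}

// Merge one pixel from a short and a long exposure.
// exposure_ratio = long exposure time / short exposure time.
inline float merge_two_exposures(std::uint16_t short_px, std::uint16_t long_px,
                                 float exposure_ratio, std::uint16_t max_val) {
    const float ws = hat_weight(short_px, max_val);
    const float wl = hat_weight(long_px, max_val);
    // Bring the short exposure to the scale of the long exposure.
    const float short_scaled = short_px * exposure_ratio;
    return (ws * short_scaled + wl * long_px) / (ws + wl);
}
```

When the long exposure saturates, its weight drops to the floor value and the merged result follows the scaled short exposure, which is how the merged frame can represent values beyond the range of a single capture.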

The following example demonstrates the ISP pipeline with HDR.

void ISPPipeline_accel(ap_uint<INPUT_PTR_WIDTH>* img_inp1,
        ap_uint<INPUT_PTR_WIDTH>* img_inp2,
        ap_uint<OUTPUT_PTR_WIDTH>* img_out,
        int height,
        int width,
        uint16_t rgain,
        uint16_t bgain,
        unsigned char gamma_lut[256 * 3],
        unsigned char mode_reg,
        uint16_t pawb,
        short* wr_hls) {

#pragma HLS INTERFACE m_axi     port=img_inp1  offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi     port=img_inp2  offset=slave bundle=gmem2
#pragma HLS INTERFACE m_axi     port=img_out  offset=slave bundle=gmem3
#pragma HLS INTERFACE m_axi     port=wr_hls  offset=slave bundle=gmem4

#pragma HLS ARRAY_PARTITION variable=hist0_awb complete dim=1
#pragma HLS ARRAY_PARTITION variable=hist1_awb complete dim=1

        if (!flag) {
                ISPpipeline(img_inp1, img_inp2, img_out, height, width, hist0_awb, hist1_awb, igain_0, igain_1, rgain, bgain,
                                        gamma_lut, mode_reg, pawb, wr_hls);
                flag = 1;

        } else {
                ISPpipeline(img_inp1, img_inp2, img_out, height, width, hist1_awb, hist0_awb, igain_1, igain_0, rgain, bgain,
                                        gamma_lut, mode_reg, pawb, wr_hls);
                flag = 0;
        }
}

void ISPpipeline(ap_uint<INPUT_PTR_WIDTH>* img_inp1,
                        ap_uint<INPUT_PTR_WIDTH>* img_inp2,
                        ap_uint<OUTPUT_PTR_WIDTH>* img_out,
                        unsigned short height,
                        unsigned short width,
                        uint32_t hist0[3][HIST_SIZE],
                        uint32_t hist1[3][HIST_SIZE],
                        int gain0[3],
                        int gain1[3],
                        uint16_t rgain,
                        uint16_t bgain,
                        unsigned char gamma_lut[256 * 3],
                        unsigned char mode_reg,
                        uint16_t pawb,
                        short* wr_hls) {

#pragma HLS INLINE OFF

        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInputhdr1(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInputhdr2(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput1(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> imgInput2(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> bpc_out(height, width);
        xf::cv::Mat<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> gain_out(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> demosaic_out(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> impop(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> ltm_in(height, width);
        xf::cv::Mat<XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> lsc_out(height, width);
        xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> _dst(height, width);
        xf::cv::Mat<XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC> aecin(height, width);
        xf::cv::Mat<XF_16UC1, XF_HEIGHT, XF_WIDTH, XF_NPPC> _imgOutput(height, width);


#pragma HLS DATAFLOW

        const int Q_VAL = 1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC));
        float thresh = (float)pawb / 256;
        float inputMax = (1 << (XF_DTPIXELDEPTH(XF_SRC_T, XF_NPPC))) - 1; // 65535.0f;
        float mul_fact = (inputMax / (inputMax - BLACK_LEVEL));
        xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(img_inp1, imgInputhdr1);
        xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(img_inp2, imgInputhdr2);

        xf::cv::Hdrmerge_bayer<XF_SRC_T, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, NO_EXPS, W_B_SIZE>(
                imgInputhdr1, imgInputhdr2, imgInput1, wr_hls);

        xf::cv::blackLevelCorrection<XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 16, 15, 1>(imgInput1, imgInput2, BLACK_LEVEL,mul_fact);
        xf::cv::gaincontrol<XF_BAYER_PATTERN, XF_SRC_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(imgInput2, gain_out, rgain, bgain);
        xf::cv::demosaicing<XF_BAYER_PATTERN, XF_SRC_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC, 0>(gain_out, demosaic_out);
        function_awb<XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(demosaic_out, ltm_in, hist0, hist1, gain0, gain1,height, width, mode_reg, thresh);
        xf::cv::colorcorrectionmatrix<XF_CCM_TYPE, XF_DST_T, XF_DST_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(ltm_in, lsc_out);
        if (XF_DST_T == XF_8UC3) {
                fifo_copy<XF_DST_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(lsc_out, aecin, height, width);
        } else {
                xf::cv::xf_QuatizationDithering<XF_DST_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, 256, Q_VAL, XF_NPPC>(lsc_out, aecin);
        }
        xf::cv::gammacorrection<XF_LTM_T, XF_LTM_T, XF_HEIGHT, XF_WIDTH, XF_NPPC>(aecin, _dst, gamma_lut);
        xf::cv::xfMat2Array<OUTPUT_PTR_WIDTH, XF_8UC3, XF_HEIGHT, XF_WIDTH, XF_NPPC>(_dst, img_out);
}