/setup.sh
```
### Baseline the Application Performance
The software-only application processes high-definition (HD) video frames at 1920x1080 resolution. It performs convolution on a set of images and prints a summary of the performance results, which serves as the software baseline. Set the environment variable that points to the tutorial directory inside the repository as follows:
```bash
export CONV_TUTORIAL_DIR=/VITIS_TUTORIAL_REPO_PATH/Hardware_Acceleration/Design_Tutorials/01-convolution-tutorial
```
where **VITIS_TUTORIAL_REPO_PATH** is the local path where you cloned the Git repository.
**NOTE**: Make sure the `CONV_TUTORIAL_DIR` variable is set correctly for all the labs in this tutorial.
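One way to guard against a missing or wrong setting is a small shell check. The sketch below is not part of the tutorial scripts: `check_conv_tutorial_dir` is a hypothetical helper, and it assumes only that a correctly set `CONV_TUTORIAL_DIR` contains the `sw_run` directory used in this lab.

```bash
# Hypothetical helper (not shipped with the tutorial): verifies that
# CONV_TUTORIAL_DIR is set and contains the sw_run directory.
check_conv_tutorial_dir() {
    if [ -z "${CONV_TUTORIAL_DIR:-}" ]; then
        echo "CONV_TUTORIAL_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$CONV_TUTORIAL_DIR/sw_run" ]; then
        echo "No sw_run directory under $CONV_TUTORIAL_DIR" >&2
        return 1
    fi
    echo "CONV_TUTORIAL_DIR looks OK: $CONV_TUTORIAL_DIR"
}
```

After defining the function in your shell, call `check_conv_tutorial_dir` before starting a lab. A non-zero return status means the variable needs to be fixed first.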
Run the application to measure performance as follows:
```bash
cd $CONV_TUTORIAL_DIR/sw_run
./run.sh
```
Results similar to the ones shown below will be printed. Note down the CPU throughput.
```bash
----------------------------------------------------------------------------
Number of runs : 60
Image width : 1920
Image height : 1080
Filter type : 6
Generating a random 1920x1080 input image
Running Software version on 60 images
CPU Time : 28.0035 s
CPU Throughput : 12.7112 MB/s
----------------------------------------------------------------------------
```
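The throughput figure can be reproduced from the run parameters. The sketch below assumes 3 bytes per pixel and 1 MB = 1024 x 1024 bytes. Neither is stated in this lab, but both are consistent with the numbers printed above. `throughput_mbps` is a hypothetical helper, not part of the tutorial scripts.

```bash
# Hypothetical helper: throughput_mbps IMAGES WIDTH HEIGHT SECONDS [BYTES_PER_PIXEL]
# Assumes 3 bytes per pixel by default and 1 MB = 1024*1024 bytes.
throughput_mbps() {
    LC_ALL=C awk -v n="$1" -v w="$2" -v h="$3" -v t="$4" -v bpp="${5:-3}" \
        'BEGIN { printf "%.4f\n", (n * w * h * bpp) / (1024 * 1024) / t }'
}

# 60 images of 1920x1080 processed in the CPU time reported above
throughput_mbps 60 1920 1080 28.0035   # prints 12.7112
```

Substitute the CPU time from your own run to check the throughput that your machine reports.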
## Running FPGA Accelerated Application
### Launching the Host Application
Now launch the application that uses the FPGA-accelerated video convolution filter. It runs on an actual FPGA card, which is also called a System Run.
```bash
cd $CONV_TUTORIAL_DIR
make run
```
The result summary will be similar to the one given below:
```bash
----------------------------------------------------------------------------
Xilinx 2D Filter Example Application (Randomized Input Version)
FPGA binary : ../xclbin/fpgabinary.hw.xclbin
Number of runs : 60
Image width : 1920
Image height : 1080
Filter type : 3
Max requests : 12
Compare perf. : 1
Programming FPGA device
Generating a random 1920x1080 input image
Running FPGA accelerator on 60 images
Running Software version
Comparing results
Test PASSED: Output matches reference
FPGA Time : 0.4240 s
FPGA Throughput : 839.4765 MB/s
CPU Time : 28.9083 s
CPU Throughput : 12.3133 MB/s
FPGA Speedup : 68.1764 x
----------------------------------------------------------------------------
```
### Results
The host application console output shows the FPGA-accelerated kernel outperforming the CPU-only implementation by a factor of about 68x: 0.4240 s versus 28.9083 s for the same 60 images. The following labs illustrate how this performance allows processing more than 3 HD video channels with 1080p resolution in parallel. The tutorial describes how to achieve such performance gains by building a kernel and a host application written in C++. The host application uses OpenCL APIs with Xilinx Runtime (XRT) underneath, demonstrating how to use the computing power of this custom-built hardware kernel effectively.
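As a quick check, the reported speedup matches the ratio of the two throughput figures in the result summary. `speedup` below is a hypothetical helper built on `awk`, not part of the tutorial scripts.

```bash
# Hypothetical helper: speedup FPGA_THROUGHPUT CPU_THROUGHPUT
speedup() {
    LC_ALL=C awk -v fpga="$1" -v cpu="$2" \
        'BEGIN { printf "%.4f\n", fpga / cpu }'
}

# Throughput values taken from the result summary above (MB/s)
speedup 839.4765 12.3133   # prints 68.1764
```

Dividing the printed times instead gives a slightly different value, because the times in the summary are rounded to four decimal places.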
---------------------------------------
Next Lab Module: Video Convolution Filter : Introduction and Performance Estimation
Copyright © 2020-2021 Xilinx