Streaming Lab

Introduction

In this lab you will create an streaming kernel which implements a Finite Impulse Response (FIR) filter of 73 taps. However, Alveo/AWS F1 shells do not support direct streaming connection between the host and kernels. In order to support the streaming kernel, you will include datamover kernels which will translate from memory mapped to stream and vice versa.

The following image depicts the three kernels that will be use and how they are connected together.

This labs guides you through the steps to:

This lab has been verified in the following platforms (platform containing the string 2018 are not supported)

FIR

A FIR filter is one of the two primary digital filters. A FIR has a finite response to an impulse.

The following figure shows the conventional discrete tapped delay line filter representation. As you can see the output, y(n), is the weighted sum of the n previous samples. The weight vector, a(n), is also called coefficients and vary deepening on the filter type, sampling frequency and other parameters.

However, in this lab we will use the transposed direct version, which is an improved version of the filter that yields to better performance and does not diverge significantly from the standard version. There are much more efficient implementations, but these will not be cover here.

Steps

Create Vitis project

  1. Create a new Vitis Application Project

  2. In the Platform window select AWS F1 platform

  3. In the Application Project Details type streaming_lab

  4. Finally, select Empty Application

Add source code and hardware functions

  1. In the Explore view, right-click on streaming_lab_system > streaming_lab > src and select Import Sources...

  2. Browse to ~/xup_compute_acceleration/sources/streaming_lab and add host.cpp, xcl2.cpp and xcl2.hpp files

  3. In the Explore view, right-click on streaming_lab_system > streaming_lab_kernels > src and select Import Sources...

  4. Browse to ~/xup_compute_acceleration/sources/streaming_lab and add krnl_fir.cpp, krnl_mm2s.cpp, and krnl_s2mm.hpp files

  5. Double-click on streaming_lab_system > streaming_lab_kernels > streaming_lab_kernels.prj

  6. In the Hardware Functions view, click the Add Hardware Function (alt tag) button icon

  7. Select krnl_fir, krnl_mm2s and krnl_s2mm and click OK

  8. Check that the kernels are included within the Hardware Functions panel

  9. In the Explore view, right-click on streaming_lab_system_hw_link and select Import Sources...

  10. Browse to ~/xup_compute_acceleration/sources/streaming_lab and add linking.cfg file

  11. In the field Into folder:, click Browse... and select streaming_lab_system_hw_link [pl], then click OK and Finish

  12. In the Assistant view, right-click on streaming_lab_system > streaming_lab_system_hw_link > Emulation-SW > binary_container_1 and select Settings...

  13. In the V++ command line options: field type --config ../linking.cfg and click Apply and Close

    The linker option will take effect for all configurations.

  14. Open and analyze the source code of the kernels and host application

    • krnl_mm2s.cpp: reads data from global memory and generates a stream
    • krnl_s2mm.cpp: reads data from an stream and stores the data in global memory
    • krnl_fir.cpp: implements a digital bandpass FIR filter using the transposed form. There are 73 coefficients that are static
    • host.cpp: creates the test vector, instantiates and run the FIR filter
  15. Open and analyze linking.cfg file.

    This file provides information to the tool on how to connect the streaming connections as well as the memory interfaces

Run software emulation

  1. In the Assistant view, select streaming_lab_system and build the application by clicking the build (alt tag) button

  2. In the Explorer view, right-click on streaming_lab_system and select Run As > Run Configurations...

  3. In the Program Arguments make sure that Automatically add binary container(s) to arguments is selected.

  4. Click Apply and then Run

  5. The console output will report

    Found Platform
    Platform Name: Xilinx
    INFO: Reading <path>/binary_container_1.xclbin
    Loading: '<path>/binary_container_1.xclbin'
    Trying to program device[0]: xilinx_aws-vu9p-f1_shell-v04261818_201920_2
    Device[0]: program successful!
    Running FIR filter with 128 samples, each sample is a 32-bit signed element
    Launching Hardware Kernels...
    Getting Hardware Results...
    Computing Software results...
    TEST PASSED
    

Change program arguments

  1. In the Explorer view, right-click on streaming_lab_system and select Run As > Run Configurations...

  2. In the Program Arguments click Edit...

  3. Double-click streaming_lab

  4. In the Arguments box include include: ${project_loc:streaming_lab_system}/Emulation-SW/binary_container_1.xclbin 16 debug

  5. Click OK to set the arguments, click OK again, then click Apply, and finally click Run

    Notice that this time the results are shown with sample number, sw and hw computed results

Run hardware emulation

  1. Select or open the Hardware Kernel Project Settings view and change Active build configuration to: Emulation-HW

  2. In the Assistant view, select streaming_lab_system and build the application by clicking the build alt tag button

  3. Once compiled, Run the Emulation-HW, only specify the binary container as argument

  4. The console output will report

    Found Platform
    Platform Name: Xilinx
    INFO: Reading <path>/binary_container_1.xclbin
    Loading: '<path>/binary_container_1.xclbin'
    Trying to program device[0]: xilinx_aws-vu9p-f1_shell-v04261818_201920_2
    INFO: [HW-EM 01] Hardware emulation runs simulation underneath. Using a large data set will result .............
    Device[0]: program successful!
    Running FIR filter with 2048 samples, each sample is a 32-bit signed element
    Launching Kernel...
    Getting Results...
    TEST PASSED
    INFO::[ Vitis-EM 22 ] [Time elapsed: 0 minute(s) 39 seconds, Emulation time: 0.145998 ms]
    Data transfer between kernel(s) and global memory(s)
    krnl_mm2s_1:m_axi_gmem-DDR[0]          RD = 8.000 KB               WR = 0.000 KB        
    krnl_s2mm_1:m_axi_gmem-DDR[2]          RD = 0.000 KB               WR = 8.000 KB        
    
    Data transfer on stream interfaces
    krnl_fir_1:y-->krnl_s2mm_1:s2m               8.000 KB        
    krnl_mm2s_1:m2s-->krnl_fir_1:x               8.000 KB   
    

    Notice that not only the data memory mapped data transfer is reported, but also, the streaming data transfer. Once again, you can rerun the application with different arguments.

  5. In the Assistant view expand streaming_lab_system > streaming_lab [Host] > Emulation-HW -> SystemDebugger_streaming_lab_system_streaming_lab and double click Run Summary (xclbin)

  6. In the Vitis Analyzer, click System Diagram and notice the memory mapped and streaming connection (dotted lines)

  7. Open Timeline Trace and explore the host and kernel timeline. If you do not see host activity then go to Run Configurations and change runtime configuration to enable Host Code tracing

    Note that when the FIR filter starts producing data it never starves. This is because the design uses two different memory banks. You can explore how the memory bank assignation impacts the performance by editing the linking.cfg file, re building and rerunning the application.

Analysis

Note that for a linear phase response FIR filter, the coefficients are symmetric around the center value. Vitis HLS realizes and halves the number of multiplications. What is more, Vitis HLS analyzes the coefficients and does not implement a multiplications for those coefficients that are a power of 2, or can be conformed as the sum of power of two, e.g, 384 (256 + 128). Vitis HLS analyzes the cost of implementing the multiplication as an addition of power of 2 or implementing using a multiplier. Consequently, out of the 36 symmetric multiplications one has a coefficient 0, and the other four are implemented as sum. That leaves the designing with only 32 multiplications. Since each multiplication is 32-bit x 16-bit and the DSP48e2, harden multipliers, can handle multiplication of 27-bit x 18-bit each multiplication needs two DSP48e2. Vitis HLS is able to perform these types of optimizations because the coefficients are static, for dynamic coefficients Vitis cannot make any assumption and will implement 73 multiplications.

Here is the list of the coefficients that are optimized.

Index Coefficient Composition
4 384 256 , 128
8 -15 -16 , 1
14 20 16 , 4
27 -6 -8 , 2

Run hardware

Since the Hardware build and AFI availability for AWS takes a considerable amount of time, a precompiled and preregistered AWS version is provided for you. Use the precompiled solution directory to verify the functionality

  1. Change Active build configuration: to Hardware

  2. In the Assistant view, right-click on streaming_lab_system > streaming_lab [Host] and select Build

    Note, this will only build the host code.

  3. Copy the precompiled bitstream solution

    cp ~/xup_compute_acceleration/solutions/streaming_lab/* ~/workspace/streaming_lab/Hardware/
    
  4. Run the application and analyze the output using the following commands:

    cd ~/workspace/streaming_lab/Hardware/
    ./streaming_lab binary_container_1.awsxclbin
    
  5. The FPGA bitstream will be downloaded and the host application will be executed showing an output similar to:

    Found Platform
    Platform Name: Xilinx
    INFO: Reading binary_container_1.awsxclbin
    Loading: 'binary_container_1.awsxclbin'
    Trying to program device[0]: xilinx_aws-vu9p-f1_shell-v04261818_201920_2
    Device[0]: program successful!
    Running FIR filter with 4194304 samples, each sample is a 32-bit signed element
    Launching Hardware Kernels...
    Getting Hardware Results...
    Computing Software results...
    TEST PASSED
    

Conclusion

In this lab, you used Vitis to implement an FIR filter using streaming kernels. A configuration file specifies how the streaming interfaces are connected between the kernels. You also analyzed the system diagram and the timeline trace.