2021.1 Vitis™ Application Acceleration Development Flow Tutorials


Experiencing Acceleration Performance

In this lab, you will see the potential of acceleration by running the application first as a software-only version and then as an optimized FPGA-accelerated version that uses a precompiled FPGA accelerator.

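  1. Run the following commands to set up the environment and the lab directory.

    # Source the Vitis runtime (XRT) environment; adjust the path if XRT is not installed in /opt/xilinx/xrt
    source /opt/xilinx/xrt/setup.sh
    # Point LAB_WORK_DIR at the Bloom filter design tutorial in your copy of the repository
    export LAB_WORK_DIR=<Downloaded Github repository>/Hardware_Acceleration/Design_Tutorials/02-bloom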
    
  2. Next, build the C application:

    1. Navigate to the cpu_src directory.

    2. Use the following commands to run the original application, with the number of documents as its argument (100000 in the run shown below), and generate the golden output file used for comparison.

      cd $LAB_WORK_DIR/cpu_src/
      make run
      

      The computed scores are stored by the host code in the cpu_profile_score array, which holds one score for each of the specified documents. The results will look similar to the following:

      ./host 100000
      Initializing data
      Creating documents - total size : 1398.903 MBytes (349725824 words)
      Creating profile weights
      
      Total execution time of CPU          |  2949.3867 ms
      Compute Hash processing time         |  2569.3266 ms
      Compute Score processing time        |   380.0601 ms
      --------------------------------------------------------------------
      Execution COMPLETE
      
  3. Run the application on the FPGA. For the purposes of this lab, the FPGA accelerator is implemented with an 8x parallelization factor.

    • Eight input words are processed in parallel, producing eight output flags during each clock cycle.

      To run the optimized application on the FPGA, use the following make command.

      make run_fpga SOLUTION=1
      

      Output similar to the following is displayed.

      Processing 1398.905 MBytes of data
      Splitting data in 8 sub-buffers of 174.863 MBytes for FPGA processing
      --------------------------------------------------------------------
      Executed FPGA accelerated version  |   427.1341 ms   ( FPGA 230.345 ms )
      Executed Software-Only version     |   3057.6307 ms
      --------------------------------------------------------------------
      Verification: PASS
      

      The computed throughput is:

      Throughput = Total data / Total time = 1.39 GB / 427.1341 ms ≈ 3.25 GB/s

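      Compared with the software-only version (3057.6307 ms), the FPGA-accelerated version (427.1341 ms) runs about 7.2 times faster, so leveraging FPGA acceleration increases the throughput of the application by a factor of roughly 7.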
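The number eight appears twice in step 3 with two different meanings: the accelerator consumes eight words per clock cycle, and the data is split into eight sub-buffers for FPGA processing. The following shell sketch reproduces the arithmetic behind both, using only numbers from the logs above. The helper names cycles_needed and sub_buffer_mbytes are hypothetical, and the cycle count is an idealized lower bound that assumes the kernel never stalls.

```shell
#!/usr/bin/env bash

# cycles_needed WORDS LANES
# Idealized clock cycles to stream WORDS words through LANES parallel lanes,
# assuming one group of LANES words is consumed every cycle (no stalls).
cycles_needed() {
  local words=$1 lanes=$2
  echo $(( (words + lanes - 1) / lanes ))   # ceiling division
}

# sub_buffer_mbytes TOTAL_MBYTES COUNT
# Size of each sub-buffer when TOTAL_MBYTES is split evenly COUNT ways.
sub_buffer_mbytes() {
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.3f\n", t / n }'
}

cycles_needed 349725824 8        # words from the CPU run, 8x parallelization factor
sub_buffer_mbytes 1398.905 8     # total size from the FPGA log, 8 sub-buffers
```

The second call prints 174.863, matching the sub-buffer size reported in the FPGA log.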

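The throughput and speedup figures in step 3 are simple ratios of the logged values. This minimal sketch recomputes them; the function names throughput_gbps and speedup are hypothetical, and the 1.39 GB input size is the rounded figure used in the throughput formula.

```shell
#!/usr/bin/env bash

# throughput_gbps DATA_GB TIME_MS
# Data volume divided by wall-clock time, reported in GB/s.
throughput_gbps() {
  awk -v gb="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", gb / (ms / 1000) }'
}

# speedup SLOW_MS FAST_MS
# How many times faster the second run is than the first.
speedup() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

throughput_gbps 1.39 427.1341    # FPGA-accelerated version
throughput_gbps 1.39 3057.6307   # software-only version
speedup 3057.6307 427.1341       # software-only time vs. FPGA-accelerated time
```

Because both versions process the same amount of data, the ratio of their throughputs equals the inverse ratio of their execution times, which is why the speedup and the throughput gain are the same factor of about 7.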
Next Steps

In this step, you observed the acceleration that can be achieved using an FPGA. Next, you will architect the application for acceleration, profiling the original application to identify which functions can be accelerated. You will also define the interface boundaries and performance constraints needed to achieve the desired acceleration.
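The software-only timings from step 2 already hint at where profiling will point: the Compute Hash and Compute Score processing times add up exactly to the total CPU execution time (2949.3867 ms). This sketch, built around a hypothetical helper named phase_share, computes each phase's share of that total.

```shell
#!/usr/bin/env bash

# phase_share PART OTHER
# Percentage of the total (PART + OTHER) spent in PART, one decimal place.
phase_share() {
  awk -v p="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * p / (p + o) }'
}

# Timings (ms) from the software-only run in step 2
hash_ms=2569.3266
score_ms=380.0601

echo "Compute Hash:  $(phase_share "$hash_ms" "$score_ms")%"
echo "Compute Score: $(phase_share "$score_ms" "$hash_ms")%"
```

Roughly 87% of the CPU time goes to computing hashes, which makes that function a natural first candidate to examine when profiling.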



Copyright© 2020 Xilinx