Vitis Hardware Acceleration

See Vitis™ Development Environment on xilinx.com

NOTE:
In this section, the traveling salesperson problem (TSP) design is implemented with 4 parallel memory lookups to fetch distances.

Load the project into Vitis HLS

Open a terminal and navigate to the build directory. Launch the following command which will open the graphical interface of Vitis HLS and configure the project based on the settings included in the hls_opt.tcl Tcl file:

user@server:~$ cd ./build
user@server:~$ vitis_hls -p hls_opt.tcl &

Once the tool comes up, on the left-hand side, locate the Explorer pane, expand proj->Source and double-click on tsp_opt.cpp to inspect the source code. The testbench is found below in proj->TestBench in the file is tsp_TB.cpp.

Review Code Changes

There are now 4 arrays for the distances:

uint16_t distances_0[N][N];
uint16_t distances_1[N][N];
uint16_t distances_2[N][N];
uint16_t distances_3[N][N];

The incoming distance data points are still read one at a time, but they are copied into all 4 memories:

loop_distances: for (int i = 0; i < N*N; ++i)
  {
    uint16_t val;
    streamDistances >> val;
    distances_0[i/N][i%N] = val;
    distances_1[i/N][i%N] = val;
    distances_2[i/N][i%N] = val;
    distances_3[i/N][i%N] = val;
  }

The loop_compute main loop continuously increments by 4 and distributes the 4 values to copies of the compute function. Each evaluates a route:

loop_compute: for( unsigned long int i_ = 0; i_ < factorialN; i_ += 4 )
  {
    #pragma HLS pipeline II=1
    candidate0 = std::min(candidate0, compute(i_+0, distances_0));
    candidate1 = std::min(candidate1, compute(i_+1, distances_1));
    candidate2 = std::min(candidate2, compute(i_+2, distances_2));
    candidate3 = std::min(candidate3, compute(i_+3, distances_3));
  }

Final determination of the shortest distance:

    // Determine shortest between the 4 candidates
    shortestDistance = std::min({ candidate0, candidate1,
                                  candidate2, candidate3 });

Running C-simulation and C-synthesis

First you will run C simulation to confirm the optimized design works as expected. Edit the tsp.h file to make sure the number of cities is small (N=5) for faster simulation run times, and Run C Simulation.

After reviewing the results of simulation, increase the number of cities again (N=13) in the tsp.h file for C synthesis so that you can compare results with the original design. Run C Synthesis.

The following figure shows the C synthesis report in the Vitis HLS GUI (the Performance and Resource Estimates section):
synthesis

You will notice that:

  • The latency for the tsp function is now less than a second

  • The loop distance trip count is unchanged as expected since the input data is same

  • The tripcount for the main loop (loop_compute) is now a fourth of factorial 12 (12!/4) thanks to the parallel execution of the compute function

  • The new Loop 3 originates from the final std::min call that returns the smallest of the 4 results obtained