void SingleStream::FIRinit()
{
    // Read and discard `Delay` values from input stream 0.
    for (int i = 0; i < Delay; ++i)
    {
        get_ss(0);
    }
}
```
## Compilation and Analysis
Ensure that the `InitPythonPath` script in the `Utils` directory has been sourced.
Navigate to the `MultiKernel` directory. The `Makefile` defines three targets:
- `aie`
  - Compiles the graph and the kernels
- `aiesim`
  - Runs the AI Engine SystemC simulator
- `aieviz`
  - Runs `vitis_analyzer` on the output summary
Have a look at the source code (kernel and graph) to familiarize yourself with the C++ instantiation of kernels. In `graph.cpp`, the connections between the PL and the AI Engine array are declared using 64-bit interfaces running at 500 MHz, allowing for maximum bandwidth on the AI Engine array AXI-Stream network.
To run the simulation, input data must first be generated. Change to the `data` directory and type `GenerateStreams`. Set the following parameters for this example:
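As a back-of-the-envelope check of that bandwidth figure, the sketch below computes the raw rate of one 64-bit interface clocked at 500 MHz. The 32-bit sample width (a complex sample with 16-bit real and imaginary parts) is an assumption made for illustration, and the function names are hypothetical.

```python
def interface_bits_per_second(width_bits, freq_hz):
    """Raw bandwidth of one PL interface, in bits per second."""
    return width_bits * freq_hz


def samples_per_second(width_bits, freq_hz, sample_bits):
    """Samples per second the interface can carry for a given sample width."""
    return interface_bits_per_second(width_bits, freq_hz) / sample_bits


if __name__ == "__main__":
    # 64-bit interface at 500 MHz; 32-bit samples are an assumption.
    rate = samples_per_second(64, 500e6, 32)
    print(f"{rate / 1e6:.0f} Msps")  # prints "1000 Msps"
```

Under that assumption, one interface carries two samples per 500 MHz clock cycle.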
![missing image](../Images/GenerateSingleStream.jpg)
Click **Generate** then **Exit**. The generated file `PhaseIn_0.txt` should contain mainly 0's, with a few 1's and 10's.
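One way to confirm that the generated file matches this description is to count its values. The sketch below assumes the file holds whitespace-separated integers, several per line; that layout is an assumption about `PhaseIn_0.txt`, and `count_values` and `mostly_zero` are hypothetical helpers, not part of the tutorial utilities.

```python
from collections import Counter


def count_values(text):
    """Count occurrences of each integer in whitespace-separated text."""
    return Counter(int(token) for token in text.split())


def mostly_zero(counts):
    """Return True when more than half of the counted values are 0."""
    total = sum(counts.values())
    return total > 0 and counts[0] > total / 2
```

For example, calling `count_values(open("PhaseIn_0.txt").read())` from the `data` directory should report a large count for 0 and small counts for 1 and 10.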
Type `make all` and wait for the `vitis_analyzer` GUI to display. The Vitis analyzer shows the graph, how it has been implemented in the device, and the complete timeline of the simulation. In this case the graph contains four kernels, placed on neighboring AI Engines.
Click **Graph** to visualize the graph of the application:
![missing image](../Images/Graph4Kernels.jpg)
The four kernels and their four independent input streams are clearly visible. Alternatively, a single input with a FIFO of eight between consecutive AI Engines could be implemented.
Click **Array** to visualize where the kernels have been placed and how they are fed from the PL:
![missing image](../Images/Array4Kernels.jpg)
This view shows the cascade streams connecting neighboring AI Engines, which are key to the performance of this graph.
Finally, click **Trace** to inspect the timeline of the entire simulation. This is useful for tracking where an AI Engine stalls when performance is lower than expected:
![missing image](../Images/Timeline4Kernels.jpg)
Now the output of the filter can be displayed. Because the input is a set of Dirac impulses, the impulse response of the filter should be recognizable throughout the waveform. Navigate to `Emulation-AIE/aiesimulator_output/data` and look at `Output_0.txt`. Each line contains two complex outputs and is preceded by a time stamp. Type `ProcessAIEOutput Output_0.txt` to display the output:
![missing image](../Images/GraphOutput4Kernels.jpg)
The top graph shows the outputs, where the abscissa is the time at which each output occurred. The four frames are clearly separated: between frames there is no output for a number of clock cycles. The bottom graph zooms in on the output, where the filter impulse response is recognizable.
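The intervals with no output can also be located directly from the time stamps. The following is a minimal sketch, not the implementation of `ProcessAIEOutput`. It assumes that each time stamp is a line of the form `T <value> <unit>` and that each data line holds interleaved real and imaginary integers; both are assumptions about the file layout, and the function names are hypothetical.

```python
# Conversion factors to nanoseconds for the assumed time stamp units.
UNIT_TO_NS = {"ps": 1e-3, "ns": 1.0, "us": 1e3, "ms": 1e6}


def parse_timestamped(lines):
    """Return a list of (time_ns, [complex samples]) records."""
    records = []
    time_ns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "T" and len(fields) >= 3:
            time_ns = float(fields[1]) * UNIT_TO_NS[fields[2]]
            continue
        try:
            values = [int(v) for v in fields]
        except ValueError:
            continue  # skip any marker line that is not sample data
        if time_ns is None:
            continue  # ignore data that appears before the first time stamp
        samples = [complex(real, imag)
                   for real, imag in zip(values[0::2], values[1::2])]
        records.append((time_ns, samples))
    return records


def find_gaps(records, min_gap_ns):
    """Return (start_ns, end_ns) pairs where no output occurs for longer than min_gap_ns."""
    times = [t for t, _ in records]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > min_gap_ns]
```

Calling `find_gaps(parse_timestamped(open("Output_0.txt")), min_gap_ns)` with a threshold well above the spacing of consecutive outputs should return one interval per frame boundary.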
The performance of this architecture can be measured using the timestamped output. In the same directory (`Emulation-AIE/aiesimulator_output/data`) type `StreamThroughput Output_0.txt`:
```
Output_0.txt --> 951.67 Msps
-----------------------
Total Throughput --> 951.67 Msps
```
This architecture achieves very close to 1 Gsps. Throughput is slightly lower because of the cycles spent on initialization each time the kernels are called (the quiet zones in the output graph). Throughput increases with frame length, since that overhead is spread over more samples.
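The relationship between frame length and throughput can be illustrated with a simple model. The sketch below assumes a fixed overhead per kernel call, expressed in sample periods, and a peak rate of 1000 Msps. The model, the reference frame length of 512, and the function names are illustrative assumptions, not values taken from this design; only the 951.67 Msps figure comes from the measurement above.

```python
def overhead_in_samples(frame_len, measured_msps, peak_msps):
    """Per-call overhead, in sample periods, implied by a measured throughput."""
    return frame_len * (peak_msps / measured_msps - 1.0)


def throughput_msps(frame_len, overhead, peak_msps):
    """Throughput when each call costs frame_len + overhead sample periods."""
    return peak_msps * frame_len / (frame_len + overhead)


if __name__ == "__main__":
    PEAK = 1000.0     # assumed peak rate in Msps
    REF_FRAME = 512   # hypothetical frame length for the measured run
    ovh = overhead_in_samples(REF_FRAME, 951.67, PEAK)
    for frame_len in (256, 512, 1024, 2048):
        print(frame_len, round(throughput_msps(frame_len, ovh, PEAK), 2))
```

With a fixed overhead, doubling the frame length roughly halves the fraction of time spent on initialization, so throughput approaches the peak rate as frames grow.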
Copyright© 2020–2021 Xilinx
XD020