Internal Design of jpegDecoder

Overview

This decoder API supports the ‘Sequential DCT-based mode’ of the ISO/IEC 10918-1 standard. It is a high-performance implementation based on the Xilinx HLS design methodology. It can process one Huffman token and produce up to eight DCT coefficients per cycle. It is also easy to use, because it parses the JPEG file header directly, without the help of software functions.
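To illustrate what parsing the JPEG file header involves, the following is a minimal software sketch that walks the marker segments and reads the baseline frame header (SOF0). It is not the HLS implementation; the names FrameInfo and findBaselineFrame are hypothetical and chosen only for this example, and the sketch assumes a well-formed stream in which every marker before SOF0 carries a length field.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type for this sketch; not part of the library API.
struct FrameInfo {
    uint16_t width;      // samples per line
    uint16_t height;     // number of lines
    uint8_t precision;   // sample precision in bits (8 for baseline)
    uint8_t components;  // number of image components
};

// Walk the marker segments of a JPEG byte stream and return the fields of the
// first baseline SOF0 (0xFFC0) segment, or std::nullopt if none is found.
std::optional<FrameInfo> findBaselineFrame(const std::vector<uint8_t>& d) {
    // A JPEG file starts with the SOI marker (0xFFD8).
    if (d.size() < 4 || d[0] != 0xFF || d[1] != 0xD8) return std::nullopt;
    std::size_t pos = 2;
    while (pos + 4 <= d.size()) {
        if (d[pos] != 0xFF) return std::nullopt;  // not positioned at a marker
        uint8_t marker = d[pos + 1];
        if (marker == 0xFF) { ++pos; continue; }  // fill byte before a marker
        // EOI or SOS reached before any SOF0: give up.
        if (marker == 0xD9 || marker == 0xDA) return std::nullopt;
        // The segment length is big-endian and counts its own two bytes.
        std::size_t len = (std::size_t(d[pos + 2]) << 8) | d[pos + 3];
        if (len < 2 || pos + 2 + len > d.size()) return std::nullopt;
        if (marker == 0xC0) {
            if (len < 8) return std::nullopt;  // too short for a frame header
            FrameInfo f;
            f.precision = d[pos + 4];
            f.height = uint16_t((d[pos + 5] << 8) | d[pos + 6]);
            f.width = uint16_t((d[pos + 7] << 8) | d[pos + 8]);
            f.components = d[pos + 9];
            return f;
        }
        pos += 2 + len;  // skip to the next marker segment
    }
    return std::nullopt;
}
```

The sketch requires C++17 for std::optional. A complete header parser would also read the quantization tables (DQT), the Huffman tables (DHT), and the scan header (SOS).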

As an independent IP, the L1 API is the key circuit of the L2 API. The L2 API runs as a kernel demo, which also shows the overall performance of the circuit.

The benchmark of the API shows that the Huffman decoder (L1 IP) usually decodes faster than the iDCT (in the L2 kernel). In practical applications, the JPEG decoder is often used as the front module of the entire codec board.

Algorithm

JPEG Decoder algorithm implementation:

Figure 1: jpegDecoder kernel work flow on FPGA

The host code recovers the output stream into a .yuv file and stores it in the same folder as the input JPEG.
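As an illustration of this recovery step, the sketch below rearranges MCU-ordered samples into planar YUV. It is a simplified model, not the library's host code: the function name mcuToPlanar444 is hypothetical, and the sketch assumes YUV 4:4:4 (one Y, one U, and one V block per MCU), row-major 8x8 blocks, MCUs in raster order, and image dimensions that are multiples of 8. Subsampled formats such as 4:2:0 carry more Y blocks per MCU and would need a different index mapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper for this sketch; the actual host code may differ.
// Input:  samples in MCU scan order, assuming 4:4:4 so that each MCU holds
//         three row-major 8x8 blocks in the order Y, U, V.
// Output: planar data, the full Y plane followed by the U and V planes.
std::vector<uint8_t> mcuToPlanar444(const std::vector<uint8_t>& mcu,
                                    std::size_t width, std::size_t height) {
    if (width % 8 != 0 || height % 8 != 0)
        throw std::invalid_argument("width and height must be multiples of 8");
    const std::size_t planeSize = width * height;
    if (mcu.size() < 3 * planeSize)
        throw std::invalid_argument("input is shorter than three planes");

    std::vector<uint8_t> planar(3 * planeSize);
    std::size_t src = 0;  // read position in the MCU-ordered stream
    for (std::size_t my = 0; my < height / 8; ++my)        // MCU row
        for (std::size_t mx = 0; mx < width / 8; ++mx)     // MCU column
            for (std::size_t c = 0; c < 3; ++c)            // Y, U, V block
                for (std::size_t y = 0; y < 8; ++y)        // row inside block
                    for (std::size_t x = 0; x < 8; ++x)    // column inside block
                        planar[c * planeSize + (my * 8 + y) * width + mx * 8 + x] =
                            mcu[src++];
    return planar;
}
```

Writing the returned buffer to disk in one call would then produce a planar .yuv file under the same assumptions.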

Implementation

The input and output features of jpegDecoder:

Table 1: jpegDecoder Features

jpegDecoder    Status
Input          Supports JPEG files scanned with baseline sequential processing, 8-bit precision
Output         YUV in MCU scan order
Output info    Image width, height, scan format, quantization tables, number of MCUs, other details… and the reason for a decoding error, if any
Performance    Decodes one Huffman symbol per cycle; outputs 8 bytes of raw YUV data per cycle in MCU scan order

The algorithm implementation is shown in the figure below:

Figure 2: jpegDecoder architecture on FPGA

As we can see from the figure:

The design exploits a statistical characteristic of JPEG compression: in most cases, the combined length (Huffman code length + value length) is less than 15, so one Huffman symbol can be decoded in each clock cycle.
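The condition above can be modeled in software as follows. This is a sketch under stated assumptions, not the HLS code: the names kFastPathLimit, Token, extend, and decodeToken are hypothetical, the 32-bit window is an arbitrary choice for the model, and the code length and value length are taken as inputs that a Huffman table lookup would supply. The extend function follows the EXTEND procedure of ISO/IEC 10918-1, which converts the additional bits into a signed value. The source does not describe what happens when the limit is exceeded, so the sketch only flags that case.

```cpp
#include <cstdint>

// Threshold taken from the text: the fast path applies when
// (Huffman code length + value length) is less than 15.
constexpr unsigned kFastPathLimit = 15;

// Hypothetical result type for this sketch.
struct Token {
    bool singleStep;  // true if the code and its value fit the fast path
    int value;        // decoded signed value (meaningful when singleStep)
};

// EXTEND procedure of ISO/IEC 10918-1: map `size` additional bits to a
// signed value. A leading 0 bit marks a negative number.
int extend(unsigned bits, unsigned size) {
    if (size == 0) return 0;
    return (bits < (1u << (size - 1))) ? int(bits) - int((1u << size) - 1)
                                       : int(bits);
}

// `window` holds the next 32 bits of the entropy-coded stream, MSB first.
// `codeLen` is the length of the Huffman code at the head of the window and
// `valueLen` is the number of additional bits that follow it.
Token decodeToken(uint32_t window, unsigned codeLen, unsigned valueLen) {
    if (codeLen + valueLen >= kFastPathLimit) return {false, 0};
    // Drop the Huffman code, then keep the top `valueLen` bits.
    uint32_t raw = (valueLen == 0) ? 0u : (window << codeLen) >> (32 - valueLen);
    return {true, extend(raw, valueLen)};
}
```

For example, with a 3-bit code followed by the value bits 101, decodeToken yields 5; with the value bits 010 it yields -5, because a leading 0 marks a negative value in the JPEG coding.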

Profiling

The hardware resource utilization is listed in the following table. Different tool versions may produce slightly different results.

Table 2: Hardware resources for kernelJpegDecoder on U50

Kernel             BRAM  URAM  DSP  FF     LUT    Frequency (MHz)
kernelJpegDecoder  28    0     39   23652  24591  243
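As a rough worked example, the reported frequency can be combined with the 8-byte-per-cycle output rate from Table 1 to estimate an upper bound on the raw YUV output rate. The helper name peakBytesPerSecond is hypothetical, and the figure assumes the kernel emits 8 bytes on every cycle, which a real image may not sustain.

```cpp
#include <cstdint>

// Upper-bound output rate: bytes produced per cycle times clock frequency.
constexpr std::uint64_t peakBytesPerSecond(std::uint64_t bytesPerCycle,
                                           std::uint64_t clockHz) {
    return bytesPerCycle * clockHz;
}

// 8 bytes per cycle at 243 MHz gives 1,944,000,000 bytes/s (about 1.94 GB/s).
static_assert(peakBytesPerSecond(8, 243'000'000ULL) == 1'944'000'000ULL,
              "peak raw YUV output rate at 243 MHz");
```

The measured throughput depends on the image content and on how well the Huffman and iDCT stages keep up with each other.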