JXL Encoder¶
JXL Encoder example resides in L2/demos/jxlEnc
directory. The tutorial provides a step-by-step guide that covers commands for building and running kernel.
Overview¶
JXL Encoder demos contain 3 kernels which show acceleration cases on different parts of JPEG-XL encoder. JxlEnc_lossy_enc_compute kernel mainly responsible for accelerating AC and DC generation and processing. The input is XYB image data and other componets such as maskfield, raw quantfield and aq maps. The output is AC and DC coefficients, and block order and strategy and quantfield for next steps. JxlEnc_ans_initHistogram and JxlEnc_ans_clusterHistogram are two parts of ANS encoding, accelerating on these two kernels responsible for generation of ACtokens and histograms. The internel block design is show as below.
The design of the JxlEnc_lossy_enc_compute kernel is as follows:
LoadData is responsible for load host data to internal stream and pass to next step. Then, a parallel computing of DCT8x8, DCT16x16 and DCT32x32 is processed by VarDCT and the result is sending to acs_heuritic which further compute ac strategy for each image block. CFL is responsible for color correlation of YtoX and YtoB and also pass quantfield and acs to next module. In Compute_CoeffAC, ac coefficients are generated and then ouput to AXI writeout. All order of image blocks are computed after dataflow processing of AC and DC coefficients, its’ result are then send to AXI writeout.
The design of the JxlEnc_ans_initHistogram and JxlEnc_ans_clusterHistogram is as follows:
Kernel JxlEnc_ans_initHistogram and JxlEnc_ans_clusterHistogram are designed for accelerating ANS encoding. The JxlEnc_ans_initHistogram is processed within dataflow and parallely doing AC Tokenize and Histogram initiation. For JxlEnc_ans_clusterHistogram, it is processed in pipeline acceleration and generates all histograms for post-processing in JPEG-XL computing flow.
Executable Usage¶
- Work Directory(Step 1)
The steps for library download and environment setup can be found in Vitis Codec Library. For getting the design,
cd L2/demos/jxlEnc
- Build kernel(Step 2)
Run the following make command to build your XCLBIN and host binary targeting a specific device. Please be noticed that this process will take a long time, maybe couple of hours.
make run TARGET=hw
- Run kernel(Step 3)
To get the benchmark results, please run the following command.
PATH_TO_BUILD/host.exe --xclbin PATH_TO_BUILD/jxlEnc.xclbin PNGFilePath JXLFilePath
Note: “PATH_TO_BUILD” is decided by your chosen “DEVICE=” when running hw build, Default arguments are set in Makefile.
JXL Encoder Input Arguments:
Usage: host.exe -[-xclbin] --xclbin: the kernel name PNGFilePath: the path to the input *.PNG JXLFilePath: the path to the output *.jxl
Note: Default arguments are set in Makefile, you can use other pictures listed in the table.
Profiling¶
The hardware resource utilizations are listed in the following table. Different tool versions may result slightly different resource.
IP | BRAM | URAM | DSP | FF | LUT |
JxlEnc_lossy_enc_compute | 364 | 53 | 498 | 145111 | 121741 |
JxlEnc_ans_clusterHistogram | 70 | 28 | 51 | 60744 | 38507 |
JxlEnc_ans_initHistogram | 150 | 41 | 95 | 64710 | 39289 |
Result¶
Image | Size | Time(ms) | Throughput(MP/s) |
lena_c_512.png | 512x512 | 3.63 | 72.21 |
hq_1024x1024.png | 1024x1024 | 13.06 | 80.29 |
hq_2Kx2K.png | 2048x2048 | 50.33 | 83.34 |
Image | Size | Time(ms) | Throughput(MP/s) |
lena_c_512.png | 512x512 | 4.6 | 56.98 |
hq_1024x1024.png | 1024x1024 | 14.6 | 71.82 |
hq_2Kx2K.png | 2048x2048 | 41.13 | 101.97 |
Image | Size | Time(ms) | Throughput(MP/s) |
lena_c_512.png | 512x512 | 6.07 | 43.19 |
hq_1024x1024.png | 1024x1024 | 18.03 | 58.16 |
hq_2Kx2K.png | 2048x2048 | 79.30 | 52.89 |