Tuning Video Quality¶
Table of Contents
The quality of encoded video depends on various factors. It is primarily a function of the target bit rate and the type of video content. However, there are some encoder parameters which can be used to adjust the video quality. This document describes the major parameters impacting video quality. These paramaters and the underlying concepts are applicable whether using FFmpeg, GStreamer or the C XMA APIs.
Various examples illustrating the effect of these settings can be found here:
ABR Scaling¶
The figure below illustrates a scaling ladder with a 1920x1080 input and 4 outputs with resolutions of 1280x720, 852x480, 640x360, and 416x240, respectively.
For the complete list of features and capabilities of the Xilinx hardware scaler, refer to the Adaptive Bitrate Scaler features section of the Specs and Features chapter of the documentation.
Scaling Order
The scaler is tuned for downscaling and expects non-increasing resolutions in an ABR ladder. Increasing resolutions between outputs is supported but will reduce video quality. For best results, the outputs must be configured in descending order. The frame rate and resolution of a given output should be smaller or equal than the rate of the previous output. Since the output of one scaling stage is passed as an input to the next, video quality will be negatively affected if frame rate is increased after it has been lowered.
Scaling Ratio
Down scale ratios greater than 3 lead to poor visual quality. To address this issue, multiple number of arms must be added to the scaler ladder, where each arm accommodates down scale ratios of up to 3. For example, to down scale from 1920x1080 to 512x288, 2 scaler arms are required in order to preserve high visual quality.
Number of B Frames¶
The default number of B frames is 2, but for most streams, the optimal number of B frames is 1. This provides the best tradeoffs for both video quality and objective quality use cases. The number of B frames can be adjusted according to the amount of motion in the video content. Generally, more B-frames helps compression, but hurts very high motion scenes. Xilinx recommends the following B frames settings:
0 for gaming clips with fast motion, camera pan/rotation scenes
2 for static or slow moving scenes, talking heads, or video conferencing type of content
1 for all medium motion and all other content
To set the number of B frames:
The number of B frames can also be adjusted as noted in Dynamic Encoder Parameters, or dynamically, Dynamic GOP.
Lookahead¶
Lookahead is used to improve the accuracy of rate control by enabling the encoder to buffer a specified number of frames (using the parameter). Spatial and temporal complexity measures are computed for these frames. The rate control uses these measures to distribute more bits to frames which are hard to encode, and less bits to frames which are easy to encode. This redistribution results in better video quality. For spatial and temporal adaptive quantization to take effect, the lookahead depth must be set to at least 1. When latency is tolerable in applications, Xilinx recommends using the maximum lookahead depth to get optimum video quality.
The maximum lookahead value allowed is 1/3 of the stream frame rate. Therefore, the lookhead depth can be set up to 20 for streams with 60fps, and up to 10 for streams with 30fps.
To enable lookahead:
When using FFmpeg, use the
-lookahead_depth
optionWhen using GStreamer, use the
lookahead-depth
option
Adaptive Quantization¶
This feature improves the video qualitity by changing the quantization parameter (QP) within a frame. The QP for each frame is determined by the rate control, and adaptive quantization (AQ) adjusts QP on top of that for different regions within a frame. It exploits the fact that the human eye is more sensitive to certain regions of a frame and redistributes more bits to those regions. The Xilinx video encoders support two types of AQ: Spatial Adaptive Quantization and Temporal Adaptive Quantization.
Adaptive Quantization is enabled by setting -qp-mode
to relative-load
or auto
(default) and when -lookahead_depth
>= 1. When AQ is enabled, Spatial AQ and Temporal AQ can be individually controlled as described in the following sections.
Adaptive Quantization is disabled by setting -qp-mode
to uniform
. In this case, all CU’s within the frame are quantized using the same QP.
Spatial Adaptive Quantization
Spatial AQ adjusts the QP within a frame based on the spatial characteristics. The human eye is more sensitive to regions which are flat and have low texture than regions which have lots of detail and texture. Spatial AQ exploits this and provides more bits to the low texture and flat regions at the expense of high texture regions. This redistribution of bits to visually perceptible regions of the frame brings about visual improvement. Although spatial AQ improves video qualitity, it hurts objective metrics and causes a drop in PSNR and VMAF. It is recommended to turn this feature off when performing PSNR/VMAF based evaluation.
The spatial AQ algorithm can be controlled using the spatial AQ gain option. The range of this option is from 0 to 100 and indicates the strength of the redistribution of data within the frame as a percentage. The default gain is 50.
To control spatial AQ:
When using FFmpeg, use the
-spatial-aq
option to enable or disable spatial AQ, and use the-spatial-aq-gain
option to set the gainWhen using GStreamer, use the
spatial-aq
option to enable or disable spatial AQ, and use thespatial-aq-gain
option to set the gain
Spatial quantization can also be adjusted dynamically. Consult the Dynamic Encoder Parameters section for more details on how to change this option during runtime.
Temporal Adaptive Quantization
Temporal AQ adjusts the QP based on the temporal characteristics of the sequence. It utilizes the lookahead frames to capture the temporal characteristics where static/low motion or background is differentiated with high motion regions. The high motion regions are not very sensitive to the human eye as compared with low motion regions. Temporal AQ exploits this fact and redistributes more bits to static or low motion regions.
To control temporal AQ:
When using FFmpeg, use the
-temporal-aq
option to enable or disable temporal AQWhen using GStreamer, use the
temporal-aq
option to enable or disable temporal AQ
Temporal quantization can also be adjusted dynamically. Consult the Dynamic Encoder Parameters section for more details on how to change this option during runtime.
Scaling List¶
Scaling list offers a mechanism to scale the transform coefficients by specifying scaling matrices. This influences the quality of encoded video. There are two options to specify the scaling lists mode: flat (0) and default (1).
For subjective video quality, the scaling list mode must be set to default. The default scaling mode gives more importance to low-frequency coefficients and less importance to high-frequency coefficients.
To improve the objective numbers (such as PSNR and VMAF), the scaling mode must be set to flat, where all the coefficients are scaled equally.
To control the scaling list mode:
When using FFmpeg, use the
-scaling-list
option (0 = flat, 1 = default)When using GStreamer, use the
scaling-list
option (0 = flat, 1 = default)
Dynamic Encoder Parameters¶
Dynamic parameters are parameters which can be changed during runtime. This is useful to optimize video quality and compression rate for different segments of the video. This capability is supported through FFmpeg, GStreamer and the Video SDK C APIs.
The following encoder parameters can be dynamically modified:
Number of B frames (0 to 4)
Bitrate (1000 to INT_MAX)
Temporal AQ mode (0 to 1)
Spatial AQ mode (0 to 1)
Spatial AQ gain (0 to 100)
Min/Max QP (1-51)
When using FFmpeg or GStreamer, the encoder parameters which should be changed are specified in a configuration file as key-value pairs. This means that these key-value pairs must be known ahead of time. While this can used for file-based processing, this method is primarily intended as a testing mechanism.
When using the C APIs, the dynamic encoder parameters are updated on a frame-by-frame basis using side data. With this approach the top-level application can analyze the incoming frames, determine how to adjust the encoder parameters and update them before encoding the frame. This is primary intended use case for this capability.
Dynamic Parameters Considerations
Recommended settings for dynamic B frames are:
0 for gaming clips with fast motion, camera pan/rotation scenes
2 for static or slow moving scenes, talking heads, or video conferencing type of content
1 for all medium motion and all other content
B frames changes do not happen at the exact frame number specified. Instead, the change comes into effect one or two frames from the actual frame number specified in the config file.
The maximum value for the number of B frames is the value configured at initilization
In low latency mode, a dynamic bitrate changes is reflected after 1 or 2 GOPs. In normal latency mode, a bitrate change occurs within the next second.
The configuration files for dynamic parameters must comply with the format specified above. Ill-formed files may result in unexpected behavior.
FFmpeg
Encoder parameters which should be changed dynamically are specified as key-value pairs in a configuration file. This configuration file is provided to FFmpeg using the -expert-options
encoder switch with the dynamic-params=<file>
option.
The -expert-options
option is specific to an encoded output. For use cases with multiple encoded outputs (such as ABR ladders), each output can have its own -expert-options
option and associated configuration file.
The configuration file should contain one line for each frame where one or more parameters are changed. Each line should start with the frame number followed by a list of key-value pairs for all the modified parameters:
<frameNumberN1>:<key1>=<value1>
<frameNumberN2>:<key2>=<value2>,<key3>=<value3>
Below is a table listing the parameters which can be changed at runtime, the corresponding key and valid values.
Dynamic Parameter |
Key |
Valid Values |
---|---|---|
Number of B frames |
|
0 to 4 |
Bitrate (in bits per second) |
|
1000 to INT_MAX |
Temporal AQ mode |
|
0 to 1 |
Spatial AQ mode |
|
0 to 1 |
Spatial AQ gain |
|
0 to 100 |
Min/Max QP |
|
1 to 51 |
Sample FFmpeg encode command:
ffmpeg -c:v mpsoc_vcu_hevc -i input.h265 -c:v mpsoc_vcu_hevc -b:v 2.5M -lookahead_depth 10 -expert-options dynamic-params=dynparams.txt -y output.h265
Sample configuration file for dynamic parameters:
300:NumB=1
600:BR=6000000
1200:sAQ=1,sAQGain=50
1800:tAQ=1
2400:NumB=0,BR=10000000,sAQ=0,sAQGain=50,tAQ=0
5000:MinQP=30,MaxQP=35
GStreamer
Encoder parameters which should be changed dynamically are specified as key-value pairs in a JSON configuration file, as follows:
"dynamic_params" :[
{
"frame" : 600,
"b-frames" : 2,
"bitrate" : 6000,
"temporal-aq" : false,
"spatial-aq" : true,
"spatial-aq-gain" : 50
"min-qp" : 25,
"max-qp" : 35
},
{
"frame" : 1500,
"b-frames" : 0,
"bitrate" : 3000
"temporal-aq" : true,
"spatial-aq" : true,
"spatial-aq-gain" : 50,
"min-qp" : 5,
"max-qp" : 15
}
]
Refer to the GStreamer ABR Ladder Application for a working example leveraging dynamic parameters with GStreamer. This example includes a complete JSON file for configuring the dynamic parameters and a description of command line usage.
C APIs
Encoder parameters which should be changed dynamically must be specified in a data structure called XlnxDynParams
and defined as follows:
-
struct XlnxDynParams¶
typedef struct XlnxLaDynParams {
bool is_spatial_gain_changed;
uint32_t spatial_aq_gain;
bool is_temporal_mode_changed;
bool temporal_aq_mode;
bool is_spatial_mode_changed;
bool spatial_aq_mode;
} XlnxLaDynParams;
typedef struct XlnxEncDynParams {
bool is_bitrate_changed;
uint32_t bit_rate;
bool is_bframes_changed;
uint8_t num_b_frames;
bool is_min_max_qp_changed;
uint32_t min_qp;
uint32_t max_qp;
} XlnxEncDynParams;
typedef struct XlnxDynParams {
XlnxEncDynParams enc_dyn_param;
XlnxLaDynParams la_dyn_param;
} XlnxDynParams;
The XlnxDynParams
data structure is then passed along as side-data to the encoded frame using the xma_frame_add_side_data()
API. The encoder plugin then takes care of updating the parameters automatically.
For of an example of how this can be implemented, refer to the xlnx_enc_add_dyn_params()
function in the src/xlnx_enc_dyn_params.c file of the shared library providing dynamic encoder paramater support.
XMA Encoder and Transcoder Applications
The XMA Encoder and Transcoder sample applications also feature support for dynamically changing encoder parameters. These sample applications are developed using the Video SDK C APIs. Dynamic parameters are handled in the same way as for FFmpeg: they are specified in a configuration file following the same syntax as for FFmpeg, and the configuration file is passed to the application using the -expert-options dynamic-params=<file>
option.
XMA H.264 Encode Example:
u30_xma_encode -w 1920 -h 1080 -pix_fmt yuv420p -i input_1080.yuv -fps 60 -g 120 -periodicity-idr 120 -frames 600 -c:v mpsoc_vcu_h264 -expert-options dynamic-params=cmdfile.txt -lookahead-depth 20 -spatial-aq 1 -temporal-aq 1 -spatial-aq-gain 80 -o out1.264
XMA HEVC Encode Example:
u30_xma_encode -w 1920 -h 1080 -pix_fmt yuv420p -i input_1080.yuv -fps 60 -g 120 -periodicity-idr 120 -frames 600 -c:v mpsoc_vcu_hevc -expert-options dynamic-params=cmdfile.txt -lookahead-depth 20 -spatial-aq 1 -temporal-aq 1 -spatial-aq-gain 80 -o out2.265
XMA ABR Transcode Example:
u30_xma_transcode -c:v mpsoc_vcu_h264 -i sample.h264 -multiscale_xma -num-output 2 -out_1_width 1280 -out_1_height 720 -out_2_width 848 \
-c:v mpsoc_vcu_h264 -b:v 3000K -expert-options dynamic-params=cmdfile.txt -qp-mode 2 -lookahead-depth 16 -temporal-aq 1 -spatial-aq 1 -spatial-aq-gain 75 -o out_dp_test1.h264 \
-c:v mpsoc_vcu_h264 -b:v 4000K -expert-options dynamic-params=cmdfile.txt -qp-mode 2 -lookahead-depth 16 -temporal-aq 1 -spatial-aq 1 -spatial-aq-gain 75 -o out_dp_test2.h264
Dynamic IDR Frame Insertion¶
The dynamic IDR frame insertion feature allows encoding any incoming frame as IDR at runtime. This capability is supported through FFmpeg, GStreamer and the Video SDK C APIs.
Dynamic IDR frame insertion resets the GOP position. That is, if the stream is running with a GOP size of 120 (an IDR frame every 120 frames) and if an IDR frame is dynamically inserted at frame 50, the next IDR frame would be at frame 170 instead of 120.
To control dynamic IDR frame insertion:
When using FFmpeg, use the
-force_key_frames
optionWhen using GStreamer, use the
--forcekeyframe
option tovvas_xabrladder
applicationWhen using the C APIs, set the
is_idr
flag in theXmaFrame
data structure of frame to be encoded as IDR
FFmpeg
To control dynamic IDR frame insertion in FFmpeg, use the -force_key_frames
option. This option can take either a list of timestamps (time,[time…]) or an expression (expr:expr).
When using timestamps, FFmpeg will round the specified times to the nearest output timestamp as per the encoder time base and force a keyframe at the first frame having timestamp equal or greater than the computed timestamp.
When using an expression, the string is interpreted like an expression and is evaluated for each frame. A key frame is forced in case the evaluation is non-zero.
Example using timestamps - Forcing IDR frames at specific time intervals:
ffmpeg -re -c:v mpsoc_vcu_hevc -i input.mp4 -c:v mpsoc_vcu_h264 –b:v 5M –force_key_frames 1,2,3,5,8,13,21 -f mp4 out.mp4
Example using expressions - Forcing an IDR frame every 5 seconds:
ffmpeg -re -c:v mpsoc_vcu_hevc -i input.mp4 -c:v mpsoc_vcu_h264 –b:v 5M -force_key_frames expr:gte(t,n_forced*5) -f mp4 out.mp4
For more details on using timestamps or expressions, see -force_key_frames in FFmpeg documentation (https://ffmpeg.org/ffmpeg-all.html).
GStreamer
To control IDR frame insertion in GStreamer, use the --forcekeyframe
option in vvas_xabrladder application. This option takes IDR frame insertion frequency in number of frames.
Example to force an IDR frame every 30 frames:
vvas_xabrladder --devidx 0 --lookahead_enable 0 --codectype 1 --forcekeyframe 30 --file <path to input file>
C APIs
To encode an incoming frame as IDR, the application needs to set the is_idr
flag in the XmaFrame
structure which is being sent to the lookahead module and then to the encoder:
XmaFrame la_in_frame;
la_in_frame.is_idr = 1;
For a complete source code reference, refer to xlnx_enc_set_if_idr_frame()
function in the transcoder/lib/src/xlnx_transcoder.c file of the XMA transcoder application.
XMA Encoder and Transcoder Applications
In the XMA encoder and transcoder applications, use the -force_key_frame command line option to set IDR frames at specific frame numbers:
u30_xma_transcode -c:v mpsoc_vcu_h264 -i input.264 -c:v mpsoc_vcu_h264 -b:v 6000K -force_key_frame "(200,300,250,200,180,400)" -o out.264
Dynamic GOP¶
The ability to change the composition of a GOP, in response to the content of a video, is of great benefit in video compression. U30 family of devices offer variable number of B frames, 0 to 2, in response to variations in motion vectors, within the lookahead buffer. Specifically, the inserted number of B frames is inversely proportional to variations in motion vector. Control parameter -dynamic-gop
enables and disables this feature. Note that this parameter is not compatible with -bf
and requires a minimal lookahead depth of 5 frames.
Various examples illustrating the effect of these settings can be found here:
When using FFmpeg, the
-dynamic-gop
option can be set to 1 to enable this feature.When using GStreamer, the
dynamic-gop
option can be set to 1 to enable this feature. This parameter is not compatible withb-frames
and requires a minimal lookahead depth of 5 frames.
Objective Metrics versus Subjective Quality¶
Encoder settings can be used to optimize for objective quality (metrics such as PSNR, SSIM or VMAF) or subjective quality (visual appeal). This section summarizes recommended settings for these two goals.
Optimizing for Objective Metrics
To optimize for objective metrics, Adaptive Quantization should be disabled and a flat scaling list should be used. Doing so provides equal importance to all the blocks in the frame: the same quantization parameters and transform coefficients are used for all of them.
When using FFmpeg, the
-tune-metrics
option can be set to 1 to automatically disable AQ and use a flat scaling list.When using GStreamer, the
tune-metrics
option can be set to 1 to automatically disable AQ and use a flat scaling list.
Optimizing for Subjective Quality
To optimize for subjective quality, Adaptive Quantization should be enabled and the default scaling list should be used.
Adaptive Quantization (AQ) exploits the fact that the human eye is more sensitive to certain regions of a frame. This method drops information from high-frequency locations and keeps more information in low-frequency locations in a frame. The result appears more visually appealing. The Spatial AQ Gain parameter controls the strength of the redistribution of data within the frame. Setting too high a value may have a consequence of blurring edges. Experimentation across your clips is recommended if you wish to tune this parameter.
The default scaling list is used to scale up low-frequency data in the stream such that when it is quantized down during the encoding process, detail is retained.
For more details about Adaptive Quantization and the Scaling List, refer to the corresponding sections.