Docs update (#92)
* AV1 update

Signed-off-by: [email protected] <[email protected]>

* Adding section for sws-scale.

Signed-off-by: [email protected] <[email protected]>

* Clarifying why colormatrix isnt great, also adding zscale for completeness.

Signed-off-by: [email protected] <[email protected]>

* Adding some extra notes on what the graphs are showing, also added the allintra flag.

Signed-off-by: [email protected] <[email protected]>

* Updating the color tests, removing some un-necessary tests.

Signed-off-by: [email protected] <[email protected]>

* Adding in some more reference-results images that are needed by the AV1 page.

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
richardssam and SamRichardsDisney authored May 5, 2024
1 parent e519607 commit 4f5941e
Showing 18 changed files with 2,241 additions and 65 deletions.
45 changes: 41 additions & 4 deletions ColorPreservation.md
@@ -19,16 +19,14 @@ Even if you are sticking to 8-bits encodes, if your source media is able to have

For more information, see: [https://trac.ffmpeg.org/wiki/colorspace](https://trac.ffmpeg.org/wiki/colorspace)

TODO -- Review the SWS_Flags.

For examples comparing these see: [here](https://academysoftwarefoundation.github.io/EncodingGuidelines/tests/chip-chart-yuvconvert/compare.html)

## colormatrix filter
```
-vf "colormatrix=bt470bg:bt709"
```
This is the most basic colorspace filtering. bt470bg is essentially part of the bt601 spec. See: [https://www.ffmpeg.org/ffmpeg-filters.html#colormatrix]()
This is the most basic colorspace filtering. bt470bg is essentially part of the bt601 spec. See: [https://www.ffmpeg.org/ffmpeg-filters.html#colormatrix](https://www.ffmpeg.org/ffmpeg-filters.html#colormatrix)

e.g.

<!---
@@ -56,6 +54,10 @@ ffmpeg -y -i ../sourceimages/chip-chart-1080-noicc.png \
./chip-chart-yuvconvert/spline444colormatrix2.mp4
```

There are a couple of issues with this filter:
* It only supports 8bpc (8 bits per component) pixel formats.
* It's slower than the alternatives.

## colorspace filter
```
-vf "colorspace=bt709:iall=bt601-6-625:fast=1"
@@ -83,6 +85,38 @@ ffmpeg -y -i ../sourceimages/chip-chart-1080-noicc.png \
./chip-chart-yuvconvert/spline444colorspace.mp4
```


## zscale filter

```
-vf "zscale=m=709:min=709:rangein=full:range=limited"
```
Uses the zimg library (z.lib) rather than libswscale. It seems similar to colorspace, but with image resizing and levels handling built in. See: [https://www.ffmpeg.org/ffmpeg-filters.html#zscale](https://www.ffmpeg.org/ffmpeg-filters.html#zscale)

This is an alternative to libswscale. It produces pretty good results for image resizing, but purely for RGB to Y’CbCr conversion libswscale appears very slightly better.

e.g.

<!---
name: test_colormatch_zscale
sources:
- sourceimages/chip-chart-1080-16bit-noicc.png.yml
comparisontest:
- testtype: idiff
- testtype: assertresults
tests:
- assert: less
value: max_error
less: 0.00195
-->
```
ffmpeg -y -i ../sourceimages/chip-chart-1080-noicc.png \
-pix_fmt yuv444p10le -vf "zscale=m=709:min=709:rangein=full:range=limited" \
-c:v libx264 -preset placebo -qp 0 -x264-params "keyint=15:no-deblock=1" -qscale:v 1 \
-color_range tv -colorspace bt709 -color_primaries bt709 -color_trc iec61966-2-1 \
./chip-chart-yuvconvert/spline444out_color_matrix.mp4
```

## libswscale filter

```
@@ -112,3 +146,6 @@ ffmpeg -y -i ../sourceimages/chip-chart-1080-noicc.png \
-color_range tv -colorspace bt709 -color_primaries bt709 -color_trc iec61966-2-1 \
./chip-chart-yuvconvert/spline444out_color_matrix.mp4
```


Note: there are a lot of other flags often used with the swscale filter (such as -sws_flags spline+full_chroma_int+accurate_rnd) that have minimal impact on the RGB to Y’CbCr conversion if you are not resizing the image. For more details, see the [SWS Flags](/EncodeSwsScale.html) section.
16 changes: 7 additions & 9 deletions EncodeAv1.md
@@ -89,20 +89,17 @@ See: [SVT-AV1 Common Questions](https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/m


See Also:
* https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/svt-av1_encoder_user_guide.md
* https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/Ffmpeg.md
* [svt-av1 encoder user guide](https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/svt-av1_encoder_user_guide.md)
* [ffmpeg svt-av1](https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Docs/Ffmpeg.md)

## libaom-av1

This is the reference encoder https://github.com/AOMediaCodec/community/wiki
This is the reference encoder [AOM Media Codec](https://github.com/AOMediaCodec/community/wiki)

Supported pixel formats:
yuv420p yuv422p yuv444p gbrp yuv420p10le yuv422p10le yuv444p10le yuv420p12le yuv422p12le yuv444p12le gbrp10le gbrp12le gray gray10le gray12le


{: .warning }
All our initial testing is showing libaom being more than 10x slower at encoding than svt-av1. It needs further exploration to determine if there are ways of getting better encoding times. Unfortunately for many pixel formats, libaom is the only option for av1 encoding (e.g. 422, or 444 encoding).

Example encoding:

```
@@ -117,21 +114,22 @@ ffmpeg -r 24 -start_number 1 -i inputfile.%04d.png -frames:v 200 -c:v libaom-av1
| --- | --- |
| -cpu-used 6 | This sets how efficient the compression will be. The default is 1, changing this will increase encoding speed at the expense of having some impact on quality and rate control accuracy. Values above 6 are reset to 6 unless real-time encoding is enabled. See below for comparison. |
| -row-mt 1 | This enables row based multi-threading (see [here](https://trac.ffmpeg.org/wiki/Encode/VP9#rowmt)) which is not enabled by default. |
| -usage allintra | Encodes every frame as an intra frame (all-intra mode). |
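
As a rough sketch, the flags in the table above could be assembled into an ffmpeg argument list as follows. The leading arguments are copied from the example command on this page; the output name `outputfile.mp4` is hypothetical, and any options that are elided from the example above (such as rate-control settings) are deliberately left out rather than guessed.

```python
import shlex


def libaom_args(cpu_used=6, row_mt=True, all_intra=False,
                output="outputfile.mp4"):  # hypothetical output name
    """Build an ffmpeg argument list for a libaom-av1 encode."""
    args = ["ffmpeg", "-r", "24", "-start_number", "1",
            "-i", "inputfile.%04d.png", "-frames:v", "200",
            "-c:v", "libaom-av1",
            "-cpu-used", str(cpu_used)]
    if row_mt:
        # Row-based multi-threading is not enabled by default.
        args += ["-row-mt", "1"]
    if all_intra:
        # Encode every frame as an intra frame.
        args += ["-usage", "allintra"]
    args.append(output)
    return args


# Print the command so it can be reviewed before running it.
print(shlex.join(libaom_args(all_intra=True)))
```

Printing the command rather than executing it keeps the sketch free of side effects; the list can be passed to `subprocess.run` once the remaining options are filled in.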

### cpu-speed Comparison for libaom-av1

To help pick appropriate values with the cpu-speed flag, we have run the [Test Framework](enctests/README.html) through one of the test media. You can see that values are

| ![](enctests/reference-results/aomav1-crf-test-encode_time.png) | ![](enctests/reference-results/aomav1-crf-test-encode_time_zoom.png) |
| This is showing cpu-speed values against encoding time. | Same graph of cpu-speed value against encoding time a 0-500 scale. |
| This shows cpu-speed values against encoding time. Values of 1 and 2 take more than 15 minutes, whereas most other encoders are closer to the 30-second range. | The same graph of cpu-speed value against encoding time, on a 0-500 scale. This shows that cpu-used 5 is now just twice as slow. |

| ![](enctests/reference-results/aomav1-crf-test-filesize.png) This is showing cpu-speed values against file size. |
| ![](enctests/reference-results/aomav1-crf-test-vmaf_harmonic_mean.png) This is showing cpu-speed values against VMAF harmonic mean |


See Also - note these are all guides for AOMENC (the AOM encoder that is part of libaom), but many of the parameters map to ffmpeg:
* https://forum.doom9.org/showthread.php?t=183906
* https://old.reddit.com/r/AV1/comments/lfheh9/encoder_tuning_part_2_making_aomencav1libaomav1/
* [A 2nd generation guide to aomenc-av1](https://forum.doom9.org/showthread.php?t=183906)
* [Making aomenc-AV1/libaom-AV1 the best it can be in a sea of uncertainty](https://old.reddit.com/r/AV1/comments/lfheh9/encoder_tuning_part_2_making_aomencav1libaomav1/)
* https://github.com/master-of-zen/Av1an/blob/master/docs/Encoders/aomenc.md

## librav1e
145 changes: 145 additions & 0 deletions EncodeSwsScale.md
@@ -0,0 +1,145 @@
---
layout: default
title: Ffmpeg Scaling Options
nav_order: 5.6
parent: Encoding Overview
---

# SWS_Flags

As mentioned previously, **we recommend doing any image resizing outside of ffmpeg**, especially if your sources are OpenEXR and you are going to do a colorspace conversion prior to encoding. This ensures that the filtering is done in linear space, which produces fewer artifacts. However, the scaling algorithms can still get called when remapping chroma from 4:4:4 to 4:2:2 (or 4:2:0). For more information on chroma subsampling, see [frame.io: chroma subsampling guide](https://workflow.frame.io/guide/chroma-subsampling).

There are two scaling libraries that can be used in ffmpeg, libswscale, and zscale. This guide is mainly focusing on the additional options of the libswscale library.
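
To make the chroma remapping concrete, here is a minimal sketch of how the chroma plane dimensions shrink for each subsampling scheme. The function name and the 1920x1080 frame size are illustrative assumptions, not part of ffmpeg.

```python
# Horizontal and vertical chroma subsampling factors per scheme.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def plane_sizes(width, height, scheme):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    sx, sy = SUBSAMPLING[scheme]
    # Ceiling division so odd dimensions still get an edge chroma sample.
    return (width, height), (-(-width // sx), -(-height // sy))


for scheme in SUBSAMPLING:
    luma, chroma = plane_sizes(1920, 1080, scheme)
    print(scheme, "luma", luma, "each chroma plane", chroma)
```

Whenever the chroma plane is smaller than the luma plane, a scaler has to resample the chroma, which is why the scaling flags matter even when the picture size itself is unchanged.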


## libswscale Scaling Summary

For most cases, when converting from RGB to Y’CbCr the default options are fine, but if you are downsampling the image inside ffmpeg, you probably want to use the lanczos filter rather than the default bicubic:


```
-sws_flags lanczos+accurate_rnd
```


For converting from 4:4:4 to RGB we recommend using:


```
-sws_flags spline+accurate_rnd
```


For converting from 4:2:2 or 4:2:0 to RGB (or another Y’CbCr) we recommend using:


```
-sws_flags spline+full_chroma_int+accurate_rnd
```


When converting to 4:2:2 or 4:2:0, this appears to convert with no artifacts; all the other scaling options have artifacts of some form.

Note: it is also safe to simply enable these flags all the time, since they rarely do much harm. An example of this is:


```
-sws_flags spline+full_chroma_int+accurate_rnd+full_chroma_inp
```


These flags are not harmful, especially if you have already done any resolution scaling outside of ffmpeg, and they would be beneficial for Y’CbCr to RGB conversion.
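
The recommendations in this summary can be written down as a small lookup. This is only a sketch of the guidance above; the function name and the format labels (`rgb`, `4:4:4`, `4:2:2`, `4:2:0`) are assumptions made for illustration.

```python
def recommended_sws_flags(src, dst, resizing=False):
    """Return the -sws_flags value suggested in this summary, or None.

    src and dst are one of 'rgb', '4:4:4', '4:2:2' or '4:2:0'.
    """
    if resizing:
        # Downsampling inside ffmpeg: prefer lanczos over default bicubic.
        return "lanczos+accurate_rnd"
    if src in ("4:2:2", "4:2:0"):
        # Subsampled source going to RGB or to another Y'CbCr format.
        return "spline+full_chroma_int+accurate_rnd"
    if src == "4:4:4" and dst == "rgb":
        return "spline+accurate_rnd"
    # RGB to Y'CbCr without resizing: the defaults are fine.
    return None


print(recommended_sws_flags("4:2:0", "rgb"))
print(recommended_sws_flags("rgb", "4:2:2"))
```

Returning `None` for the default case makes it easy for a calling script to omit the `-sws_flags` option entirely.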


### SWS_Flags options

The main flag that is defined is the scaling algorithm (see later), but you can combine it with a number of other flags.

E.g.


```
-sws_flags=lanczos+accurate_rnd+full_chroma_int:sws_dither=none:param0=5
```


This picks the lanczos filter with param0=5, which sets the filter size to 5 rather than the default of 3, turns off dithering, and enables the accurate_rnd and full_chroma_int flags. NOTE: this is just an example of what you can do, not what we recommend doing.
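
To show how the pieces of that option string fit together, the sketch below splits it into the `+`-separated flags and the `:`-separated key/value options. The helper is a hypothetical reading aid, not ffmpeg's own parser, and it is given only the value part of the option.

```python
def parse_sws_options(text):
    """Split 'flag+flag:key=value:...' into (flags, options)."""
    first, *rest = text.split(":")
    flags = first.split("+")
    # Each remaining item is a key=value pair.
    options = dict(item.split("=", 1) for item in rest)
    return flags, options


flags, options = parse_sws_options(
    "lanczos+accurate_rnd+full_chroma_int:sws_dither=none:param0=5")
print(flags)
print(options)
```

Note that the option values stay as strings; converting `param0` to a number is left to the caller.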

Flags that can be used with the filters, include:


| accurate_rnd | Allows more accurate rounding. This avoids using some [MMX optimizations](https://stackoverflow.com/questions/70893502/why-does-ffmpeg-output-slightly-different-rgb-values-when-converting-to-gbrp-and) that might introduce rounding errors. In practice it's unlikely to kick in, but it doesn't hurt. This only occurs if the source is 4:2:0 or 4:2:2 and the destination is RGB, particularly if you are dithering the result. The other case is if you are converting from one Y’CbCr format to another (e.g. 4:2:0 to 4:2:2). NOTE: we have yet to find a case where this currently makes a difference. It's possible it was important for earlier versions of ffmpeg, but with more recent versions (ffmpeg >= 5.x) it seems to have little impact. |
| full_chroma_int | Full Chroma Interpolation - is used for internal processing when rescaling. It enables the use of full Y’CbCr 4:4:4 for internal processing. This means the chroma plane is upsampled using actual scaling conversions before the Y’CbCr-to-RGB conversion is initiated. This can potentially deliver higher visual quality at a relatively small speed penalty. This does nothing for the RGB to Y’CbCr conversion, but does for the [Y’CbCr to RGB conversion](https://github.com/FFmpeg/FFmpeg/blob/08e97dae205d10806a0360bfc62f654d629dda93/libswscale/output.c#L2847), or between Y’CbCr formats. |
| full_chroma_inp | Full Chroma Input - This forces the scaler to assume the input is full chroma even for cases where the scaler thinks it's useless. This does nothing if the source format is RGB, so it really only applies if the source format is a [4:2:2 or 4:2:0](https://github.com/FFmpeg/FFmpeg/blob/08e97dae205d10806a0360bfc62f654d629dda93/libswscale/utils.c#L1493) format. If this is not defined, it will drop every other pixel for chroma calculation. It seems like full_chroma_int really does everything this does. |
| print_info | Outputs additional debug info for the scaler. |
| bitexact | Disables SIMD operations that don’t generate exactly the same output as C. This ends up being quite similar to accurate_rnd. |
| sws_dither | Allows you to set (or disable) the dithering algorithm. By default it is set to “auto”, which will use the bayer algorithm unless the “full_chroma_int” flag is enabled, in which case it will use the “ed” (error diffusion) algorithm. This is due to the bayer algorithm not supporting the full_chroma_int flag. You can also disable it with “none”. It is recommended to leave it enabled, to help with the RGB to Y’CbCr rounding errors, particularly in 8-bit (see [utils.c - initFilter line 1423](https://github.com/FFmpeg/FFmpeg/blob/a87a52ed0b561dc231e707ee94299561631085ee/libswscale/utils.c#L1423)). |
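
To illustrate the kind of 8-bit rounding error that dithering helps to mask, the sketch below maps every full-range 8-bit value onto the limited (16-235) range and counts how many inputs end up sharing an output code. This is a simplified gray-level mapping assumed for illustration; it is not how swscale implements the conversion.

```python
def full_to_limited_8bit(v):
    """Map a full-range 8-bit value (0-255) to limited range (16-235)."""
    return 16 + round(219 * v / 255)


outputs = [full_to_limited_8bit(v) for v in range(256)]
# Inputs that land on an output code already used by another input.
collisions = len(outputs) - len(set(outputs))
print("range:", min(outputs), "to", max(outputs))
print("inputs sharing an output code:", collisions)
```

Because 256 input levels have to fit into 220 output codes, some neighbouring levels become indistinguishable; without dithering this can show up as banding in smooth gradients.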


## Deeper dive

Below we dig into when the different flags should be used for different occasions.


### Converting from RGB to Y’CbCr 422

In the testing (See link), for most cases there is minimal difference between the swscale algorithm options for RGB to 422 conversion. Any of the bicubic or lanczos scalers will produce an identical result.

E.g.

```
-sws_flags accurate_rnd+lanczos+print_info -pix_fmt yuv422p10le -vf "scale=in_range=full:in_color_matrix=bt709:out_range=tv:out_color_matrix=bt709"
```

Zscale does produce a slightly different result, using the flags:

```
-vf "zscale=m=709:min=709:rangein=full:range=limited:filter=lanczos"
```

It's frankly hard to say which of zscale and swscale is better.

The one test pattern that creates errors for both of the scalers is smpteHDbars. zscale is the worse of the two, but unless you use the “area” filter, the bars results are not great. If your material has a fairly graphic look and you are converting to 422, you may want to try area, but be aware that in other cases it may not produce a great result.
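
For reference, the matrix and range settings used in the commands above (bt709, full range in, tv range out, 10-bit) correspond to the arithmetic sketched below for a single pixel. It uses the standard BT.709 luma coefficients and 10-bit limited-range scaling; the function name is an assumption, and no chroma subsampling or filtering is modelled.

```python
KR, KB = 0.2126, 0.0722  # BT.709 luma coefficients
KG = 1.0 - KR - KB


def rgb_to_ycbcr709_10bit(r, g, b):
    """Full-range R'G'B' floats (0-1) to limited-range 10-bit Y'CbCr codes."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    # Luma occupies codes 64-940, chroma is centred on 512 (64-960).
    return (round(64 + 876 * y),
            round(512 + 896 * cb),
            round(512 + 896 * cr))


for name, rgb in (("white", (1, 1, 1)), ("black", (0, 0, 0)),
                  ("red", (1, 0, 0))):
    print(name, rgb_to_ycbcr709_10bit(*rgb))
```

Saturated colors such as the red bar push the chroma values toward the edge of their range, which is part of why hard-edged bar patterns are a demanding test for the chroma scaler.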


### Converting from RGB to Y’CbCr 420

In the testing (See link), unlike the conversion to 422, there is a bigger difference between the filtering algorithms, since you are now sampling over a slightly bigger area. We recommend:

E.g.

```
-sws_flags accurate_rnd+lanczos+print_info -pix_fmt yuv420p10le -vf "scale=in_range=full:in_color_matrix=bt709:out_range=tv:out_color_matrix=bt709"
```

Zscale does produce a slightly improved if occasionally softer result, using the flags:

```
-vf "zscale=m=709:min=709:rangein=full:range=limited:filter=lanczos"
```

It's frankly hard to say which of zscale and swscale is better.
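
To show why the choice of filter matters more for 4:2:0, the sketch below reduces a full-resolution chroma plane by averaging each 2x2 block. A plain box average is assumed here purely for illustration; it is not the lanczos or bicubic kernel that swscale would apply.

```python
def downsample_chroma_420(plane):
    """Average each 2x2 block of a full-resolution chroma plane.

    `plane` is a list of rows with even width and height.
    """
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[y]), 2):
            block = (plane[y][x] + plane[y][x + 1] +
                     plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(block / 4)
        out.append(row)
    return out


# A hard vertical chroma edge that falls in the middle of a 2x2 block.
cb = [[100, 900, 900, 900],
      [100, 900, 900, 900]]
print(downsample_chroma_420(cb))
```

The edge that straddles a block boundary is replaced by an in-between value, so the exact weights a filter uses decide how much color bleeds across sharp edges.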


### Exporting from Y’CbCr

If you are converting files from Y’CbCr, e.g. from Prores to MXF (or h264), particularly from 4:2:0 or 4:2:2, many of the above options are theoretically more important.

Additionally, you should be aware of the [chroma_sample_location](http://trac.ffmpeg.org/wiki/Scaling#Chromasamplelocation), which can also be important when exporting 4:2:0 or 4:2:2 to RGB. &lt;TODO TEST>


## Scaling algorithms

Picking the scaling filter does depend on the type of imagery you are applying it to. If you are processing live-action then the consensus is typically the lanczos filter. Animation or motion graphics may suffer due to the ringing effects of lanczos, but you really should compare the results.

* [Comparison gallery of image scaling algorithms - Wikipedia](https://en.wikipedia.org/wiki/Comparison_gallery_of_image_scaling_algorithms)
* [Cambridge in Colour - resizing for web and email](https://www.cambridgeincolour.com/tutorials/image-resize-for-web.htm)
* [ImageMagick Examples -- Resampling Filters](https://www.imagemagick.org/Usage/filter)
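
The ringing mentioned above comes from the negative lobes of the lanczos kernel. The sketch below evaluates the textbook Lanczos window with 3 lobes to show where the weights go negative; it is an illustration of the formula, not code taken from swscale.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel with `a` lobes: sinc(x) * sinc(x / a) for |x| < a."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


for x in (0, 0.5, 1.5, 2.5):
    print(x, round(lanczos(x), 4))
```

The negative weight around x = 1.5 is what produces overshoot next to hard edges, which tends to be more visible on animation or motion graphics than on live action.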

See also:

* [https://stackoverflow.com/questions/70893502/why-does-ffmpeg-output-slightly-different-rgb-values-when-converting-to-gbrp-and](https://stackoverflow.com/questions/70893502/why-does-ffmpeg-output-slightly-different-rgb-values-when-converting-to-gbrp-and)
* [https://trac.ffmpeg.org/ticket/1582#comment:11](https://trac.ffmpeg.org/ticket/1582#comment:11)
* [https://stackoverflow.com/questions/64729698/why-is-ffmpegs-conversion-to-yuv420-so-poor](https://stackoverflow.com/questions/64729698/why-is-ffmpegs-conversion-to-yuv420-so-poor)


Binary file modified enctests/reference-results/av1-crf-test-encode_time.png
Binary file modified enctests/reference-results/av1-crf-test-filesize.png
Binary file modified enctests/reference-results/av1-crf-test-vmaf_harmonic_mean.png