Docs: Add loudness metadata encoding instructions.
This will help ensure that encoded files include the correct loudness metadata
that decoders and renderers can use to normalize audio for playback.

PiperOrigin-RevId: 715073260
felicialim authored and jwcullen committed Jan 14, 2025
1 parent fb86319 commit 4e74eb9
Showing 2 changed files with 58 additions and 19 deletions.
58 changes: 39 additions & 19 deletions docs/external/encoding_with_external_tools.md
@@ -14,17 +14,37 @@ encapsulated in MP4.
The examples below encode input wav files to IAMF using Opus to encode the
underlying audio elements at 64 kbps per channel.

> **NOTE: Input channel order**
>
> The examples assume that the channel order in the input wav files follows the
> ordering used in [ITU-R BS.2051](https://www.itu.int/rec/R-REC-BS.2051).
>
> If a different channel order is used, change the input channel indices
> indicated by `channelmap`. The [IAMF specification (Coupled stereo
> channels)](https://aomediacodec.github.io/iamf/v1.1.0.html#coupled_substream_count)
> requires that the result groups specific channels as either a stereo pair or
> a mono channel, and additionally that they follow a specific order. In the
> ffmpeg command, this is defined by the order of the `-map "[]"` options.
Usage notes:

- **Input channel order**

The examples assume that the channel order in the input wav files follows the
ordering used in [ITU-R BS.2051](https://www.itu.int/rec/R-REC-BS.2051).

If a different channel order is used, change the input channel indices
indicated by `channelmap`. The [IAMF specification (Coupled stereo
channels)](https://aomediacodec.github.io/iamf/v1.1.0.html#coupled_substream_count)
requires that the result groups specific channels as either a stereo pair or
a mono channel, and additionally that they follow a specific order. In the
ffmpeg command, this is defined by the order of the `-map "[]"` options.

- **Loudness metadata**

IAMF files include loudness metadata for the input audio, which IAMF
decoders and renderers can use to normalize the output audio.

The `integrated_loudness` value specifies the program integrated loudness
information in LKFS, as measured according to
[ITU-R BS.1770-4](https://www.itu.int/rec/R-REC-BS.1770).

The `digital_peak` value specifies the digital (sampled) peak value of the
audio signal in dBFS.

To include the correct loudness statistics in the IAMF file, measure the
loudness of the input audio for at least the stereo downmix, and then modify
the `integrated_loudness` and `digital_peak` values in the examples below.
Tools such as the `loudnorm` and `astats` ffmpeg filters may be helpful for
measuring the loudness.
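
A minimal sketch of the measurement step described under **Loudness metadata**, assuming `ffmpeg` is on `PATH`. The function name `measure_stereo_loudness` is hypothetical, and the stereo downmix here is ffmpeg's default one (requested through `aformat`), which may not match the downmix an IAMF renderer applies, so treat the numbers as a starting point rather than a reference measurement.

```shell
# Hypothetical helper (not part of ffmpeg or the IAMF tools): print loudness
# statistics for a stereo downmix of the given input file.
measure_stereo_loudness() {
  input="$1"

  # loudnorm with print_format=json reports the measured "input_i"
  # (integrated loudness in LUFS, numerically equal to LKFS) on stderr.
  # The null muxer discards the audio, so no output file is written.
  ffmpeg -hide_banner -nostats -i "$input" \
    -af "aformat=channel_layouts=stereo,loudnorm=print_format=json" \
    -f null -

  # astats reports "Peak level dB" per channel and overall; the overall
  # value is the sampled (digital) peak in dBFS.
  ffmpeg -hide_banner -nostats -i "$input" \
    -af "aformat=channel_layouts=stereo,astats" \
    -f null -
}

# Usage (the path is a placeholder):
#   measure_stereo_loudness /path/to/input.wav
```

Read `input_i` from the `loudnorm` JSON for `integrated_loudness`, and the overall `Peak level dB` from `astats` for `digital_peak`.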

### Encoding stereo wav to IAMF

@@ -35,7 +55,7 @@ ffmpeg -i /path/to/input.wav \
-filter_complex "[0:a]channelmap=0|1:stereo[FRONT]" \
-map "[FRONT]" \
-stream_group "type=iamf_audio_element:id=1:st=0:audio_element_type=channel,layer=ch_layout=stereo" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -49,7 +69,7 @@ ffmpeg -i /path/to/input.wav \
-filter_complex "[0:a]channelmap=0|1:stereo[FRONT];[0:a]channelmap=4|5:stereo[BACK];[0:a]channelmap=2:mono[CENTER];[0:a]channelmap=3:mono[LFE]" \
-map "[FRONT]" -map "[BACK]" -map "[CENTER]" -map "[LFE]" \
-stream_group "type=iamf_audio_element:id=1:st=0:st=1:st=2:st=3:audio_element_type=channel,layer=ch_layout=5.1" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=5.1:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=5.1:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -64,7 +84,7 @@ ffmpeg -i /path/to/input.wav \
-map "[FRONT]" -map "[BACK]" -map "[TOP_FRONT]" -map "[CENTER]" -map "[LFE]" \
-stream_group "type=iamf_audio_element:id=1:st=0:st=1:st=2:st=3:st=4:audio_element_type=channel,layer=ch_layout=5.1.2" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=5.1.2:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=5.1.2:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 -streamid 4:4 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -78,7 +98,7 @@ ffmpeg -i /path/to/input.wav \
-filter_complex "[0:a]channelmap=0|1:stereo[FRONT];[0:a]channelmap=4|5:stereo[SIDE];[0:a]channelmap=6|7:stereo[BACK];[0:a]channelmap=8|9:stereo[TOP_FRONT];[0:a]channelmap=10|11:stereo[TOP_BACK];[0:a]channelmap=2:mono[CENTER];[0:a]channelmap=3:mono[LFE]" \
-map "[FRONT]" -map "[SIDE]" -map "[BACK]" -map "[TOP_FRONT]" -map "[TOP_BACK]" -map "[CENTER]" -map "[LFE]" \
-stream_group "type=iamf_audio_element:id=1:st=0:st=1:st=2:st=3:st=4:st=5:st=6:audio_element_type=channel,layer=ch_layout=7.1.4" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=7.1.4:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=7.1.4:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 -streamid 4:4 -streamid 5:5 -streamid 6:6 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -92,7 +112,7 @@ ffmpeg -i /path/to/input.wav \
-filter_complex "[0:a]channelmap=0:mono[A0];[0:a]channelmap=1:mono[A1];[0:a]channelmap=2:mono[A2];[0:a]channelmap=3:mono[A3]" \
-map "[A0]" -map "[A1]" -map "[A2]" -map "[A3]" \
-stream_group "type=iamf_audio_element:id=1:st=0:st=1:st=2:st=3:audio_element_type=scene,layer=ch_layout=ambisonic\ 1:ambisonics_mode=mono," \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=FOA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=FOA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -106,7 +126,7 @@ ffmpeg -i /path/to/input.wav \
-filter_complex "[0:a]channelmap=0:mono[A0];[0:a]channelmap=1:mono[A1];[0:a]channelmap=2:mono[A2];[0:a]channelmap=3:mono[A3];[0:a]channelmap=4:mono[A4];[0:a]channelmap=5:mono[A5];[0:a]channelmap=6:mono[A6];[0:a]channelmap=7:mono[A7];[0:a]channelmap=8:mono[A8];[0:a]channelmap=9:mono[A9];[0:a]channelmap=10:mono[A10];[0:a]channelmap=11:mono[A11];[0:a]channelmap=12:mono[A12];[0:a]channelmap=13:mono[A13];[0:a]channelmap=14:mono[A14];[0:a]channelmap=15:mono[A15]" \
-map "[A0]" -map "[A1]" -map "[A2]" -map "[A3]" -map "[A4]" -map "[A5]" -map "[A6]" -map "[A7]" -map "[A8]" -map "[A9]" -map "[A10]" -map "[A11]" -map "[A12]" -map "[A13]" -map "[A14]" -map "[A15]" \
-stream_group "type=iamf_audio_element:id=1:st=0:st=1:st=2:st=3:st=4:st=5:st=6:st=7:st=8:st=9:st=10:st=11:st=12:st=13:st=14:st=15:audio_element_type=scene,layer=ch_layout=ambisonic\ 3:ambisonics_mode=mono," \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=3OA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=3OA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 -streamid 4:4 -streamid 5:5 -streamid 6:6 -streamid 7:7 -streamid 8:8 -streamid 9:9 -streamid 10:10 -streamid 11:11 -streamid 12:12 -streamid 13:13 -streamid 14:14 -streamid 15:15 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -133,7 +153,7 @@ ffmpeg -i /path/to/input_FOA.wav -i /path/to/input_stereo.wav \
-filter_complex "[1:a]channelmap=0|1:stereo[FRONT]" \
-map "[FRONT]" \
-stream_group "type=iamf_audio_element:id=2:st=4:audio_element_type=channel,layer=ch_layout=stereo" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:stg=1:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=FOA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|element=stg=1:headphones_rendering_mode=stereo:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:stg=1:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=FOA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|element=stg=1:headphones_rendering_mode=stereo:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 -streamid 4:4 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
@@ -160,7 +180,7 @@ ffmpeg -i /path/to/input_TOA.wav -i /path/to/input_stereo.wav \
-filter_complex "[1:a]channelmap=0|1:stereo[FRONT]" \
-map "[FRONT]" \
-stream_group "type=iamf_audio_element:id=2:st=16:audio_element_type=channel,layer=ch_layout=stereo" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:stg=1:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=3OA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|element=stg=1:headphones_rendering_mode=stereo:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0" \
-stream_group "type=iamf_mix_presentation:id=3:stg=0:stg=1:annotations=en-us=default_mix_presentation,submix=parameter_id=100:parameter_rate=48000:default_mix_gain=0.0|element=stg=0:headphones_rendering_mode=binaural:annotations=en-us=3OA:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|element=stg=1:headphones_rendering_mode=stereo:annotations=en-us=stereo:parameter_id=101:parameter_rate=48000:default_mix_gain=0.0|layout=sound_system=stereo:integrated_loudness=0.0:digital_peak=0.0" \
-streamid 0:0 -streamid 1:1 -streamid 2:2 -streamid 3:3 -streamid 4:4 -streamid 5:5 -streamid 6:6 -streamid 7:7 -streamid 8:8 -streamid 9:9 -streamid 10:10 -streamid 11:11 -streamid 12:12 -streamid 13:13 -streamid 14:14 -streamid 15:15 -streamid 16:16 \
-c:a libopus -b:a 64000 /path/to/output_iamf_or_mp4
```
19 changes: 19 additions & 0 deletions iamf/cli/textproto_templates/README.md
@@ -39,6 +39,25 @@ Set the following fields in the textproto template.
Set this to the desired output filename. The generated .iamf file will be
named `file_name_prefix.iamf`.
- `loudness`
Measure the loudness of the input audio, including the stereo downmix, and
store these values in the following loudness fields. IAMF decoders and
renderers can use this loudness metadata to normalize the output audio.
- `loudness.integrated_loudness`
This is the [ITU-R BS.1770-4](https://www.itu.int/rec/R-REC-BS.1770)
integrated loudness, specified in LKFS. Convert the loudness value to
the correct `int16` value to use here as `integrated_loudness =
integrated_loudness_in_lkfs * 256`.
- `loudness.digital_peak`
This is the digital (sampled) peak value of the audio signal, specified
in dBFS. Convert the peak value to the correct `int16` value to use here
as `digital_peak = digital_peak_in_dBFS * 256`.
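
The `* 256` conversion above can be sketched as a small shell helper. The function name `to_q7_8` is hypothetical, and both the rounding to the nearest integer and the clamping to the `int16` range are assumptions added here; the text above only specifies the multiplication.

```shell
# Hypothetical helper: convert a value in LKFS or dBFS to the int16 value
# used in the textproto template (value * 256).
to_q7_8() {
  awk -v db="$1" 'BEGIN {
    v = db * 256
    # Assumption: round to the nearest integer, halves away from zero.
    r = (v >= 0) ? int(v + 0.5) : -int(-v + 0.5)
    # Assumption: clamp to the int16 range.
    if (r > 32767) r = 32767
    if (r < -32768) r = -32768
    printf "%d\n", r
  }'
}

to_q7_8 -23.0   # integrated loudness of -23 LKFS, prints -5888
to_q7_8 -1.5    # digital peak of -1.5 dBFS, prints -384
```

The printed integers are the values to place in `loudness.integrated_loudness` and `loudness.digital_peak` respectively.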
Optionally, modify other fields in the textproto template as necessary.
- `channel_metadatas`