
Ideas for an automatic buffer goal setting #1551

Open
peaBerberian opened this issue Sep 23, 2024 · 2 comments
Labels
proposal This Pull Request or Issue is only a proposal for a change with the expectation of a debate on it work-in-progress This Pull Request or issue is not finished yet

Comments

@peaBerberian
Collaborator

peaBerberian commented Sep 23, 2024

To document our attempts on this subject, I decided to open this issue.

It lists what we already do to set an automatic buffer goal, what we tried to do but abandoned and what we're currently trying to do.

It's primarily intended for RxPlayer maintainers, but it is also a good reference describing some innovations we're currently working on (which may interest people not familiar with the RxPlayer's codebase), which is why I'm being more verbose and descriptive here than usual.

Preamble: Definitions

I'm assuming here familiarity with some terms we use in the RxPlayer, described below.

MSE, SourceBuffer/media buffers and buffering

MSE, for Media Source Extensions, is basically a specification implemented in most popular browsers that allows complex media players to be implemented in JavaScript. The RxPlayer relies on it to play most media contents.

MSE defines the concept of SourceBuffer JavaScript objects, which I here call interchangeably "SourceBuffers" or media buffers, and which store audio and video data for future decoding.

Basically the main job of the RxPlayer is to load the right chunks of media data (that we call "segments") and "push" them to those SourceBuffers so the browser and lower layers may be able to decode the content.

This "load-and-push" logic is the buffering action I'm referring to in this issue.

Buffer goal

In the RxPlayer, what we call the "buffer goal" is the amount of media data in seconds we will pre-buffer ahead of the current position in the SourceBuffers, allowing playback to continue smoothly even in cases where, e.g., the network bandwidth temporarily decreases.

For example if we're playing a content at the position 10 (seconds) and we have a buffer goal set to 30, we're going to try loading segments until the position 40 in the corresponding SourceBuffer.

Note that as the playback position evolves (as a content plays), the last position to buffer also evolves by the same extent (e.g. when position advances to 15, we're loading until 45 if our buffer goal is still set to 30).
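As a minimal sketch of that relationship (the helper name is hypothetical, not actual RxPlayer code), the last position to buffer can be computed like this:

```javascript
// Compute the end position (in seconds) until which segments should be
// loaded, given the current playback position and the buffer goal.
// `duration` caps the result at the end of the content.
function getBufferTarget(currentPosition, bufferGoal, duration) {
  return Math.min(currentPosition + bufferGoal, duration);
}

// Playing at position 10 with a buffer goal of 30: load until position 40.
console.log(getBufferTarget(10, 30, 140)); // 40

// As playback advances to 15, the target advances by the same extent.
console.log(getBufferTarget(15, 30, 140)); // 45
```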

wantedBufferAhead

The RxPlayer has a wantedBufferAhead option which is the way in which an application can influence the RxPlayer's buffer goal.

In reality, the RxPlayer might today rely on a "buffer goal" lower than the wantedBufferAhead, because for example it noticed that the wantedBufferAhead was too high for the current device.

Browser Garbage Collection of segments

The RxPlayer often uses the term "browser GC" somewhat abusively: not to refer to the actual GC (which handles the cleaning of now-unreachable memory), but to refer to the browser's media buffer cleaning strategies when there's not much memory left.

Basically, after a lot of media data has been buffered and not a lot of memory is left, browsers rely on algorithms (described by the MSE W3C recommendation) to remove the media data they infer to be the least useful (e.g. behind the current position).

Because it's fun, here's a look into the presumably corresponding Google Chrome implementation: https://source.chromium.org/chromium/chromium/src/+/main:media/filters/source_buffer_stream.cc;drc=a150b50c0ff706af12c449c7fccd3cf2745e2061;bpv=1;bpt=1;l=756

(Note that they are also referring to this as "garbage collection"!)

The corresponding Firefox implementation seems to be around here: https://searchfox.org/mozilla-central/source/dom/media/mediasource/TrackBuffersManager.cpp#336

Issue

The RxPlayer is used on a large panel of devices, whose variety has grown a lot in the last years, due not only to its open-source aspect but also to a large expansion of targeted platforms at Canal+ and by its international partners.

Many of those devices are memory-constrained.
This is in itself a problem for the RxPlayer because media data, especially high-resolution and high-dynamic-range media data, can quickly require a lot of memory.
Due to this, the recommendation is often to set a lower wantedBufferAhead and/or a maxVideoBufferSize when a high value seems to be too much for the device. This means we will pre-buffer less data.

It works but it also has the following effects:

  • A lower buffer size means a higher chance of rebuffering as we'll have less buffer to "cushion" a fall in bandwidth, a network issue, or some other similar events.

  • This also often means a poorer media quality will be displayed. This is because our algorithms choosing the ideal quality (often called "adaptive bitrate algorithms") are influenced by the current buffer size.

    To make it short and to simplify why: if there's not much buffer, those algorithms don't take risks and stay on a quality they're 100% sure they can handle; if there's a lot of buffer, however, they can begin to be more optimistic and check whether a superior quality can be maintained.
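To give a feel for that behavior, here is a purely illustrative sketch of a buffer-based quality pick. The thresholds, names, and logic are hypothetical and much simpler than the RxPlayer's actual adaptive bitrate algorithms:

```javascript
// Illustrative sketch: with little buffer, pick only what the bandwidth
// estimate can surely sustain; with a large buffer ("cushion"), try one
// quality higher. Not the RxPlayer's actual ABR logic.
function pickQuality(qualities, bandwidthEstimate, bufferAheadSeconds) {
  // Highest quality sustainable at the current bandwidth estimate
  // (`qualities` is assumed sorted by ascending bitrate).
  const safe = qualities.filter((q) => q.bitrate <= bandwidthEstimate);
  let chosen = safe.length > 0 ? safe[safe.length - 1] : qualities[0];
  if (bufferAheadSeconds > 20) {
    // Enough buffer to absorb a mistake: optimistically try the next one up.
    const idx = qualities.indexOf(chosen);
    if (idx < qualities.length - 1) {
      chosen = qualities[idx + 1];
    }
  }
  return chosen;
}

const qualities = [
  { bitrate: 500_000 },
  { bitrate: 1_500_000 },
  { bitrate: 4_000_000 },
];
// Small buffer: stay on a quality we're sure we can handle.
console.log(pickQuality(qualities, 2_000_000, 5).bitrate);  // 1500000
// Large buffer: risk checking a superior quality.
console.log(pickQuality(qualities, 2_000_000, 30).bitrate); // 4000000
```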

So we've been trying several strategies to try constructing the highest possible buffer.

Solution 1: Setting a wantedBufferAhead recommendation per device

This solution basically expands on our initial strategy of setting a lower wantedBufferAhead and maxVideoBufferSize (which seems more appropriate as it indicates a limit in terms of memory size) on memory-constrained devices.

The idea would be to identify as many devices as possible and to know their limitations in advance. We would then set an appropriate wantedBufferAhead and/or maxVideoBufferSize which would be high enough so we can deliver a good streaming experience, yet far enough from potential memory limits.

With this idea in mind, we worked with some manufacturers (e.g. companies making TVs) to obtain ways to determine how much data can be buffered.

For now, we plan to first put the corresponding wantedBufferAhead and maxVideoBufferSize values inside the application as opposed to inside the RxPlayer for reasons described below.

However, this strategy has several drawbacks:

  • In an actual application, the application itself will have an impact on memory, meaning that the actually used wantedBufferAhead and maxVideoBufferSize values would change per application. There's consequently no generalization that can easily be done inside the RxPlayer.

  • We won't be able to list all existing devices, nor will we be able to obtain such values from all companies.

  • Companies seem to incentivize us to buffer as little media data as possible, I guess to be sure issues are prevented.

    We understand that, but we can see on some tested devices that the actual limit is very far from what has been communicated.

  • For now, in part because of the preceding points, we don't plan to integrate such configuration inside the RxPlayer nor to open-source it.

It would thus be nice to also have a way to determine an optimal buffer size inside the RxPlayer completely algorithmically.

Solution 2: QuotaExceededError handling

Coded frame eviction and QuotaExceededError

The MSE recommendation proposes a "coded frame eviction" algorithm to run when the media player tries to append media data but the browser cannot "accept more media data".

Most media players thus infer that this algorithm runs when we begin to approach memory limits on the current device - as it looks like the main reason for a browser not "accepting" more data.

That eviction algorithm tries to free up memory, generally by first removing already-played data from memory (media data "behind" the current playback position) if there's some, then media data distant in the future (generally further than the pushed media data). That's the algorithm I'm referring to here when talking about "browser garbage collection" (note that most other JS developers would mean another algorithm when using that expression, and they would be right! Yet in the context of a media player it sometimes makes sense to reuse the term for this other logic).

If, after running this algorithm, there is still not enough room to append the new data, the browser is supposed to throw a QuotaExceededError (a JavaScript error type).

The previous RxPlayer strategy

The RxPlayer, like other JavaScript media players, has strategies to reduce the size of the buffer after receiving such an error.

Previously, on a QuotaExceededError received when appending media data, the idea was to, in order:

  1. Try, just in case the browser did not do it right, to free up some memory from the buffer that isn't much needed (behind the playback position, far in the future...)

  2. Reduce the wantedBufferAhead through a ratio system (this system makes the reduction exponentially higher at each newer QuotaExceededError), so we try to build less buffer.

  3. Wait a bit before potentially loading and pushing new segments (as we're maybe close to the memory limit, we want to ensure the browser has time to find some memory to free before risking doing operations potentially heavy in memory).

  4. Restart the loading and pushing operations.
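The exponential reduction of step 2 can be sketched like this (the class name and the 0.75 factor are hypothetical, not the RxPlayer's actual internals or constant):

```javascript
// Sketch of a ratio system reducing the effective buffer goal: each new
// QuotaExceededError multiplies the ratio again, so the reduction grows
// exponentially with repeated errors.
class BufferGoalCalculator {
  constructor(wantedBufferAhead) {
    this.wantedBufferAhead = wantedBufferAhead;
    this.ratio = 1;
  }
  onQuotaExceededError() {
    // Step 2: reduce the ratio applied to the wantedBufferAhead.
    this.ratio *= 0.75;
  }
  getBufferGoal() {
    return this.wantedBufferAhead * this.ratio;
  }
}

const calc = new BufferGoalCalculator(30);
calc.onQuotaExceededError();
console.log(calc.getBufferGoal()); // 22.5
calc.onQuotaExceededError();
console.log(calc.getBufferGoal()); // 16.875
```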

Further ideas for algorithmic detection of the buffer size

So I've thought about exploiting that algorithm to know when a memory limit has been reached, in turn to have a good idea of the maximum buffer size available for media right now on the device.

By raising the buffer size progressively until we reach a QuotaExceededError, we can note the current buffer size as that happens (both in terms of seconds, but also, perhaps more importantly, in terms of the combined size of the media data loaded), apply some ratio on it just to be safe (to stay far enough from this potential memory limit), and rely on that value for some time.

I was trying an algorithm based on this principle on an LG TV last week, yet it often triggered crashes, I guess because it may approach the memory limits of the device too closely (though I'm not 100% sure of this).

Anyway, that experience made me question if that algorithm was too risky.
When receiving a QuotaExceededError, we may be relatively close to the real memory limits of the device, which could presumably be reached very soon e.g. after a simultaneous heavy logic from the application.

This is the reason why I wanted to approach other solutions.

Status for this solution

This solution might be useful as a last resort but from initial tests it appears to be risky.

Solution 3: Instrumentalize the coded frame eviction algorithm

Basic idea

This solution is more complex, yet it tries to work around some of the disadvantages of the previous ones by waiting for the coded frame eviction algorithm to happen on already-played media data.

To illustrate the idea at its basis, let's "draw" a crude representation of the played media buffer:

    |              >                                                      |
    ^              ^                                                      ^
second 0     Current position                                        Duration of
           (let's say second 30)                                     the content
                                                               (let's say second 140)

Now let's consider that we have a wantedBufferAhead of ten seconds, that we've reached:

    |              >====                                                  |
    0             30   40                                                140

               `=` here representing
                 loaded media data
              (from the second 30 to
                 the second 40)

As we continue to play the content, we'll continue loading 10 seconds in advance, but also keep the already loaded data. When playing second 40, we could thus be in the following situation:

    |              =====>====                                             |
    0             30   40   50                                           140

With thus media data behind the current position from 30 to 40 and in front from 40 to 50.

After enough time, we could be left with a lot of buffer behind:

    |              ====================>====                              |
    0             30                  70   80                            140

This means that there's a lot of loaded data currently residing in memory. Let's assume that we're close to the memory limit for the current device and as such the browser begins to remove some media data through its aforementioned "coded frame eviction" algorithm.

It will begin removing the already-played data. Let's say it removes the data from 30 to 36 seconds:

    |              XXX=================>====                              |
    0                36               70   80                            140

                `X` here represents
               media data that's been
               removed by the browser

At that point, the RxPlayer can detect that the browser evicted media data (and I do mean "detect" here, as there's sadly nothing explicit: a media player has to infer that data eviction took place, for example by regularly checking with the browser whether it has media data at various timestamps), estimate roughly the current buffer size, and set it as the current limit.

Let's say that it estimates a number of seconds (in reality it will also consider the size in bytes of all loaded segments): we would do 80 - 36 (the current size of the buffer in seconds), which is equal to 44.

We thus know that for now we can construct a buffer of 44 seconds. We could now even automatically remove the already-played data and just build 44 seconds of buffer:

    |                                  >=====================             |
    0                                 70                    114          140
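The detection and the size estimate can be sketched as follows, with buffered ranges modeled as `[start, end]` arrays in seconds (the helper names are hypothetical; the real detection also tracks sizes in bytes):

```javascript
// Infer that the browser evicted media data: compare the ranges we pushed
// ourselves with what the browser currently reports as buffered.
function detectEviction(pushedRanges, buffered) {
  const evicted = [];
  for (const [start, end] of pushedRanges) {
    // A pushed range counts as evicted if no buffered range fully covers it.
    const covered = buffered.some(
      ([bStart, bEnd]) => bStart <= start && end <= bEnd
    );
    if (!covered) {
      evicted.push([start, end]);
    }
  }
  return evicted;
}

// Estimate the buffer size after an eviction: the length of the contiguous
// buffered range around the current position (80 - 36 = 44 in the example).
function remainingBufferSize(buffered, position) {
  for (const [start, end] of buffered) {
    if (start <= position && position <= end) {
      return end - start;
    }
  }
  return 0;
}

// We pushed 30→36 and 36→80, but the browser now only reports 36→80.
console.log(detectEviction([[30, 36], [36, 80]], [[36, 80]])); // [ [ 30, 36 ] ]
console.log(remainingBufferSize([[36, 80]], 70)); // 44
```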

This seems less risky than just waiting for a QuotaExceededError, as the browser still has a lot of data it could potentially remove behind the current position if memory usage peaks at some point.

Actual implementation

If the previous example was just implemented as is, there would be some problems:

  1. We would be waiting until the browser garbage-collects all data before updating the wanted buffer size at once. This potentially means a lot of time spent with a very low buffer size.

  2. As written in the previous chapter, the size of the buffered data is more important than the amount of time we can buffer: if we buffer high-quality data, we should expect that less data can be buffered than if we buffer low-quality data.

  3. When the "actual" maximum buffer size is found at the end of the algorithm, if we just set it as our new buffer size, we're left with the same risks as the QuotaExceededError solution: if the application decides to rely on a lot of memory at once, there could be issues due to no memory being left.

Because of those issues, the actual implementation I'm currently checking on devices has the following tweaks:

  1. We're regularly raising the wantedBufferAhead as the buffer behind [the current playback position] grows without triggering a browser GC.
    Basically, we currently wait until there's as much buffer "behind" as buffer ahead before raising the wantedBufferAhead by a few seconds. We also clean up the buffer behind a little to ensure we're not actually filling memory at once when doing that.

  2. When a browser GC happens, we also set the maxVideoBufferSize based on how much data is inferred to live in the buffer right now.

  3. For now, we keep a form of the "as much buffer behind as in front" principle so the browser has a lot of data it can remove if it needs to. Though the strategy here is in reality more complex: to prevent all potential issues, once a browser GC is detected, we try using much less memory than the detected limit.

Those tweaks mean that we're not building a buffer as large as we theoretically could (by a somewhat big margin), but they present far fewer risks (e.g. of crashing the application) and have still given good results on the devices I tested until now.
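Tweak 1 can be sketched as a small decision function (the step size and helper name are hypothetical, not the actual RxPlayer implementation):

```javascript
// Sketch: only raise the wantedBufferAhead once there is at least as much
// buffer behind the position as ahead of it, and the current goal has been
// reached; grow it by a few seconds at a time.
function nextWantedBufferAhead(wantedBufferAhead, position, bufferedRange, step = 5) {
  const [start, end] = bufferedRange; // contiguous buffered range around `position`
  const behind = position - start;
  const ahead = end - position;
  if (behind >= ahead && ahead >= wantedBufferAhead) {
    // As much buffer behind as ahead, and the goal is filled: raise it.
    return wantedBufferAhead + step;
  }
  return wantedBufferAhead;
}

// Position 70 with buffer from 30 to 80: 40s behind, 10s ahead → raise.
console.log(nextWantedBufferAhead(10, 70, [30, 80])); // 15
// Position 40 with buffer from 35 to 50: only 5s behind → keep.
console.log(nextWantedBufferAhead(10, 40, [35, 50])); // 10
```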

Issues for this solution

There's still some risks with this solution.

The main one I can think of is that the browser GC (on the media buffers) may, with a regular MediaSource, only happen when pushing new segments. If we're not pushing, and if we're close to the memory limit, we still risk a crash if something somewhere decides to do some memory-heavy operation.

To note: the ManagedMediaSource interface, as far as I know only available on Safari, had the bright idea (I'm not even being sarcastic here!) to allow what MSE calls "memory cleanup" at any time, which would fix that issue.
Firefox also seems interested.
I didn't find Chrom(ium) status on this.

Status for this solution

This is the solution I'm currently testing, that could be relied on together with "solution 1".

It gives good results, though I haven't had my hands on a device truly low on memory yet, which would be a real indication of whether the "security margins" this algorithm takes still allow for a large-enough buffer on them, or whether they lead to a worse result than what we currently have.

Anyway this is still highly experimental, may be put to trash in the end, or maybe only used when ManagedMediaSource is available, yet it could fix one of our main pain points: enabling higher buffer sizes on memory-constrained devices.

@peaBerberian peaBerberian added work-in-progress This Pull Request or issue is not finished yet proposal This Pull Request or Issue is only a proposal for a change with the expectation of a debate on it labels Sep 23, 2024
@lfaureyt
Contributor

This automatic adaptation to the buffering capacity of the device seems smart and powerful. Could you elaborate a little on the following part (if it is relevant)?

We also clean-up the buffer behind a little to ensure we're not actually filling memory at once when doing that.

@peaBerberian
Collaborator Author

Could you elaborate a little on the following part (if it is relevant) ?

In my current version of this algorithm, when I'm raising the wantedBufferAhead, I'm removing a large part of the already played buffer.

This may be unneeded, but the idea was that raising the wantedBufferAhead at that point will directly trigger a queue of segment requests to fill the buffer, in contrast to regular playback with a wantedBufferAhead already reached, where e.g. we load 2s segments every 2 seconds of playback.

When designing that solution, I was a little uneasy with playing with/guessing the device's memory like we do, and with the burst of memory usage we could have when raising the wantedBufferAhead (on top of what could be close to the actual memory limit).

Normally, if everything is done well, the browser should remove older buffer when reaching memory limitations regardless of whether we're in such "bursts", so it shouldn't really matter.
But because I wasn't sure of everything that could go wrong at those times, and because such heuristics already seem risky enough, I decided for now to free some older buffer before raising it ahead.
