
New Ideas on Timing for Data Collection and Metric Reporting #394

Open
zengzuo613 opened this issue Oct 18, 2023 · 3 comments

Comments

@zengzuo613

Web Vitals supports six metrics. I think TTFB, FCP, and LCP are all generated before the onload event, while the other metrics depend on how long the page stays open and on user interaction.
The continuous callbacks and separate reporting of each metric consume a large amount of frontend resources (connections, bandwidth, CPU, etc.) and also make backend processing harder.

We believe:

  1. Once the page has fired onload and at least 5 seconds have elapsed (5 s being the maximum CLS session window), collect and report all six metrics together
  2. Just before the page closes, collect and report FID, CLS, and INP again

This way we obtain real page performance data while reducing the number of reports. Do you think this is reasonable?

zengzuo613 changed the title from "When to collect and report metrics?" to "New Ideas on Timing for Data Collection and Metric Reporting" on Oct 18, 2023
@tunetheweb
Member

I think TTFB, FCP, and LCP are all generated before the onload event

LCP can in theory change even after the onload event, for example if a script calls setTimeout and then alters the content. Also, I think the onload event can fire before processing of those resources has finished (e.g. large image decodes, or CSS or script processing).
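As a purely illustrative example (the image URL is a placeholder), content injected from a setTimeout after load can still become a new LCP candidate, as long as the user hasn't interacted with or scrolled the page:

// Illustrative only: an element added well after onload can still become the LCP element.
window.addEventListener('load', () => {
  setTimeout(() => {
    const hero = document.createElement('img');
    hero.src = '/large-hero.jpg'; // placeholder URL
    hero.style.width = '100%';
    document.body.prepend(hero);
    // If this image paints larger than anything rendered so far, LCP updates again.
  }, 2000);
});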

The continuous callbacks and separate reporting of each metric consume a large amount of frontend resources (connections, bandwidth, CPU, etc.) and also make backend processing harder

While it's true that the basic usage will fire 6 callbacks (which could result in 6 beacons), I'm not sure I'd classify that as a "large amount of frontend resources", nor that it makes backend processing much harder.

However, we do document how to batch reports together for those wanting to reduce network requests (note that it still takes memory and processing to batch the data).
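The batching pattern looks roughly like this (a sketch; the /analytics endpoint is a placeholder):

import { onCLS, onFCP, onFID, onINP, onLCP, onTTFB, type Metric } from 'web-vitals';

// Queue metrics as they are reported, then flush them in a single beacon.
const queue = new Set<Metric>();
const addToQueue = (metric: Metric) => queue.add(metric);

function flushQueue() {
  if (queue.size > 0) {
    // sendBeacon survives the page being unloaded.
    navigator.sendBeacon('/analytics', JSON.stringify([...queue])); // placeholder endpoint
    queue.clear();
  }
}

onCLS(addToQueue);
onFCP(addToQueue);
onFID(addToQueue);
onINP(addToQueue);
onLCP(addToQueue);
onTTFB(addToQueue);

// Flush once, when the page is backgrounded or closed.
addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flushQueue();
});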

This could also be handled automatically in your analytics code. I understand that gtag, for example, as used by GA4, batches beacons together within a certain timeframe (though the first one is always sent).

We believe:

  1. Once the page has fired onload and at least 5 seconds have elapsed (5 s being the maximum CLS session window), collect and report all six metrics together
  2. Just before the page closes, collect and report FID, CLS, and INP again

This way we obtain real page performance data while reducing the number of reports. Do you think this is reasonable?

That is certainly not a bad strategy and is similar to the way a lot of commercial RUM providers are going (in fact many currently only do 1 and are looking to add 2 now). This minimises the number of network requests while still aiming to collect the most accurate data. There is a risk that some beacons can be lost with this strategy (for example, if a page crashes), whereas beaconing as soon as possible reduces that risk. Beaconing on page visibility changes should minimise that risk, though.

The other complication is that, by default, web-vitals.js only emits the callback when a metric is finalised, so we will not emit an "early CLS", for example. You will need to call it with the reportAllChanges flag to get all the changes and allow early processing, and then process each one (which in itself potentially creates more processing!). So it may be easier to just report what you have after 5 seconds (even if it's not all 6 metrics), and then report again on visibility change as usual.
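For example (a sketch; /rum is a placeholder endpoint), you could keep the latest value of each metric via reportAllChanges and snapshot whatever you have 5 seconds after load:

import { onCLS, onINP, onLCP, type Metric } from 'web-vitals';

// Keep the most recent value of each metric as it changes.
const latest = new Map<string, Metric>();
const keepLatest = (metric: Metric) => latest.set(metric.name, metric);

onCLS(keepLatest, { reportAllChanges: true });
onINP(keepLatest, { reportAllChanges: true });
onLCP(keepLatest, { reportAllChanges: true });

// Five seconds after onload, report whatever has been observed so far
// (some metrics may legitimately still be missing at this point).
window.addEventListener('load', () => {
  setTimeout(() => {
    const snapshot = [...latest.values()].map(({ name, value }) => ({ name, value }));
    navigator.sendBeacon('/rum', JSON.stringify(snapshot)); // placeholder endpoint
  }, 5000);
});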

The library could add a mode that reports only at this 5-second mark and at the end of the session, rather than everything in between, but at the moment we prefer to report as accurately as possible and leave how to process the events to implementers themselves.

@zengzuo613
Author

zengzuo613 commented Nov 1, 2023

While it's true that the basic usage will fire 6 callbacks (which could result in 6 beacons), I'm not sure I'd classify that as a "large amount of frontend resources", nor that it makes backend processing much harder.

I need to merge the six beacons into one row of data; batch reporting reduces that work.


That is certainly not a bad strategy and is similar to the way a lot of commercial RUM providers are going (in fact many currently only do 1 and are looking to add 2 now). This minimises the number of network requests while still aiming to collect the most accurate data. There is a risk that some beacons can be lost with this strategy (for example, if a page crashes), whereas beaconing as soon as possible reduces that risk. Beaconing on page visibility changes should minimise that risk, though.

We build our RUM SDK in a similar way to the commercial vendors, to evaluate the performance of all our projects. We collect not only Web Vitals but also a lot of data from performance timings, navigation and other performance entries, which helps us analyze page performance comprehensively.

I need to standardize the collection timing as much as possible so that it is fair to all projects. Metrics like CLS and INP tend to increase the longer collection is delayed. Therefore, I chose the moment when the page has finished loading and 5 seconds have passed.

Reducing the number of collections and reports, keeping the collection timing fair across projects, and not losing data are a bit like the CAP theorem: we can't achieve all of them. We can only guarantee accuracy (collecting after the page has finished loading) and fairness, try to reduce the number of collections and reports without affecting performance, and collect as much data as possible. We can accept losing a small amount of data, because as long as the data volume is large enough the results will not deviate much.

Therefore, we designed two rounds of collection and reporting:

  1. Once the page has fired onload and at least 5 seconds have elapsed (5 s being the maximum CLS session window), collect and report all six metrics together
  2. Just before the page closes, collect and report FID, CLS, and INP again

When we compute statistics, we only use the first report, which is fair to all projects. The second report is only used as a performance reference and is not included in the statistics.


The other complication is that, by default, web-vitals.js only emits the callback when a metric is finalised, so we will not emit an "early CLS", for example. You will need to call it with the reportAllChanges flag to get all the changes and allow early processing, and then process each one (which in itself potentially creates more processing!). So it may be easier to just report what you have after 5 seconds (even if it's not all 6 metrics), and then report again on visibility change as usual.

If you think this collection strategy is reasonable, then we would need to modify Web Vitals to support getting all the metrics at once. Setting reportAllChanges to true only triggers a callback for each change; it can't return everything in one call.

The core logic is as follows:

observeAllWithTimeout = (): Promise<Map<WebVitalsEntryType, PerformanceEntry[]>> => {
  return new Promise((resolve, reject) => {
    const start = performance.now();
    let timer = 0;
    let isCallback = false;
    // Resolve with the buffered Web Vitals entries as soon as the observer calls back.
    this.onAllObserver((entryMap: Map<WebVitalsEntryType, PerformanceEntry[]>) => {
      if (timer) window.clearInterval(timer);
      isCallback = true;
      resolve(entryMap);
    });
    // Poll every 10 ms; give up if there is no callback within 100 ms.
    timer = window.setInterval(() => {
      if (!isCallback && performance.now() - start > 100) {
        window.clearInterval(timer);
        const error = new Error('The onAllObserver method did not call back within 100ms.');
        reject(error);
        reportSdkError(error);
      }
    }, 10);
  });
};


onAllObserver = (callBack: (entryMap: Map<WebVitalsEntryType, PerformanceEntry[]>) => void): void => {
  const observer = new PerformanceObserver((list) => {
    // Group the delivered entries by Web Vitals type.
    const entryMap = new Map<WebVitalsEntryType, PerformanceEntry[]>();
    for (const entry of list.getEntries()) {
      if (entry.name === 'first-contentful-paint') {
        this.addMap(WebVitalsEntryType.FCP, entry, entryMap);
      } else if (entry.entryType === WebVitalsEntryType.CLS) {
        this.addMap(WebVitalsEntryType.CLS, entry, entryMap);
      } else if (entry.entryType === WebVitalsEntryType.LCP) {
        this.addMap(WebVitalsEntryType.LCP, entry, entryMap);
      } else if (entry.entryType === WebVitalsEntryType.FID) {
        this.addMap(WebVitalsEntryType.FID, entry, entryMap);
      } else if (entry.entryType === WebVitalsEntryType.INP) {
        this.addMap(WebVitalsEntryType.INP, entry, entryMap);
      }
    }
    // Only the first (buffered) batch is needed, so stop observing.
    observer.disconnect();
    callBack(entryMap);
  });
  // Observe each relevant entry type with buffered: true so that entries recorded
  // before this point are replayed.
  [
    { type: 'paint' },
    { type: WebVitalsEntryType.CLS },
    { type: WebVitalsEntryType.LCP },
    { type: WebVitalsEntryType.FID },
    { type: 'event', durationThreshold: 40 },
  ].forEach((config) => {
    observer.observe({ ...config, buffered: true });
  });
};

We found through testing that at our collection time (the page has fired onload and 5 seconds have passed), the observer reliably fires a callback for type=paint, and that callback delivers the buffered entries of all the observed types, including those of the other Web Vitals metrics. This way we can calculate all the Web Vitals metrics at once.

So I still want to ask a few other questions.

  1. Is it reasonable to get all the metric values at once using this method?
  2. This assumes that the browser buffers the various types of PerformanceEntry after the page is opened and that they can all be retrieved together after onload. Is this widely supported by browsers?

@tunetheweb
Member

Just cleaning up old issues and realised I never answered this:

Is it reasonable to get all the metric values at once using this method?

Measuring directly from the API misses several nuances (which are the whole point of the web-vitals library!), as there are often differences between the API and the metric (see the LCP differences as one example). Also, it doesn't look like you're excluding layout shifts with hadRecentInput set to true, and I'm not sure you're windowing the layout shifts into CLS sessions.
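For reference, the CLS session windowing works roughly like this: shifts with hadRecentInput are ignored, and the remaining shifts are grouped into windows of at most 5 seconds in which consecutive shifts are less than 1 second apart; CLS is the value of the largest window. A rough sketch (not the library's actual implementation):

// Sketch of CLS session windowing (max 5 s window, < 1 s gap between shifts),
// excluding shifts that follow recent user input. Not the library's actual code.
interface LayoutShift extends PerformanceEntry {
  value: number;
  hadRecentInput: boolean;
}

function computeCLS(entries: LayoutShift[]): number {
  let cls = 0;
  let windowValue = 0;
  let windowStart = 0;
  let prevTime = 0;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // exclude input-driven shifts

    const gap = entry.startTime - prevTime;
    const span = entry.startTime - windowStart;

    // Start a new session window after a 1 s gap, or once the window spans 5 s.
    if (windowValue === 0 || gap >= 1000 || span >= 5000) {
      windowValue = entry.value;
      windowStart = entry.startTime;
    } else {
      windowValue += entry.value;
    }

    prevTime = entry.startTime;
    cls = Math.max(cls, windowValue);
  }
  return cls;
}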

This assumes that the browser buffers the various types of PerformanceEntry after the page is opened and that they can all be retrieved together after onload. Is this widely supported by browsers?

Yes, the point of buffered entries is that they can be replayed afterwards. However, there are a few nuances to understand:

  • There are max buffer sizes for each entry type. If you record shortly after onload it's unlikely you'll overflow a buffer and miss entries, but it's something to be aware of if calling later.
  • Certain metrics (TTFB) are not available through Performance Observers (see the snippet below).
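For completeness, TTFB can still be read from the buffered navigation timing entry without an observer; a simplified sketch (web-vitals' onTTFB handles additional cases such as prerendered pages):

// Simplified TTFB read from navigation timing (no PerformanceObserver needed).
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
const ttfb = nav ? nav.responseStart : undefined; // ms relative to the start of navigation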
