Cross-origin IFRAME opting-in to sharing ResourceTiming data #210

Open
nicjansma opened this issue Jun 11, 2019 · 8 comments

Comments

@nicjansma
Contributor

I'd like to propose a way for cross-origin IFRAMEs to share their ResourceTiming data with their parent pages, if they choose.

Background

ResourceTiming is one of the cornerstones of RUM, and provides a lot of value to anyone using it. Unfortunately one of the key limitations of ResourceTiming is that not all resources fetched by the page are "visible" to ResourceTiming. There are several reasons for this, but it is primarily due to each frame only reporting on the resources fetched by itself, and not any child IFRAMEs.

One can gather ResourceTiming entries from all children/grandchildren+ IFRAMEs by crawling them (though the ergonomics are not ideal). Both PerformanceTimeline (getEntries()) and PerformanceObserver need to do this crawling.
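
As a rough illustration of the crawling described above, here is a minimal sketch. The `EntryLike` and `FrameLike` types and the `crawlResourceEntries` name are assumptions made for this example (stand-ins for the browser's entry and window objects), not part of any spec. It assumes that reading `performance` on a cross-origin frame throws, while `frames` and `length` stay readable.

```typescript
// Minimal structural stand-ins so the sketch does not depend on DOM typings.
interface EntryLike {
  name: string;
  transferSize?: number;
}

interface FrameLike {
  // In browsers, reading `performance` on a cross-origin frame throws.
  performance: { getEntriesByType(type: string): EntryLike[] };
  frames: ArrayLike<FrameLike>;
}

interface CrawlResult {
  entries: EntryLike[];
  blockedFrames: number; // frames whose entries could not be read
}

// Hypothetical helper: depth-first walk over a frame tree.
function crawlResourceEntries(root: FrameLike): CrawlResult {
  const result: CrawlResult = { entries: [], blockedFrames: 0 };

  const visit = (frame: FrameLike): void => {
    try {
      result.entries.push(...frame.performance.getEntriesByType("resource"));
    } catch {
      // Most likely a cross-origin frame: its resources stay invisible.
      result.blockedFrames += 1;
    }
    // Keep descending even past a blocked frame, since `frames` and
    // `length` are assumed to remain readable on cross-origin windows.
    for (let i = 0; i < frame.frames.length; i++) {
      const child = frame.frames[i];
      if (child) visit(child);
    }
  };

  visit(root);
  return result;
}
```

In a page, the root would be the top-level `window` (possibly with a type cast). The `blockedFrames` count gives a lower bound on how many frames are missing from the crawl.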

Unfortunately, due to cross-origin DOM restrictions, one cannot "crawl" into a cross-origin IFRAME to query its frames[n].performance.getEntries() interface. Thus, all content loaded from cross-origin IFRAMEs (and below) is completely hidden from ResourceTiming.

In an analysis I did of ResourceTiming Visibility, I found that over 30% of all resources fetched by the browser on popular sites are missing from ResourceTiming's crawls, accounting for more than 50% of all bytes downloaded (Page Weight).

This is not ideal, and makes RUM analysis biased, especially compared to browser developer tools and synthetic tests, which report on 100% of all resources. RUM cannot report on Page Weight accurately today.

My guess, based on the types of content typically loaded from cross-origin IFRAMEs, is that ads and social widgets are among the biggest things completely flying under the radar for ResourceTiming/RUM. I'd really like to be able to shine a better light on both of those third-party components, so we can hold them accountable.

Video content is even worse, with 75% of videos missing from ResourceTiming, 97.3% by byte count.

Cross-origin IFRAMEs are restricted for a reason, as opening access raises privacy, security, and probably other concerns. I'm hoping there's a way to get access to just their fetches, if they opt in.

We've experimented with a way for third-parties (e.g. ad providers) to bubble up their content via a postMessage() handshake, but it takes JavaScript coordination in both pages to do so.
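
To make the "JavaScript coordination in both pages" concrete, here is a minimal sketch of what each side of such a handshake might do. The message type string, the `SharedEntry` shape, and both function names are assumptions made for this example; they do not describe the actual experiment.

```typescript
// Hypothetical subset of ResourceTiming fields a frame agrees to share.
interface SharedEntry {
  name: string;
  startTime: number;
  duration: number;
  transferSize: number;
}

interface TimingMessage {
  type: "resource-timing-share";
  entries: SharedEntry[];
}

const MESSAGE_TYPE = "resource-timing-share"; // hypothetical envelope tag

// Child frame side: copy only the agreed fields into a plain message.
// In a browser this would be sent with something like
//   window.parent.postMessage(buildTimingMessage(entries), parentOrigin);
function buildTimingMessage(entries: SharedEntry[]): TimingMessage {
  return {
    type: MESSAGE_TYPE,
    entries: entries.map(({ name, startTime, duration, transferSize }) => ({
      name,
      startTime,
      duration,
      transferSize,
    })),
  };
}

// Parent page side: called from a "message" event listener with
// event.data and event.origin. Returns null for anything unexpected.
function readTimingMessage(
  data: unknown,
  origin: string,
  trustedOrigins: readonly string[],
): SharedEntry[] | null {
  if (trustedOrigins.indexOf(origin) === -1) return null;
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Partial<TimingMessage>;
  if (msg.type !== MESSAGE_TYPE || !Array.isArray(msg.entries)) return null;
  return msg.entries;
}
```

Both pages have to ship and agree on this code, which is the coordination cost mentioned above.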

Proposal

I'd like to figure out if there's a way for content-owners (third-parties that serve content in IFRAMEs) to opt-in to sharing some of their performance data to the base page. We'll need three pieces to make this work:

  1. Giving content owners a way of opting-in to sharing performance data to the base page
  2. Figuring out which performance data is shared
  3. Accessing the shared performance data from the base page

1. Opting-In

For cross-origin resources fetched from the base page, the content owner (e.g. third-party site, CDN, etc) can opt-in to sharing detailed performance data via the TAO HTTP header:

Timing-Allow-Origin: *

I think we could re-use this concept for cross-origin frames. Two options come to mind:

  1. Re-use the Timing-Allow-Origin header: If Timing-Allow-Origin is available on an HTML page that is an IFRAME, it has "opted-in" to sharing not only the detailed performance data for that URL itself, but also anything it fetches. This expands the scope of TAO, so I'm not sure it would pass a security review, but would get us a lot more data quicker (the TAO header is on only around 15% of responses today).
  2. Add a new Timing-Allow-Origin-Resources (or something) header, that would opt the IFRAME into sharing its performance data with the base page.
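
The two options above could be sketched as follows. The first function approximates the existing TAO matching (a comma-separated list of origins, or `*`); the second applies it to a frame's document response. The header map shape, the `frameOptsIn` name, and the assumption that a new header would use the same value syntax as TAO are all hypothetical.

```typescript
// Approximation of the TAO check: passes if the header lists "*" or the
// embedding page's serialized origin.
function passesTimingAllowOrigin(
  headerValue: string | null,
  embedderOrigin: string,
): boolean {
  if (headerValue === null) return false;
  return headerValue
    .split(",")
    .map((value) => value.trim())
    .some((value) => value === "*" || value === embedderOrigin);
}

type OptInOption = "reuse-tao" | "new-header";

// Hypothetical: decide whether an IFRAME document has opted in, given the
// response headers of the frame's HTML (names assumed to be lower-cased).
function frameOptsIn(
  headers: Readonly<Record<string, string>>,
  embedderOrigin: string,
  option: OptInOption,
): boolean {
  const headerName =
    option === "reuse-tao"
      ? "timing-allow-origin"
      : "timing-allow-origin-resources"; // placeholder name from option 2
  const value = headers[headerName];
  return passesTimingAllowOrigin(value === undefined ? null : value, embedderOrigin);
}
```

Under option 2, a frame that already sends `Timing-Allow-Origin: *` would not be treated as opted in until it also sends the new header, which is the narrower (and slower to adopt) behavior.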

2. Which Content is Shared?

The most interesting information I personally want from cross-origin IFRAMEs is ResourceTiming data, but I could see it being useful to share UserTiming data as well.

LongTasks already exposes its data to parent frames.

There is probably more performance data that would be useful to "share up", but we could also just scope this to ResourceTiming data for now.

3. Accessing the Shared Performance Data

I'm not sure of the best ergonomics of this. A couple options come to mind:

  1. If opted-in, the base page could somehow get access to frames[n].performance, but nothing else on the frames[n].*. Seems like a tough sell.
  2. Provide an interface on the base page to access all IFRAMEs easily, and the browser would "crawl" the IFRAMEs, same- and cross-origin-opted-in. Something like performance.getEntriesByType('resource', { frames: true }). I know there's not much appetite for working on the "old" interfaces though.
  3. If opted-in, a flag could be passed to PerformanceObserver (bubbles!) where any IFRAMEs, same- and cross-origin-opted-in, would report their entries.

The third option would probably get my vote, but it would require bubbles support.

@nicjansma nicjansma added this to the Level 3 milestone Jun 11, 2019
@igrigorik
Member

The analysis on missing resources and unattributed bytes definitely motivates the need for an investigation (and a solution :)) in this space. Thank you for the thorough job on the analysis!

In terms of how we go about addressing this, my first reactions are:

  1. New opt-in. Extending TAO might have unintended consequences.
    • At the same time, I'm wary of adding more and more opt-ins, and I wonder if there are other existing mechanisms (e.g. something around Feature Policy?) that we could use, or whether we could define an extensible mechanism, especially if we're thinking about granular opt-in across different metrics.
  2. Ditto, +1 for bubbles.

@npm1
Contributor

npm1 commented Sep 17, 2019

Discussed at TPAC. I think browser vendors are interested in seeing concrete examples of interested parties who would use the API (from the embedded iframe to the embedder). It is already possible to polyfill this (via postMessage) so it's worth having strong use cases before diving deeper into this.

@thebengeu

Google Display and Video Ads would be likely to use an API to opt-in to sharing performance data, for ad iframes that we control.

We have RUM for most ads we serve, and TAO on most resources we control. ResourceTiming data has been valuable for quantifying and prioritizing ads improvements, as well as catching regressions.

Ad iframes are indeed frequently cross-origin, and often have cross-origin child iframes of their own. So even if we have code we control within the outermost ad iframe, we may not have visibility into the ad's total byte weight, and it would be useful to us, not just the base page, to have more visibility.

We would be open to sharing most ResourceTiming data to parent frames, aside from full resource URLs, which could possibly leak sensitive personal information. I suspect that may be a concern for other third parties as well.

To address that and make adoption more likely, there could be an additional option to either trim URLs to just the host portion, which should still allow attribution to specific third parties in most cases, or omit URLs entirely, which may still be useful to attribute total byte weight or other aggregate measurements to the iframe sharing ResourceTiming data.
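
A minimal sketch of the URL options described above, assuming a sharing mode is chosen per frame. The `UrlSharing` mode names, the `TimedResource` shape, and both function names are made up for this example.

```typescript
// Hypothetical sharing modes: full URL, host portion only, or no URL.
type UrlSharing = "full" | "host-only" | "omit";

interface TimedResource {
  name: string; // resource URL as reported by ResourceTiming
  duration: number;
  transferSize: number;
}

// Return a copy of the entry with its URL reduced according to the mode.
function redactResource(entry: TimedResource, mode: UrlSharing): TimedResource {
  if (mode === "full") return { ...entry };
  if (mode === "omit") return { ...entry, name: "" };
  let host = "";
  try {
    host = new URL(entry.name).host; // drops path, query string, and fragment
  } catch {
    host = ""; // unparsable URL: fall back to omitting it
  }
  return { ...entry, name: host };
}

// Aggregate byte weight is still available even when URLs are omitted.
function totalTransferSize(entries: TimedResource[]): number {
  return entries.reduce((sum, entry) => sum + entry.transferSize, 0);
}
```

For example, `redactResource` in `"host-only"` mode turns `https://ads.example.com/pixel?uid=123` into `ads.example.com`, which still allows attribution to a third party without exposing the query string.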

@nicjansma
Contributor Author

Both of those ideas (URL "rounding" to the host, or URL being hidden entirely) are great thoughts!

My only concern is that if some people want to expose/capture full URLs, and others want host-only or hidden URLs, then we'd need a way to specify which option applies. Through different HTTP header names? That could make this opt-in a bit more complex to specify and implement.

@npm1
Contributor

npm1 commented Jan 21, 2020

What would be the next steps here? Perhaps talk about it at the F2F? Or is there something we should be doing before then?

@yoavweiss
Contributor

@nicjansma - would you be interested in presenting on this to the group?

@nicjansma
Contributor Author

Yes, I'm happy to chat about it again at the next meeting.

@clelland
Contributor

See w3c/performance-timeline#207 for a more general solution being discussed there, not restricted to just resource timing.
