Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 98: Remote channels for cross-browsing-group communication #98

Merged
merged 14 commits into from
Nov 22, 2021

Conversation

jgraham
Copy link
Contributor

@jgraham jgraham commented Sep 21, 2021

@jgraham jgraham changed the title Add proposal for remote channels for cross-browsing-group communication RFC 98: Remote channels for cross-browsing-group communication Sep 21, 2021
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
@jgraham
Copy link
Contributor Author

jgraham commented Sep 29, 2021

Gentle ping here. This has been open for a week, so per the process I could go ahead and merge it, but I'm pretty sure you don't want me to do that ;)

field provides the line/column numbers of the original exception,
where available.

In addition there is a `RemoteWindow.executeScriptNoResult(fn,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my experience on bfcache tests, I feel executeScriptNoResult() might be hard to use, because:

  • Mixing await executeScriptNoResult() and await executeScript() might complicates the timeline. Having await executeScript() only would be clear, because it serializes the script evaluation on caller and remote context.
  • The choice between await executeScriptNoResult() and await executeScript() might depend on internals of executeScript(). When I was migrating my bfcache tests from COOP-based framework to test_driver-based framework, I had to make changes around executeScriptNoResult()-equivalent that were hard to explain without knowing the internal structure of the framework, so I switched to the executeScript()-only interface.

So I'd prefer removing executeScriptNoResult().
Still there would remain complexities around navigation: instead of executeScriptNoResult(), I created promises / async callbacks triggered by executeScript() but not blocking async executeScript() in bfcache tests. But by removing executeScriptNoResult() I expect we can make the complexities only navigation-specific, not affecting RemoteWindow interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So (for the context of others) the technical concern here is that for bfcache tests in particular you need all sockets to be closed before initiating the navigation. The problem is that if you send a message, execute some script that closes all sockets, and then send the result, you end up re-opening a socket for the result. So there are broadly two choices:

  • Don't try to send a result at all. postMessage works like this, and executeScriptNoResult does the same: we simply don't provide a channel to send the result on so there's no concern about sockets being reopened.
  • Send a result, but ensure we navigate after the result has been sent, so nothing will implictly reopen a socket. Practically this means something like setTimeout(async url => {await closeAllSockets(); startNavigation(url)}, 0). The problem with that is that we are currently linting for setTimeout so the infra discourages even usage that's required like this.

I don't quite get the argument about serialization; in the case that you're putting the actual action of the remote script in a timeout the test window can guarantee the script has been sent, but the actual action of the script hasn't happened at the time the value is returned, so it doesn't seem like that method offers particuarly stronger guarantees around ordering. I suppose one advantage is that any exception that happens when evalutaing the script is reported through the return value, rather than just happening in the remote window.

So, I think I'm mostly OK with dropping executeScriptNoResult because, as you say, it's exposing an implementation detail and it seems cleaner to start with a smaller API. My concern is that for some kinds of tests (notably bfcache), understanding that implementation detail is very important, and there's an argument that making it an explicit part of the API will help people write correct tests. So I'd like to hear other opinions here.

at the time of navigation, otherwise the page will be excluded from
bfcache. In the current prototype this is handled as follows:

* A `pause` method on `SendChannel`. This causes a server-initiated

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As commented at web-platform-tests/wpt#29803 (comment), I'm not sure how pause works and whether pause is needed (the draft impl doesn't use pause).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem pause is intended to solve is where you have something like:

await remote.executeScript(() => startNavigation());
await remote.executeScript(() => ensureNavigationHappened());

With a poll() based interface this can work as written; the remote simply doesn't call poll() after startNavigation and before the navigation is complete. But with websockets, we don't have any synchronisation between the navigation and sending the ensureNavigationHappened message. If the test sends the second message before the socket is closed on the remote, it can end up being lost. So pause solves this by effectively providing a synchronisation point around the socket being closed:

await remote.executeScript(() => startNavigation());
remote.pause();
// We now know the remote now won't listen for messages until it reconnects to the socket after navigation
await remote.executeScript(() => ensureNavigationHappened());

From the point of view of the remote, closeAllSockets() and pause() do race, but the effect of pause() is a server-initiated disconnect of the read channel for the remote, and the effect of closeAllSockets is a client-initiated disconnect of all channel-related sockets, so the ordering isn't very important.

We can avoid requiring pause by not sending requests to a remote after initiating a navigation, before getting a confirmation that the navigation is actually complete. In practice this means that the target page has to be a different RemoteWindow (i.e. have a different uuid) and has to process a message i.e. something like:

await remote.executeScript(() => startNavigation());
await navigationTargetRemote.executeScript(() => goBack());
await remote.executeScript(() => ensureNavigationHappened());

That's what the bfcache tests are doing now, and why pause isn't needed there, but either way the ergonomics are a bit tricky. I think we could drop it from the initial featureset, but these races around navigation are tricky to understand and debug.

rfcs/remote_channel.md Outdated Show resolved Hide resolved
via the main test window, make it hard to build an ergonomic
cross-context messaging API.

### Proposal

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to clarify (somewhere in the RFC) what are the dependencies of this RFC in terms of implementation.

IIUC this RFC can be implementable by pure JavaScript outside WPT infra + existing server-side stash, which is good because we can run the tests by running wpt server and bare browsers, without further depending on WPT test infra.
This RFC mentions integration with WebDriver BiDi, test_driver and testharness, but these are out of scope of this RFC itself.

Is my understanding correct?
Why the changes to resources/testharness.js and tools/wptrunner/wptrunner are needed in web-platform-tests/wpt#29803 ?

(Other changes, in JS/HTML files, tools/wptserve/wptserve/stash.py, and websockets/handlers/msg_channel_wsh.py look like "implementable by pure JavaScript outside WPT infra + existing server-side stash")

they are unable to communicate. For example windows opened with the
`noopener` attribute will not have a handle to their parent, nor will
the parent have a handle to the child. Similarly, cross-origin
navigations with the `cross-origin-opener-policy` header appropriately

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this RFC going to replace the framework in the COOP/COEP tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect the RFC to require moving the COOP/COEP tests. However I expect the RFC to provide all the features needed for those tests, and would anticipate the actual migration happening as a followup once the implementation has landed.

synchronize the navigation starting (which will close the socket) with
writing the response.

TODO: the naming here isn't great. In particular a `RemoteWindow`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't RemoteContext work (as proposed in #91)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because RemoteContext is already used by testharness.js to mean something slightly different:

    /*
     * A RemoteContext listens for test events from a remote test context, such
     * as another window or a worker. These events are then used to construct
     * and maintain RemoteTest objects that mirror the tests running in the
     * remote context.
     */

In the long term it would make sense to unify these features, but in the short term a naming conflict will cause breakage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe RemoteGlobalScope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would RemoteGlobal work? RemoteGlobalScope is quite a mouthful.

the transition may even be seamless.

testdriver integration is possible. For example we could add
`RemoteContext.testdriver.click` to execute a click in the remote

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would RemoteContext.testdriver.click be a pure JavaScript wrapper like RemoteContext.executeScript(() => test_driver.click())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't work as written, because all testdriver commands have to go via the test window since that's the one that webdriver is using. So it would desugar into testdriver.click(RemoteContext.uuid), and testdriver would learn to look up contexts from a uuid parameter in the URL, as well as from the internal identifier it uses today.

without providing the API surface that only makes sense in a test
window.

It may be possible in the future to replace the backend with a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For further integration with WebDriver, testdriver etc., discussion about benefits and risks would be helpful (not necessarily now though, given that this is "Possible Future Additions").

One one hand ("Risk" side), I basically prefer not integrating the framework too much to WebDriver/test_driver etc., to make the framework work with minimum dependencies and work with bare browsers, and not to make debugging harder. We might want to JSON.stringify() instead of WebDriver BiDi format if we want to reduce complexity around serialization, given that the current tests considered so far don't need WebDriver BiDi capability beyond JSON.stringify().
So I'm interested in hearing more about the benefits (from the point of view of test writers and WPT infra maintainers) when we consider the further integration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, future additions here would be subject to the same RFC process. So the idea was just to sketch out some possibilites in case people felt there were specific parts that should be given additional consideration in the initial design, or would be high value to work on sooner.

In terms of the serialization format, we are already using extra complexity beyond JSON-stringify when serializing a SendChannel for message responses. We obviously could special case that, but in general I think using plain JSON as a serialization format for data where we might want to support non-JSON types is a bad idea (e.g. current WebDriver mostly uses plain JSON but also has super-hacky support for serializing Element objects which doesn't generalise to other types and is generally a mess). Aiming for a featureset closer to structured clone seems more future proof, but structured clone doesn't actually define a serialization. So, given that WebDriver BiDi has very similar requirements, and actually defines the wire format, it seems sensible to use that as a reference point. The fact that, going forward, we could use WebDriver BiDi instead of the stash as the backend in the wptrunner implementation is good, but it's not the primary motivation here.


TODO: the naming here isn't great. In particular a `RemoteWindow`
could actually be some other kind of global like a worker, and
`start_window()` is a pretty nondescript method name.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a similar thing to start_window() in my bfcache tests and I was also not sure about the better naming, better semantics, etc. (and thus I just made it const executor = new Executor(uuid); and avoided explicit naming).
So I'm also curious how this should be named.

@hiroshige-g
Copy link

Commented somehow verbosely to move discussion forward.

As for executeScript(), the RFC basically looks similar to what the current RemoteContext.execute_script() and bfcache tests do, while there are more primitives around navigations (executeScriptNoResult, pause and closeAllChannelSockets in this RFC while the bfcache tests only have closeAllChannelSockets-equivalent, as commented inline above) and I'd like to understand the differences (IIUC there aren't fundamental differences, because I expect switching the impl from Fetch API to websocket doesn't impact so much).
So in terms of executeScript(), this RFC looks like switching from Fetch API to websocket based implementation (compared to the previous RFCs and web-platform-tests/wpt#28950), both pure JavaScript-based, and thus doesn't cause test bodies except for mechanical changes (correct?).

The RFC is more about replacing the existing send() and receive() primitives currently used in COEP/COOP tests, by introducing related API classses, adding more APIs in addition to send()/receive() etc., and I expect test writers (other than me) and WPT infra people might have more interests on these.

cc/ @ArthurSonzogni and @foolip.

@jgraham
Copy link
Contributor Author

jgraham commented Oct 5, 2021

@hiroshige-g Thanks for the detailed comments, it's much appreciated! I've responded to the technical concerns, hopefully in a way that also provides context for other reviewers. I'll update the RFC for the more editorial issues. Please let me know if I miss[ed] anything.

jgraham and others added 3 commits October 11, 2021 16:57
* Remove executeScriptNoResult

* Rename `start_window()` to `start_window_channel()` and add
`window_channel()` without the auto-connect behaviour.

* Change the (de)serialization model to automatically create local
objects (like structuredClone) rather than requiring a `toLocal()`
call.

* Updates from PR feedback.
@foolip
Copy link
Member

foolip commented Oct 12, 2021

Ping @web-platform-tests/wpt-core-team for review on this. I'll be taking a look tomorrow (I am a liar) but more review is better!

rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
rfcs/remote_channel.md Show resolved Hide resolved
rfcs/remote_channel.md Outdated Show resolved Hide resolved
[PR 29803](https://github.com/web-platform-tests/wpt/pull/29803)
contains a prototype implementation of this.

<!-- LocalWords: UUID WebDriver wptrunner testharness APIs UI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's LocalWords?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry this is emacs embedding the extra dictionary entries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to keep them, then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it's convenient for me when editing, but I could remove them once the overall RFC is approved, right before merging.

Co-authored-by: Philip Jägenstedt <[email protected]>
@jgraham jgraham requested a review from foolip October 27, 2021 10:24
@jgraham
Copy link
Contributor Author

jgraham commented Nov 4, 2021

Ping, again. This has now gone another week since the last round of updates without further review.

send messages to the remote. Alternatively the `RemoteWindow` may be
created first and its `uuid` property used when constructing the URL.
This API is provided by a `RemoteGlobal` object. The `RemoteGlobal`
object doesn't handle creating the browsing context (or other global
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good clarification!

Copy link
Member

@foolip foolip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some missing punctuation.

rfcs/remote_channel.md Outdated Show resolved Hide resolved
@foolip foolip merged commit 49d43a7 into master Nov 22, 2021
@foolip foolip deleted the remote_channel branch November 22, 2021 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants