Use of URI templates seems problematic #259
Comments
Thanks for raising this. For our usage, parsing is limited to finding and replacing the substitution expressions defined in rfc6570, which should be simpler than general URL parsing. We definitely want to make sure we're reusing functionality that's already present in browsers where possible though. I reviewed the two linked specs, but it doesn't look to me like they provide template substitution functionality. Is there possibly another spec that's in use in browsers which provides template substitution functionality and would be preferable to rfc6570? |
You are correct. This would require a prose-version of whatwg/urlpattern#73. @jeremyroman @sisidovski do you think you could add that to URLPattern so the IFT specification can build upon URLPattern instead of URI templates? |
I'm still catching up with the IFT. @garretrieger Could you help me understand how the URI templates are used in IFT? |
They are used to efficiently store the url of font patches, which can be downloaded to extend an already-downloaded font. |
To expand on that, the idea is this:
A couple of things to note: the IFT client doesn't parse anything in the URL template other than locating the {id} substitution point. Further parsing/fetching of the URL is delegated to fetch. |
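As a rough illustration of the point above, here is a minimal Python sketch in which the client only locates the `{id}` substitution point and leaves all further URL handling to a general-purpose resolver. The function name `patch_url`, the example font URL, and the example template are hypothetical; `urljoin` from the Python standard library stands in for the resolution that fetch would perform.

```python
from urllib.parse import urljoin


def patch_url(font_url: str, template: str, patch_id: str) -> str:
    """Expand a template containing {id} and resolve it against the font URL.

    Hypothetical helper: the only template "parsing" is finding {id}.
    """
    # Plain string substitution; nothing else in the template is inspected.
    expanded = template.replace("{id}", patch_id)
    # Resolution of relative vs. absolute URLs is delegated to the URL library,
    # standing in here for what fetch would do in a browser.
    return urljoin(font_url, expanded)


print(patch_url("https://example.com/fonts/font.ift", "patches/{id}.p", "0A"))
```

An expansion that yields an absolute URL passes through `urljoin` unchanged, so the same helper covers both the relative and the absolute case.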
Thank you, that is helpful. Regarding the format, from the spec, it looks like patch URLs could have multiple substitutions like this.
In URLPattern, substitutions are expressed like
It looks like there are only six substitutions, and all of them are fixed names that are not configurable? https://w3c.github.io/IFT/Overview.html#uri-templates If so, I wonder if IFT really wants whatwg/urlpattern#73, as the use case is pretty limited. |
Yeah, I had missed that the template is fixed. An alternative here might be that IFT just builds the necessary URL paths itself with a prose-written algorithm. It would have to handle certain percent-encoding details, but it shouldn't be too much work and allows removing this dependency. (At least as a normative reference, I think for illustrative purposes it might still be useful to call out the equivalent URI template (or eventual URLPattern template, once we get there).) |
I believe the reason we reference the URI template mechanism is to make it clear how the templates described in the specification are intentionally compatible with it. We had a need for a fill-in template and it seemed more sensible to base the one in the spec off an actual standard rather than rolling our own (which is what an earlier prototype did). It was never about requiring the addition of a full parser. Our current prototype does use https://docs.rs/uri-template-system/latest/uri_template_system/ but that's more a matter of convenience than anything else.
Suppose we were to change the language to reference RFC6570 but note that because the six variables are fixed, it is fine to use a custom algorithm that only supports those variables as long as it produces the same result. Would that be sufficient? There are many ways of accomplishing the substitutions, and prescribing an algorithm in the spec would take the other options off the table. (For example, what if there is a URI template implementation available in the client?) |
@skef it depends on whether the URLs that end up being produced would be identical or not. E.g., is the percent-encoding lowercase or uppercase, etc. It's probably easier to accomplish that by just writing something in terms of the URL standard, but if someone verified it, maybe it could work. |
I'd prefer to stick with rfc6570 versus writing our own algorithm. The standard is mature, has a wide variety of good quality implementations already, an existing test suite, and does pretty much exactly what we need it to for our use case.
It's important to note that an implementation of rfc6570 doesn't require doing general URL parsing. An implementation only needs to search for and replace substitution expressions (identified by {...}); anything outside of those expressions is just blindly copied, or percent encoded if the codepoints aren't url safe. For context, here are a couple of relevant quotes from the spec text: "The syntax is designed to be trivial to parse while at the same time providing enough flexibility to express many common template scenarios." and "The process of URI Template expansion is to scan the template string from beginning to end, copying literal characters and replacing each expression with the result of applying the expression's operator to the value of each variable named in the expression." An expression here refers to anything enclosed in {...}.
In my opinion, writing our own template syntax and expansion algorithm would likely introduce a similar level of complexity to rfc6570. Since expanding templates in this fashion doesn't require understanding the structure of the literal URL string surrounding the substitution expressions, I don't think we need to frame anything in terms of the URL standard. What we're using rfc6570 for is essentially just string substitution and percent encoding where needed.
If there's a desire to limit some of the more complex substitution types found in the spec, we could look at limiting the level of expression support required for an IFT implementation. The rfc defines 4 levels of substitution expressions ranging from simple (l1) to complex (l4) (see: https://datatracker.ietf.org/doc/html/rfc6570#section-1.2).
For our use cases it would likely be sufficient to only require level 1 and a subset of the level 3 operators ( /, ?, &, ;). Level 3 operators aren't strictly needed, but do allow for more compact templates for some expected use cases. |
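The scan-and-replace process quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the IFT prototype's code: it handles level 1 expressions (`{var}`) and rejects anything with an operator. The function name `expand_level1`, the simplified variable-name pattern, and the example templates are assumptions; literals are copied unchanged here, and stray braces are not diagnosed.

```python
import re
from urllib.parse import quote

_EXPR = re.compile(r"\{([^{}]*)\}")       # anything enclosed in {...}
_VARNAME = re.compile(r"[A-Za-z0-9_]+")   # simplified; RFC 6570 allows more


def expand_level1(template: str, variables: dict) -> str:
    """Scan the template, copy literals, replace each {var} expression."""
    out = []
    pos = 0
    for match in _EXPR.finditer(template):
        out.append(template[pos:match.start()])   # literal text, copied as-is
        name = match.group(1)
        if not _VARNAME.fullmatch(name):
            # Operators such as /, ?, &, ; belong to higher levels.
            raise ValueError(f"unsupported expression: {match.group(0)}")
        value = variables.get(name)
        if value is not None:                      # undefined expands to ""
            # Level 1 keeps only unreserved characters unencoded.
            out.append(quote(value, safe=""))
        pos = match.end()
    out.append(template[pos:])
    return "".join(out)


print(expand_level1("/{a}/{b}.bin", {"a": "x y", "b": "Z"}))
```

Nothing in the sketch inspects the structure of the surrounding URL, which is the property the comment above relies on.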
I think, given how IFT uses this, building up a path and calling percent-encode would not be that much complexity. What I'm worried about is that 6570 does not define, for instance, whether percent-encoding happens with uppercase or lowercase hex digits. That is a problem. The code points that get percent-encoded might also differ from the percent-encode sets defined by the URL standard. That second point is only a potential issue, as I did not try to confirm it; the first seems substantive enough on its own. |
Ah, I didn't notice this; that is somewhat of an issue. I dug into this a bit more: the spec references rfc3986 for percent encoding, which does indicate a preference ("should") for uppercase when producing URLs. There are a few examples throughout 6570 that all use upper case, and the official test cases consistently use upper case too. Based on that I think it's safe to assume the intention is for the output to use uppercase. For our use in IFT I think it would be reasonable to specifically clarify this issue and explicitly require upper case in any percent encoding in the produced URLs.
I'll need to look into this one a bit more. Would https://url.spec.whatwg.org/ be the appropriate thing to compare to? |
Yes. |
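To make the upper-case concern from the preceding comments concrete, here is a small Python sketch. It is an illustration under assumptions, not spec text: the helper name `uppercase_percent_escapes` is hypothetical, and Python's standard `urllib.parse.quote` is used only as an example of an encoder that emits upper-case hex digits.

```python
import re
from urllib.parse import quote

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_percent_escapes(url: str) -> str:
    """Rewrite every %xx escape so that its hex digits are upper case."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), url)


# Two encoders that disagree on case would produce different strings
# for the same resource; normalising removes that difference.
print(uppercase_percent_escapes("patches/a%c3%a9.p"))
print(quote("é"))  # Python's encoder already emits upper-case hex digits
```

Requiring upper case at the point of expansion avoids the need for a normalisation pass like this one.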
Spent some time reviewing URL, rfc6570, and how those intersect with what we're trying to do in IFT and here's what I've concluded:
What do you think? |
If you can only produce paths I don't think the URL parser will ever fail (note that it doesn't necessarily fail for invalid input). Also, for percent-encoding there's a percent-encode set you need to decide on for all the code points in the ASCII range. https://url.spec.whatwg.org/#example-percent-encode-operations has an example that goes over the various caller options. |
I wonder what the substitution expressions will look like. Once we implement and standardize whatwg/urlpattern#73, will the planned substitution expressions be compatible with URLPattern? |
To my best understanding it doesn't really matter as the template is used as a convenient specification-internal way to generate a set of paths. It's not directly exposed. (I also think that directly calling the relevant percent-encode operations in the URL standard and appending the resulting strings to a list which is then used as the path would be more precise and likely have less overall complexity.) |
For our use template expansion is not limited to just paths. The IFT spec allows them to expand to either relative or absolute URLs.
I did some more checking on what differences exist in which codepoints will be percent encoded when expanding the template (looking at literals only, since we know the substitution values will only contain base64/base32 chars). If you intersect the allowed literal set with the set that is required to be percent encoded, you end up with 0x00-0x20 and everything >=0x7F. That is a subset of all percent encoding sets used during URL parsing, and as a result I believe it's correct that these are always percent encoded. For everything else (0x21-0x7E) this effectively pushes percent encoding decisions onto the creator of the template. For example "?" should be percent encoded if part of a path segment, but would be un-encoded when delimiting the query string. The template creator would be responsible for ensuring that percent encoding has been correctly applied so that expansions result in valid URLs that point to the intended resources. |
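The literal-handling rule described above can be sketched as follows. This is a simplified illustration with a hypothetical function name, `encode_literal`: code points 0x00-0x20 and everything >=0x7F are percent encoded as UTF-8 with upper-case hex digits, and 0x21-0x7E is copied untouched. The sketch assumes the template creator has already handled characters in that range, and it does not check which of them are actually allowed in template literals.

```python
def encode_literal(text: str) -> str:
    """Percent-encode only the code points that are always encoded."""
    out = []
    for ch in text:
        if 0x21 <= ord(ch) <= 0x7E:
            # Left to the template creator: "?" may be a delimiter or data.
            out.append(ch)
        else:
            # 0x00-0x20 and >=0x7F: encode each UTF-8 byte, upper-case hex.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(encode_literal("a b?é"))
```

The "?" in the example passes through unchanged, which matches the point that the template creator decides whether it is a query delimiter or needs encoding.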
I misunderstood. I think the problem @sisidovski highlights is real. At least it seems from https://w3c.github.io/IFT/Overview.html#patch-map-format-1 that one of the inputs is a URI template. That seems more problematic to me as that essentially requires a URI template implementation. |
- For #259 this significantly reduces the complexity of expansion implementation needed by clients.
- Note that percent encoding must only produce upper case letters.
- Add a note that implementations should consider a simple custom implementation of expansion over reusing a general purpose one.
Small update here, I've changed the IFT spec to utilize whatwg URL instead of rfc3986 and restricted template syntax to only level 1 which makes for a fairly simple expansion implementation (see: #263). I'll continue to watch the progress on whatwg/urlpattern#73 and once that functionality becomes available we can look into switching over to it instead of rfc6570 style templates. |
@annevk does that update resolve your comment? |
Browsers don't implement a URI parser, let alone a URI template parser. Instead they use https://url.spec.whatwg.org and https://urlpattern.spec.whatwg.org. I think it would be better to build upon those primitives.