Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for more granular route matching in Spin #1923

Closed
karthik2804 opened this issue Oct 23, 2023 · 19 comments · Fixed by #2464
Closed

Allow for more granular route matching in Spin #1923

karthik2804 opened this issue Oct 23, 2023 · 19 comments · Fixed by #2464

Comments

@karthik2804
Copy link
Contributor

Currently, the route matching in spin can be of two forms

# match a single route
route = "/{path}"

# match any route with the prefix
route = "/{path-prefix}/..."

It would be helpful in certain cases if the route matching allowed more granularity. For an example consider the case of a Bartholomew app with a proxy component.

[[component]]
id = "bartholomew"
[component.trigger]
route = "/..."

[[component]]
id = "proxy"
[component.trigger]
route = "/spin/*"

With such a configuration, the proxy component would match the route /spin/index but not /spin/v1/index. Thus allowing the proxy to handle one specific subset of the routes while bartholomew handles the rest.

There is potential for another level of granularity where we could define something like

route = "/spin/*/test"

Which would only match /spin/index/test but not /spin/index/something-else

Thoughts?

@lann
Copy link
Collaborator

lann commented Oct 23, 2023

I'm not necessarily opposed to this but just want to note that V2 manifests help with a workaround in some (not all!) situations:

[[trigger.http]]
route = "/..."
component = "bartholomew"

[[trigger.http]]
route = "/spin/..."
component = "proxy"

[[trigger.http]]
route = "/spin/v1/..."
component = "bartholomew"

@karthik2804
Copy link
Contributor Author

@lann that works in that specific case but if someone adds a /spin/v2 as well, then we would need to add another definition in the manifest. So I would imagine it would not scale very well.

@karthik2804
Copy link
Contributor Author

@lann one other practical issue is that it increases the memory usage since the extra component mounts end up creating their own copy of the static files

@lann
Copy link
Collaborator

lann commented Oct 27, 2023

With v2 manifests that shouldn't happen if you point two triggers at one component. If mounts are duplicated in that case it's a bug.

@karthik2804
Copy link
Contributor Author

karthik2804 commented Oct 27, 2023

Oh ok - I was testing with just v1 and spin v1.5.1. I was creating multiple components, so that makes sense.

@vdice vdice added this to Spin Triage Nov 8, 2023
@vdice vdice moved this to 🆕 Triage Needed in Spin Triage Nov 8, 2023
@vdice
Copy link
Member

vdice commented Nov 8, 2023

@karthik2804 did Spin v2 and the v2 manifest help here? I'm assuming we'd still like this issue to track nested wildcard route support (eg route = "/spin/*/test" mentioned above)?

@vdice vdice moved this from 🆕 Triage Needed to 📋 Investigating / Open for Comment in Spin Triage Nov 8, 2023
@itowlson
Copy link
Contributor

Do we have a good sense of what we want from this feature? It seems like the ask is specifically for a kind of "single level" wildcard, i.e. like /... except only up to the next / (if there is one). Is that the full extent of it?

Some challenges I see here are:

  • Do we need to surface the "matched" segment(s) in the same way we surface /... tails via spin-path-info?
  • How does the "longest match" rule translate e.g. if I have /a/b/*/d and /a/*/c/d then which one should handle /a/b/c/d?

Any other design considerations?

@karthik2804
Copy link
Contributor Author

The use case I have had requires only a single level of wildcard but based on @lann's workaround above, it could be done by adding multiple triggers.

  • I would expect the spin-path-info to be exposed as it is today. I am not sure I follow the question about "matched" segments.
  • I do not have a solid answer for the longest match rule translation.

@itowlson
Copy link
Contributor

@karthik2804 Sorry I maybe confused the question by giving the spin-path-info analogy - it was intended only as analogy. Let me rephrase:

Suppose I have a trigger /a/*/b. Suppose a request matches that trigger, and my component runs. My component wants to know what the * was. (For example, if the user visited /a/gorilla/b, I want the gorilla part.) How does it get it?

(Potentially nastier is the multiple case of /user/*/note/*. Now from /user/itowlson/note/4067 my component wants to know that the wildcards were itowlson and 4067.)

@itowlson
Copy link
Contributor

@karthik2804 Is your comment about the multiple trigger workaround suggesting that this is no longer important (at least to you) and could be dropped from the 2.1 list? I am not sure if we have any other users asking after this yet.

@karthik2804
Copy link
Contributor Author

karthik2804 commented Nov 15, 2023

@itowlson, it was in reference to the following

It seems like the ask is specifically for a kind of "single level" wildcard, i.e. like /... except only up to the next / (if there is one). Is that the full extent of it?

The workaround above solves the issue I ran into (I have yet to test).

On the sharing multiple matched segments, could add a new header spin-matched-segments: "itowlson,4067" where the segments are separated by commas. URL paths can contains commas.

@itowlson
Copy link
Contributor

I chatted with Karthik offline and some alternative ideas for capturing * values could be:

  • A CSV / multi-instance header, with commas in values encoded if present (which should be rare).
  • Allow/require names for wildcards e.g. /user/*username/note/*noteid giving rise to separate spin-something-username and spin-something-noteid headers.

@diversable
Copy link

  • Allow/require names for wildcards e.g. /user/*username/note/*noteid giving rise to separate spin-something-username and spin-something-noteid headers.

FYI: From a customer perspective, this would be a really useful feature to have - personally, I would vote for named wildcards, but I'd be happy with any workable solution for enabling more powerful wildcard paths 😁

@seungjin
Copy link

seungjin commented Mar 18, 2024

Regular Expressions are The Swiss Army Knife of Text Processing.

Flexible routing structure can give me more flexible / micro-sized modularization which I can split my app into more smaller wasms. Isn't that the recommended direction for wasm architecturing?

@lann
Copy link
Collaborator

lann commented Mar 18, 2024

Given that #2305 is nearing completion, one thing I'd like to explore is using that service chaining feature to enable advanced routing scenarios via a routing component, especially for something like regular expressions which may not be appropriate for all users.

@itowlson
Copy link
Contributor

I started exploring this and I am getting very hung up on potential ambiguities. For example, if the app has routes /a/*/c and /a/b/..., which one does /a/b/c match? I started going with a "longest initial exact segment takes precedence" rule, which runs into tiebreaking problems, but those are fixable. But then I started thinking "exact trailing segment should trump wildcard trailing segment".

And then I started thinking "what if we just reject applications with this kind of ambiguous overlap", requiring any pair of routes to be either disjoint or in a subset relationship. Because if you have /a/*/c and /a/b/... then you probably have problems that you need to address before the ambiguous request comes in. And this seems like a reaonable rule if it's easy to implement, but I bet it will turn out to be equivalent to the halting problem because everything always does. And it leaves people who really want to express a / anything-but-b / c stuffed unless we extend the pattern language further (hello regular expressions, hello unbounded evaluation times).

Another possibility, which lines up with some programmatic routers, is to make it order-dependent: first (or last) match wins. Pre-1.0 Spin had that rule, and we replaced it with longest match because it provides more reliability as manifests change (e.g. triggers getting tacked on the end via spin add). We could bring that back for the specific case of "best match is not well defined" but I'm really reluctant to do that because who knows what well-defined means in this case.

Finally, the original issue asked for single-segment match only in a trailing position, e.g. a * suffix to go alongside the ... one. That avoids any ambiguity, but falls short of what others in the thread have been talking about.

I feel like I'm overthinking this because lots of routers just do this stuff, so, save me from myself folks: how should this behave?

@itowlson itowlson added the cursed cursed issue label Apr 19, 2024
@fibonacci1729
Copy link
Contributor

fibonacci1729 commented Apr 19, 2024

Could we lean into routefinder here? If we introduce named parameters (e.g. :planet) it seems like we could establish a mapping to the syntax supported by routefinder (e.g. /... => /*). This would appear to cover the usecases above.

Also, I'm not a big fan of supporting ... and *; historically I've seen them both used to mean "wildcard" and dread the confusion to inevitably plague our route dx.

The question then is how do we surface named captures to a guest? Immediately I'm not opposed to propagating captures as spin-route-param headers (or something of the sort) as mentioned earlier.

The major benefit of creating an isomorphism to routerfinder (or similar syntax) is that we can take advantage of the ranking and ordering of routes to handle all the ambiguities mentioned.

@lann
Copy link
Collaborator

lann commented Apr 19, 2024

I like routefinder's precedence rules and am similarly wary of using * since its meaning varies so much (any single segment vs any number of segments vs any number of trailing segments).

I'd be fine with e.g. /:param/... (disallowing * as an input) as the next step here. Given that we already have a bunch of magic spin-* headers I suppose something like spin-param-<param> is probably fine.

@itowlson
Copy link
Contributor

Yeah sorry * was not a syntax proposal it was a way of expressing the ambiguity. The ambiguity, not the syntax, is the problem.

I will look at routefinder precedence: thanks.

@itowlson itowlson removed the cursed cursed issue label Apr 28, 2024
@itowlson itowlson moved this from Nice to have to 🏗 In progress in Spin 2.5 Release - April Release Apr 28, 2024
@github-project-automation github-project-automation bot moved this from 📋 Investigating / Open for Comment to ✅ Done in Spin Triage Apr 29, 2024
@melissaklein24 melissaklein24 moved this from 🏗 In progress to ✅ Done in Spin 2.5 Release - April Release May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants