-
Notifications
You must be signed in to change notification settings - Fork 213
Namespace: explicit parsing errors #455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR requires the parser to throw an error if a (percent-encoded) solidus `/` is encountered in any path segment. Signed-off-by: Piotr P. Karwasz <[email protected]>
Originally posted by @matt-phylum in #452 (comment) |
johnmhoran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ppkarwasz I'm not sure if you overlooked replacing "solidus" with "slash" or instead are insistent on using "solidus". I have no linguistic objection to that beautiful Latin-based word, but we don't use it anywhere else in the spec -- it's slashes all the way down -- and RFC 3986 similarly uses "slash" not "solidus". Some famous person somewhere once said something like "a foolish consistency is the hobgoblin of small minds", but imho consistency in the terms we use is a distinct advantage.
What do you think?
I didn't notice it. Fixed in 9bd2568 Regarding |
johnmhoran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ppkarwasz -- LGTM!
|
PURL-SPECIFICATION.rst has been deprecated so this PR will need to be re-applied to /docs/how-to-parse.md. |
|
I refactored the PR to comply with the new file structure. |
| - Discard any empty segment from that split | ||
| - Percent-decode each segment | ||
| - UTF-8-decode each segment if needed in your programming language | ||
| - Report an error if any segment contains a slash `/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for clarity - what is the purpose of this rule, and for reporting an error?
I'm thinking of a couple of things when I read this.
- Is this within the spirit of Postel's Law?
- Should it also require the parsing to stop/die/break/exit/produce an exception? ("Stop the parsing and report an error if any segment contains a
/") - Are there other bytes that should be illegal? E.g. disallow %00 (the null byte), since this also illegal in filenames?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason to report an error here is to protect consumers of PURLs from malformed or malicious input. By error I mean whatever the parser uses to signal a problem (exception, exit status, etc.).
Historically, many vulnerabilities in HTTP servers came from path traversal attacks where characters like . or / were smuggled in through alternative encodings (e.g. . as %2E, %C0%AE, %E0%80%AE, %F0%80%80%AE; / as %2F, %C0%AF, %E0%80%AF, %F0%80%80%AF). Allowing these in PURLs would create ambiguity and open the door to similar exploits.
In the PURL spec today:
- Empty segments (
//) are not meaningful, but are often an honest mistake in producers. The current parser recommendation is to normalize them away rather than fail. - Slash in a segment (
/) is never valid. Since the parse process splits on/before decoding, any literal/inside a segment must have been hidden behind percent-encoding, which is a strong signal of an attempt to “escape” the namespace. In this case, failing fast and surfacing an error is IMHO the safer and clearer choice.
This distinction matters because some ecosystems map PURLs directly to URLs. A PURL like:
pkg:golang/github.com/foo/..%2Fbar/artifact
could trick a consumer into resolving bar/artifact instead of foo/artifact if the parser silently accepts it. By requiring an error, the spec prevents that entire class of misinterpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespaces should not contain %2F because of the way PURL performs encoding and decoding, not because it is an illegal filename character on some operating systems. If you read the PURL pkg:generic/a%2Fb/c the a%2Fb cannot be represented and turns into a/b while parsing.
However, I don't think it really matters for namespaces and it's probably better not to do this. I guess this avoids potentially confusing parses like pkg:generic/a%2Fb/c%2Fd/e%2Ff having a namespace a/b/c/d and name e/f, and the unnecessary edge case about empty segments created by a previous rule about namespaces. I still believe that namespaces are a mistake that needs to be fixed by treating the part between the type and the version as an opaque path string, similar to how it works in URL, with the meaning defined by the package type. This new rule may be a step in the wrong direction because it forbids certain character sequences from being in that segment in a convoluted way. For example, pkg:golang has no namespace but the name often contains slashes, so if namespaces are eliminated then the path would still need to be something like github.com%2Fpackage-url%2Fexample for compatibility with namespace+name implementations, which would be allowed because there are no %2F characters followed by an unencoded / character. However, in some other package type (maybe pkg:swid), Acme A%2FB/Widgets would need to be forbidden because parsers implementing this proposed addition to the spec would see the %2F as being an illegal namespace character, making it complicated to deal with company names ending in "A/B".
Maybe it's too broken already and fixing namespaces would need to wait for a pkg2 that follows URL parsing semantics, parsing only from the left and not trying to apply special meaning to the path strings used by different backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason to report an error here is to protect consumers of PURLs from malformed or malicious input. By error I mean whatever the parser uses to signal a problem (exception, exit status, etc.).
Unlike subpaths, namespaces are not paths, and should not be blanket sanitized as if they are paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespaces should not contain
%2Fbecause of the way PURL performs encoding and decoding, not because it is an illegal filename character on some operating systems. If you read the PURLpkg:generic/a%2Fb/cthea%2Fbcannot be represented and turns intoa/bwhile parsing.
per current spec, namespace segments MUST NOT contain %2F anyway (see grammar in #578) - and if they did, then the whole thing is not a valid PRUL. So far, there is no rule what to do if any forbidden chars occurred.
This is what this PR tries to fix: it adds a rule that expresses to report the error and fail the parsing all along. (in this case, Postel's Law must not be applied - fail and report - no "try to fix it" approach.)
This PR contains the part of #452 that is specific to
namespace.It requires the parser to throw an error if a (percent-encoded) solidus
/is encountered in any path segment.