Namespace: explicit parsing errors #455

ppkarwasz · 2025-04-10T09:43:13Z

This PR contains the part of #452 that is specific to namespace.

This PR requires the parser to throw an error if a (percent-encoded) solidus `/` is encountered in any path segment.

Signed-off-by: Piotr P. Karwasz <[email protected]>

ppkarwasz · 2025-04-10T09:48:12Z

> Does this simplify namespaces? This seems like it only makes namespaces more complicated. Unlike subpaths, namespaces do not care about `.` or `..` segments (and to do so would be a breaking change) so there is no reason to treat `%2F` and `/` differently.

Originally posted by @matt-phylum in #452 (comment)

PURL-SPECIFICATION.rst

johnmhoran

@ppkarwasz I'm not sure if you overlooked replacing "solidus" with "slash" or instead are insistent on using "solidus". I have no linguistic objection to that beautiful Latin-based word, but we don't use it anywhere else in the spec -- it's slashes all the way down -- and RFC 3986 similarly uses "slash" not "solidus". Some famous person somewhere once said something like "a foolish consistency is the hobgoblin of small minds", but imho consistency in the terms we use is a distinct advantage.

What do you think?

ppkarwasz · 2025-04-15T20:13:26Z

> I'm not sure if you overlooked replacing "solidus" with "slash" or instead are insistent on using "solidus".

I didn't notice it. Fixed in 9bd2568

Regarding solidus: it is the Unicode name of the character, but I don't really care about the naming as long as it is consistent (which, as you pointed out, it was not).

johnmhoran

Thanks @ppkarwasz -- LGTM!

mjherzog · 2025-09-18T19:46:06Z

PURL-SPECIFICATION.rst has been deprecated so this PR will need to be re-applied to /docs/how-to-parse.md.


ppkarwasz · 2025-09-22T10:52:15Z

@mjherzog,

I refactored the PR to comply with the new file structure.

sjn · 2025-09-22T13:41:32Z

docs/how-to-parse.md

  - Discard any empty segment from that split
  - Percent-decode each segment
  - UTF-8-decode each segment if needed in your programming language
+  - Report an error if any segment contains a slash `/`


Just for clarity - what is the purpose of this rule, and for reporting an error?

I'm thinking of a couple of things when I read this.

Is this within the spirit of Postel's Law?

Should it also require the parsing to stop/die/break/exit/produce an exception? ("Stop the parsing and report an error if any segment contains a /")

Are there other bytes that should be illegal? E.g. disallow `%00` (the null byte), since this is also illegal in filenames?

ppkarwasz

The main reason to report an error here is to protect consumers of PURLs from malformed or malicious input. By error I mean whatever the parser uses to signal a problem (exception, exit status, etc.).

Historically, many vulnerabilities in HTTP servers came from path traversal attacks where characters like `.` or `/` were smuggled in through alternative encodings (e.g. `.` as `%2E`, `%C0%AE`, `%E0%80%AE`, `%F0%80%80%AE`; `/` as `%2F`, `%C0%AF`, `%E0%80%AF`, `%F0%80%80%AF`). Allowing these in PURLs would create ambiguity and open the door to similar exploits.
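To make the encodings listed above concrete, here is a minimal sketch using only Python's standard library. The helper name `decode_segment` is hypothetical and not taken from any PURL implementation. It percent-decodes a segment to raw bytes and then decodes those bytes as strict UTF-8, which is one way the overlong forms could be kept from turning into a `/`.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(raw: str) -> str:
    """Percent-decode one segment, then decode it as strict UTF-8.

    Hypothetical helper for illustration only.
    """
    # Keep the result as bytes first so that overlong sequences stay visible.
    data = unquote_to_bytes(raw)
    # The strict UTF-8 codec raises UnicodeDecodeError on overlong encodings.
    return data.decode("utf-8", errors="strict")


for raw in ("%2F", "%C0%AF", "%E0%80%AF", "%F0%80%80%AF"):
    try:
        print(raw, "->", repr(decode_segment(raw)))
    except UnicodeDecodeError as exc:
        print(raw, "-> rejected:", exc.reason)
```

Only the plain `%2F` decodes to a slash here; the overlong variants fail at the UTF-8 step. A decoder that tolerates overlong sequences could map all four to `/`, which is the ambiguity the comment describes.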

In the PURL spec today:

- Empty segments (`//`) are not meaningful, but are often an honest mistake in producers. The current parser recommendation is to normalize them away rather than fail.

- Slash in a segment (`/`) is never valid. Since the parse process splits on `/` before decoding, any literal `/` inside a segment must have been hidden behind percent-encoding, which is a strong signal of an attempt to “escape” the namespace. In this case, failing fast and surfacing an error is IMHO the safer and clearer choice.

This distinction matters because some ecosystems map PURLs directly to URLs. A PURL like:

pkg:golang/github.com/foo/..%2Fbar/artifact

could trick a consumer into resolving bar/artifact instead of foo/artifact if the parser silently accepts it. By requiring an error, the spec prevents that entire class of misinterpretation.
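The rule under review can be sketched roughly as follows. This is an illustration under stated assumptions, not the reference parser: `parse_namespace` and `PurlParseError` are hypothetical names, and the input is assumed to be the raw (still percent-encoded) namespace portion of the PURL, i.e. `github.com/foo/..%2Fbar` for the example above.

```python
from urllib.parse import unquote_to_bytes


class PurlParseError(ValueError):
    """Hypothetical error type; the proposed rule only says "report an error"."""


def parse_namespace(raw: str) -> list[str]:
    """Split a raw namespace into decoded segments, failing on a decoded slash."""
    segments = []
    for part in raw.split("/"):
        if part == "":
            # Empty segments are discarded rather than treated as an error.
            continue
        try:
            # Percent-decode, then UTF-8-decode the segment.
            decoded = unquote_to_bytes(part).decode("utf-8")
        except UnicodeDecodeError as exc:
            raise PurlParseError(f"invalid UTF-8 in segment {part!r}") from exc
        if "/" in decoded:
            # The slash can only have arrived via percent-encoding: fail fast.
            raise PurlParseError(f"segment {part!r} contains an encoded slash")
        segments.append(decoded)
    return segments


print(parse_namespace("github.com//foo"))
try:
    parse_namespace("github.com/foo/..%2Fbar")
except PurlParseError as exc:
    print("error:", exc)
```

The doubled slash is normalized away, while the `..%2Fbar` segment stops the parse instead of being silently turned into `../bar`.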

matt-phylum

Namespaces should not contain `%2F` because of the way PURL performs encoding and decoding, not because it is an illegal filename character on some operating systems. If you read the PURL `pkg:generic/a%2Fb/c` the `a%2Fb` cannot be represented and turns into `a/b` while parsing.

However, I don't think it really matters for namespaces and it's probably better not to do this. I guess this avoids potentially confusing parses like `pkg:generic/a%2Fb/c%2Fd/e%2Ff` having a namespace `a/b/c/d` and name `e/f`, and the unnecessary edge case about empty segments created by a previous rule about namespaces.

I still believe that namespaces are a mistake that needs to be fixed by treating the part between the type and the version as an opaque path string, similar to how it works in URL, with the meaning defined by the package type. This new rule may be a step in the wrong direction because it forbids certain character sequences from being in that segment in a convoluted way.

For example, `pkg:golang` has no namespace but the name often contains slashes, so if namespaces are eliminated then the path would still need to be something like `github.com%2Fpackage-url%2Fexample` for compatibility with namespace+name implementations, which would be allowed because there are no `%2F` characters followed by an unencoded `/` character. However, in some other package type (maybe `pkg:swid`), `Acme A%2FB/Widgets` would need to be forbidden because parsers implementing this proposed addition to the spec would see the `%2F` as being an illegal namespace character, making it complicated to deal with company names ending in "A/B".

Maybe it's too broken already and fixing namespaces would need to wait for a pkg2 that follows URL parsing semantics, parsing only from the left and not trying to apply special meaning to the path strings used by different backends.
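The representation problem behind the `pkg:generic/a%2Fb/c` and `pkg:generic/a%2Fb/c%2Fd/e%2Ff` examples can be shown with a deliberately lenient sketch. The helpers `lenient_namespace` and `lenient_split` are hypothetical: they decode each segment and join the results with `/` without rejecting decoded slashes, and they take the last path segment as the name. They only show what happens when no error is reported.

```python
from urllib.parse import unquote


def lenient_namespace(raw: str) -> str:
    """Decode each segment and join with '/', accepting decoded slashes."""
    return "/".join(unquote(part) for part in raw.split("/") if part)


def lenient_split(path: str) -> tuple[str, str]:
    """Treat the last raw segment as the name and the rest as the namespace."""
    namespace_raw, _, name_raw = path.rpartition("/")
    return lenient_namespace(namespace_raw), unquote(name_raw)


# One raw segment and two raw segments collapse to the same namespace string.
print(lenient_namespace("a%2Fb") == lenient_namespace("a/b"))

# The three-segment path from the comment above.
print(lenient_split("a%2Fb/c%2Fd/e%2Ff"))
```

Once the namespace is stored as a single `/`-joined string, the original segment boundaries cannot be recovered, so a lenient parser cannot round-trip these inputs.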

matt-phylum

> The main reason to report an error here is to protect consumers of PURLs from malformed or malicious input. By error I mean whatever the parser uses to signal a problem (exception, exit status, etc.).

Unlike subpaths, namespaces are not paths, and should not be blanket sanitized as if they are paths.

jkowalleck

> Namespaces should not contain `%2F` because of the way PURL performs encoding and decoding, not because it is an illegal filename character on some operating systems. If you read the PURL `pkg:generic/a%2Fb/c` the `a%2Fb` cannot be represented and turns into `a/b` while parsing.

Per the current spec, namespace segments MUST NOT contain `%2F` anyway (see the grammar in #578), and if they did, the whole thing would not be a valid PURL. So far, there is no rule for what to do if any forbidden characters occur.
This is what this PR tries to fix: it adds a rule that says to report the error and fail the parsing altogether. (In this case, Postel's Law must not be applied: fail and report, with no "try to fix it" approach.)

Namespace: explicit parsing errors

aca3716


ppkarwasz mentioned this pull request Apr 10, 2025

Namespace/subpath: simplify parsing #452

Closed

jkowalleck added the PURL component: namespace label Apr 10, 2025

jkowalleck previously approved these changes Apr 10, 2025


jkowalleck requested a review from a team April 10, 2025 09:55

johnmhoran reviewed Apr 10, 2025


jkowalleck requested a review from a team April 11, 2025 07:24

jkowalleck added this to the 1.0-draft milestone Apr 11, 2025

Replace Signal with Report

7a97932

ppkarwasz dismissed jkowalleck’s stale review via 7a97932 April 15, 2025 19:37

Merge branch 'main' into feat/path-parser-namespace

58baca8

johnmhoran reviewed Apr 15, 2025


Replace solidus -> slash

9bd2568

johnmhoran approved these changes Apr 15, 2025


jkowalleck approved these changes Apr 16, 2025


mjherzog modified the milestones: PURL-Spec v1.0, PURL-Spec v0.90 May 28, 2025

johnmhoran added the 2 medium priority label Jun 3, 2025

Merge remote-tracking branch 'package-url/main' into feat/path-parser…

b573897

…-namespace

jkowalleck requested a review from a team September 22, 2025 10:55

sjn reviewed Sep 22, 2025
