Skip to content

What to do about validation errors #156

@domenic

Description

@domenic

The URL standard is getting more sophisticated about validation errors. See e.g. whatwg/url#406.

There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in whatwg/url#437 (comment). I'll call these "parser validity" and "writing validity", since the latter is derived from the URL writing section.

It would be very neat if this package could provide:

  • Whether a URL is writing-valid. (At least for "absolute-URL-with-fragment string"; maybe relative URLs are out of scope.)
  • Whether a URL is parser-valid, including the name of any validation error(s) triggered during parsing.
  • Surface this information in the live URL viewer.
  • A test corpus of valid/invalid URLs, maybe drawn from urltestdata.json. We should at least have one parser-invalid URL for every named parser validation error, and it'd also be good to have writing-invalid URLs derived from systematically violating every rule in the writing URLs section.
  • Bonus: a fuzzer that generates strings and tests if they are parser-valid but writing-invalid, or parser-invalid but writing-valid.

Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting urlStateMachine.parseError to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?

We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.


One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.

On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in URL constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.

We could address this in a number of ways, e.g.:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions