What to do about validation errors

The URL standard is getting more sophisticated about validation errors. See e.g. https://github.com/whatwg/url/issues/406.

There is also the issue where the URL standard contains two different ways of checking URL validity, which are probably not equivalent; see some discussion in https://github.com/whatwg/url/issues/437#issuecomment-488022818. I'll call these "parser validity" and "writing validity", since the latter is derived from the [URL writing](https://url.spec.whatwg.org/#url-writing) section.

It would be very neat if this package could provide:

* Whether a URL is writing-valid. (At least for "absolute-URL-with-fragment string"; maybe relative URLs are out of scope.)
* Whether a URL is parser-valid, including the name of any validation error(s) triggered during parsing.
* Surface this information in the live URL viewer.
* A test corpus of valid/invalid URLs, maybe drawn from urltestdata.json. We should at least have one parser-invalid URL for every named parser validation error, and it'd also be good to have writing-invalid URLs derived from systematically violating every rule in the writing URLs section.
* Bonus: a fuzzer that generates strings and tests if they are parser-valid but writing-invalid, or parser-invalid but writing-valid.

---

Right now we have some not-very-well-maintained code to track validation errors when the spec says to do so. It does this by setting `urlStateMachine.parseError` to true. ("parse error" is the old name for "validation error".) We'd probably want to update this to an array of strings (representing validation error names) that get pushed. And we should probably expose it on the URL record; that's #61. Or maybe elsewhere, since if you get a parsing failure, there's no URL record, so any fatal validation errors would be invisible this way?

We have some known places where we miss validation errors that the spec asks about; I think most of them are due to not implementing "is a URL code point", which #59 tried to address, but got stalled.

---

One reason why I was ambivalent about #59 is that it's a good bit of code, and potentially performance cost, for validity checking. At least one major consumer of this project, jsdom, prefers speed, and has no use for validity checking. It would also add memory consumption for every URL record, especially if as per the above we go beyond the simple boolean to instead be an array of strings.

On the other hand, there aren't many other consumers of this project for "production" use. Everyone else should probably be using the built-in `URL` constructor in their environment (either Node or the browser). And we're not exactly tuned for speed anyway; there is probably lower-hanging fruit we're ignoring.

We could address this in a number of ways, e.g.:

- Take the code size hit, and benchmark the speed hit. (Probably after #155 gets us some basic benchmarks.) Maybe it's fine for jsdom's purposes.

- Use source code rewriting to prune out validity-related parts of the parser code, so that we end up with two output `.js` files, e.g. `url-state-machine-with-validy-checking.js` and `url-state-machine.js`. (Related: #153, and maybe #155 because if we remove the `p()` source code rewriting then this would just be adding some back.) This would result in a larger npm package size but a smaller loaded-into-memory size.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

What to do about validation errors #156

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

What to do about validation errors #156

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions