Provide a way to specify error value when failing to match #36

Open
osa1 opened this issue Nov 23, 2021 · 0 comments
Labels: design, feature (New feature or request)

Comments

osa1 (Owner) commented Nov 23, 2021

This is related to #35 and we use the same example.

Suppose that when lexing b"\xa" I want to fail with an "invalid hex escape" error.

With a "cut" operator as described in #35, the best we can get concisely is an "invalid token" error.

To raise an "invalid hex escape" error, we need new rules. For example:

rule ByteString {
    "\\x" => |lexer| lexer.switch(LexerRule::ByteStringHexEscape),

    ($ascii_for_string | $byte_escape | $string_continue | "\r\n")* '"' => |lexer| {
        let match_ = lexer.match_();
        lexer.switch_and_return(LexerRule::Init, Token::Lit(Lit::ByteString(match_)))
    },
}

rule ByteStringHexEscape {
    $hex_digit $hex_digit => |lexer| lexer.switch(LexerRule::ByteString),
    $ | _ | _ _ =? |lexer| lexer.return_(Err(CustomError::InvalidHexEscape)),
}

The new rule ByteStringHexEscape matches two hex digits and fails with InvalidHexEscape on anything else. We don't want it to match more than two characters, so it has separate cases for end-of-stream ($), one character (_), and two characters (_ _). Something like _* won't work: on \xaaaa (a valid escape \xaa followed by aa) the longest match would be _* consuming aaaa, so we would wrongly fail with InvalidHexEscape.
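For comparison, here is a rough hand-written Rust equivalent of the check that ByteStringHexEscape performs. This is only a sketch: hex_escape and hex_val are hypothetical helpers (not lexgen API), and rest is assumed to be the input right after \x. It never inspects more than two bytes, so on \xaaaa it consumes aa and leaves the remaining aa for the ByteString rule.

```rust
#[derive(Debug, PartialEq)]
enum CustomError {
    InvalidHexEscape,
}

/// Value of one ASCII hex digit; callers check `is_ascii_hexdigit` first.
fn hex_val(b: u8) -> u8 {
    (b as char).to_digit(16).unwrap() as u8
}

/// Hypothetical helper: `rest` is the input right after `\x`.
/// Returns the escaped byte and how many bytes were consumed.
/// At most two bytes are examined, like the `$ | _ | _ _` cases of the rule.
fn hex_escape(rest: &[u8]) -> Result<(u8, usize), CustomError> {
    match rest {
        // `$hex_digit $hex_digit`: anything after these two bytes is left alone.
        [hi, lo, ..] if hi.is_ascii_hexdigit() && lo.is_ascii_hexdigit() => {
            Ok(((hex_val(*hi) << 4) | hex_val(*lo), 2))
        }
        // End of input, a single byte, or two bytes that are not both hex digits.
        _ => Err(CustomError::InvalidHexEscape),
    }
}
```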

(This is a case where a syntax for matching a bounded number of occurrences would be useful, e.g. _{0,2} would expand to $ | _ | _ _. alex has this feature.)
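As a sketch of how such a syntax could be desugared, the hypothetical Regex type and repeat function below (not part of lexgen) expand r{min,max} into an alternation of min through max copies of r. In this plain desugaring the zero-occurrence case is the empty sequence, printed as (); the expansion above writes $ there instead, which under longest-match should only make a difference at end of input. The show printer is deliberately simple and does not add parentheses for nested alternations.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Regex {
    Any,             // `_`
    Seq(Vec<Regex>), // concatenation; an empty Seq matches the empty string
    Alt(Vec<Regex>), // alternation
}

/// Hypothetical desugaring of `r{min,max}` into `r^min | r^(min+1) | ... | r^max`.
fn repeat(r: &Regex, min: usize, max: usize) -> Regex {
    assert!(min <= max, "invalid repetition bounds");
    Regex::Alt((min..=max).map(|k| Regex::Seq(vec![r.clone(); k])).collect())
}

/// Render in the rule syntax used above (no precedence handling).
fn show(r: &Regex) -> String {
    match r {
        Regex::Any => "_".to_string(),
        Regex::Seq(rs) if rs.is_empty() => "()".to_string(),
        Regex::Seq(rs) => rs.iter().map(show).collect::<Vec<_>>().join(" "),
        Regex::Alt(rs) => rs.iter().map(show).collect::<Vec<_>>().join(" | "),
    }
}
```

For example, show(&repeat(&Regex::Any, 0, 2)) gives "() | _ | _ _".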

It would be good to have a more concise way of failing with a given error. For example, in the definition of byte_escape:

let byte_escape = ("\\x" $hex_digit $hex_digit) | "\\n" | "\\r" | "\\t" | "\\\\" | "\\0" | "\\\"" | "\\'";

Maybe we could have something like:

let byte_escape = ("\\x" !CustomError::InvalidHexEscape $hex_digit $hex_digit)
            | "\\n" | "\\r" | "\\t" | "\\\\" | "\\0" | "\\\"" | "\\'";

where ! is the cut operator as described in #35, except that when the match after the cut fails, we raise InvalidHexEscape instead of InvalidToken.
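A minimal backtracking-matcher sketch of the proposed semantics, assuming that a cut inside a sequence commits to that sequence: once past the cut, a failure reports the cut's error instead of falling back to other alternatives, and a failure with no cut involved ends up as InvalidToken. Pat, LexError, match_at, byte_escape and lex_escape are hypothetical names. For simplicity it uses ordered first-match alternation rather than longest match, so it models the error behaviour only, not how lexgen would implement it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum LexError {
    InvalidToken,
    InvalidHexEscape,
}

/// Hypothetical pattern language with a cut that carries an error value.
enum Pat {
    Lit(&'static str),
    HexDigit,
    /// `!Error`: commit to the enclosing sequence; later failures report `Error`.
    Cut(LexError),
    Seq(Vec<Pat>),
    Alt(Vec<Pat>),
}

/// Ok(Some(end)): matched `input[pos..end]`.
/// Ok(None): ordinary failure, the caller may try another alternative.
/// Err(e): failure after a cut, no backtracking.
fn match_at(p: &Pat, input: &[u8], pos: usize) -> Result<Option<usize>, LexError> {
    match p {
        Pat::Lit(s) => Ok(input[pos..].starts_with(s.as_bytes()).then(|| pos + s.len())),
        Pat::HexDigit => Ok(match input.get(pos) {
            Some(b) if b.is_ascii_hexdigit() => Some(pos + 1),
            _ => None,
        }),
        // A cut on its own consumes nothing; `Seq` gives it meaning.
        Pat::Cut(_) => Ok(Some(pos)),
        Pat::Seq(items) => {
            let mut pos = pos;
            let mut committed = None;
            for item in items {
                if let Pat::Cut(e) = item {
                    committed = Some(*e);
                    continue;
                }
                match match_at(item, input, pos)? {
                    Some(end) => pos = end,
                    // Before a cut: plain failure. After a cut: the cut's error.
                    None => {
                        return match committed {
                            Some(e) => Err(e),
                            None => Ok(None),
                        }
                    }
                }
            }
            Ok(Some(pos))
        }
        Pat::Alt(alts) => {
            for alt in alts {
                // `?` propagates committed errors instead of trying the next alternative.
                if let Some(end) = match_at(alt, input, pos)? {
                    return Ok(Some(end));
                }
            }
            Ok(None)
        }
    }
}

/// `("\\x" !InvalidHexEscape $hex_digit $hex_digit) | "\\n" | ... | "\\'"`
fn byte_escape() -> Pat {
    let mut alts = vec![Pat::Seq(vec![
        Pat::Lit("\\x"),
        Pat::Cut(LexError::InvalidHexEscape),
        Pat::HexDigit,
        Pat::HexDigit,
    ])];
    for s in ["\\n", "\\r", "\\t", "\\\\", "\\0", "\\\"", "\\'"] {
        alts.push(Pat::Lit(s));
    }
    Pat::Alt(alts)
}

/// Match one escape at the start of `input`; returns the number of bytes consumed.
fn lex_escape(input: &[u8]) -> Result<usize, LexError> {
    match match_at(&byte_escape(), input, 0)? {
        Some(len) => Ok(len),
        None => Err(LexError::InvalidToken),
    }
}
```

With this model, b"\\xa" fails with InvalidHexEscape, while b"\\q" (no cut reached) fails with InvalidToken.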

One open question is whether we also want a syntax for specifying the error value without also adding a cut. So far I haven't needed this.

osa1 added the feature (New feature or request) and design labels on Nov 23, 2021.