- Export the `Visibility` enum from `lrlex`.
- Ensure that lrpar rebuilds a grammar if its `visibility` is changed.
- Fix a handful of Clippy warnings.
- The `MF` and `Panic` recoverers (deprecated, and undocumented, since 0.4.3) have been removed. Please change to `RecoveryKind::CPCTPlus` (or, if you don't want error recovery, `RecoveryKind::None`).
- The stategraph is no longer stored in the generated grammar, leading to useful savings in the generated binary size.
- The modules generated for compile-time parsing by lrlex and lrpar have private visibility by default. Changing this previously required a manual alias. The `visibility` function in lrlex and lrpar's compile-time builders allows a different visibility to be set (e.g. `visibility(Visibility::Public)`). Rust has a number of visibility settings and the `Visibility` enums in lrlex and lrpar reflect this.
- `lrlex` now uses a `LexerDef` trait which all lexer definitions must `impl`. This means that if you want to call methods on a concrete lexer definition, you will almost certainly need to import `lrlex::LexerDef`. This opens the possibility that lrlex can seamlessly produce lexers other than `LRNonStreamingLexerDef`s in the future.
- `lrlex::NonStreamingLexerDef` has been renamed to `lrlex::LRNonStreamingLexerDef`; use of the former is deprecated.
- The `lrlex::build_lex` function has been deprecated in favour of `LRNonStreamingLexerDef::from_str`.
- The statetable and other elements were previously included in the user binary with `include_bytes!`, but this could cause problems with relative path names. We now include the statetable and other elements in generated source code to avoid this issue.
- The `Lexer` trait has been broken into two: `Lexer` and `NonStreamingLexer`. The former trait is now only capable of producing `Lexeme`s; the latter is capable of producing substrings of the input and calculating line/column information. This split allows the flexibility to introduce streaming lexers in the future (which will not be able to produce substrings of the input in the same way as a `NonStreamingLexer`).

  Most users will need to replace references to the `Lexer` trait in their code with `NonStreamingLexer`.

- `NonStreamingLexer` takes a lifetime `'input` which allows the input to last longer than the `NonStreamingLexer` itself. `Lexer::span_str` and `Lexer::span_lines_str` had the following definitions:

      fn span_str(&self, span: Span) -> &str;
      fn span_lines_str(&self, span: Span) -> &str;

  As part of `NonStreamingLexer` their definitions are now:

      fn span_str(&self, span: Span) -> &'input str;
      fn span_lines_str(&self, span: Span) -> &'input str;

  This change allows users to throw away the `Lexer` but still keep around structures (e.g. ASTs) which reference the user's input.

  rustc infers the `'input` lifetime in some situations but not others, so if you get an error:

      error[E0106]: missing lifetime specifier

  then it is likely that you need to change a type from `NonStreamingLexer` to `NonStreamingLexer<'input>`.
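The effect of the `'input` lifetime described above can be illustrated with a minimal, self-contained sketch. `SketchLexer`, the local `Span` struct, and `first_word` are hypothetical stand-ins written for this example and are not the lrpar types; the only point is that a slice tied to `'input` remains usable after the lexer itself has been dropped.

```rust
/// Hypothetical stand-in for a span: byte offsets into the input.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical stand-in for a non-streaming lexer that borrows its input.
struct SketchLexer<'input> {
    input: &'input str,
}

impl<'input> SketchLexer<'input> {
    fn new(input: &'input str) -> Self {
        SketchLexer { input }
    }

    /// The returned slice borrows from the input (`'input`), not from `self`.
    fn span_str(&self, span: Span) -> &'input str {
        &self.input[span.start..span.end]
    }
}

/// Builds a lexer, extracts a slice, and drops the lexer before returning.
fn first_word<'input>(input: &'input str) -> &'input str {
    let lexer = SketchLexer::new(input);
    let end = input.find(' ').unwrap_or(input.len());
    let word = lexer.span_str(Span { start: 0, end });
    drop(lexer); // `word` stays valid because it borrows from `input`
    word
}
```

Had `span_str` returned a plain `&str` tied to `&self`, `first_word` would not compile, because the slice could not outlive the lexer.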
- Fix two Clippy warnings and suppress two others.
- Prefer "unmatched" rather than "unknown" when using the "turn lexing errors into parsing errors" trick.
- Deprecate `Lexeme::len`, `Lexeme::start`, and `Lexeme::end`. Each is now replaced by `Lexeme::span().len()` etc. An appropriate warning is generated if the deprecated methods are used.
- Avoid use of the unit return type in action code causing Clippy warnings.
- Document the "turn lexing errors into parsing errors" technique and extend `lrpar/examples/calc_ast` to use it.
- Introduce the concept of a `Span` which records what portion of the user's input something (e.g. a lexeme or production) references. Users can turn a `Span` into a string through the `Lexer::span_str` function. This has several API changes:

  - `lrpar` now exports a `Span` type.
  - `Lexeme`s now have a `fn span(&self) -> Span` function which returns the `Lexeme`'s `Span`.
  - `Lexer::span_str` replaces the `Lexer::lexeme_str` function. Roughly speaking this:

        let s = lexer.lexeme_str(&lexeme);

    becomes:

        let s = lexer.span_str(lexeme.span());

  - `Lexer::line_col` now takes a `Span` rather than a `usize` and, since a `Span` can be over multiple lines, returns `((start line, start column), (end line, end column))`.
  - `Lexer::surrounding_line_str` is removed in favour of `span_lines_str` which takes a `Span` and returns a (possibly multi-line) `&str` of the lines containing that `Span`.
  - The `$span` special variable now returns a `Span` rather than `(usize, usize)`.

  In practice, this means that in many cases where you previously had to use `Lexeme<StorageT>`, you can now use `Span` instead. This has two advantages. First, it simplifies your code. Second, it enables better error reporting, as you can now point the user to a span of text, rather than a single point. See the (new) AST evaluator section of the grmtools book for an example of how code using `Span` looks.

- The `$span` special variable is now enabled at all times and no longer needs to be turned on with `CTBuilder::span_var`. This function has thus been removed.
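To make the idea of "the lines containing a `Span`" concrete, here is a small sketch of how such a lookup could work on a plain string. The `Span` struct and the free function `span_lines_str` below are local stand-ins that only borrow their names from the entries above; they are assumptions for illustration, not lrpar's implementation.

```rust
/// Hypothetical stand-in for a span: byte offsets `start..end` into the input.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Returns the complete line(s) of `input` that contain `span`.
/// Both offsets are assumed to lie on character boundaries.
fn span_lines_str(input: &str, span: Span) -> &str {
    // Start of the first line: just after the last newline before the span.
    let line_start = input[..span.start].rfind('\n').map_or(0, |i| i + 1);
    // End of the last line: the first newline at or after the span's end.
    let line_end = input[span.end..]
        .find('\n')
        .map_or(input.len(), |i| span.end + i);
    &input[line_start..line_end]
}
```

Because the result is widened to whole lines, an error message can show the user the full surrounding context even when the span itself starts or ends mid-line.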
- If called as a binary, lrlex now exits with a return code of 1 if it could not lex input. This matches the behaviour of lrpar.
- Module names in generated code can now be optionally configured with `mod_name`. The names default to the same naming scheme as before.
- Fully qualify more names in generated code.
- `lrlex_mod` and `lrpar_mod` now take strings that match the paths of `process_file_in_src`. In other words what was:

      ...
          .process_file_in_src("a/b/grm.y");
      ...
      lrpar_mod!(grm_y);

  is now:

      ...
          .process_file_in_src("a/b/grm.y");
      ...
      lrpar_mod!("a/b/grm.y");

  and similarly for `lrlex_mod`. This is hopefully easier to remember and also allows projects to have multiple grammar files with the same name.

- The `Lexer` API no longer requires mutability. What was:

      trait Lexer {
        fn next(&mut self) -> Option<Result<Lexeme<StorageT>, LexError>>;
        fn all_lexemes(&mut self) -> Result<Vec<Lexeme<StorageT>>, LexError> { ... }
        ...
      }

  has now been replaced by an iterator over lexemes:

      trait Lexer {
        fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = Result<Lexeme<StorageT>, LexError>> + 'a>;
        ...
      }

  This enables more ergonomic use of the new zero-copy feature, but does require changing structs which implement this trait. `lrlex` has been adjusted appropriately.

  In practice, the only impact that most users will notice is that the following idiom:

      let (res, errs) = grm_y::parse(&mut lexer);

  will produce a warning that the `mut` part of `&mut` is no longer needed.
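For authors of their own lexers, the shape of the iterator-based API can be sketched as follows. This is a simplified illustration under stated assumptions: the `Lexeme` and `LexError` structs are cut-down stand-ins without `StorageT`, and `WordLexer` and `lex_twice` are hypothetical names invented for the example. It shows that a `&self` iterator lets the same lexer be traversed more than once through a shared reference.

```rust
/// Simplified stand-in for a lexeme: a token id plus byte offsets.
#[derive(Debug, PartialEq)]
struct Lexeme {
    tok_id: u32,
    start: usize,
    len: usize,
}

/// Simplified stand-in for a lexing error at a byte offset.
#[derive(Debug, PartialEq)]
struct LexError {
    idx: usize,
}

/// Same shape as the new trait above, minus the `StorageT` parameter.
trait Lexer {
    fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = Result<Lexeme, LexError>> + 'a>;
}

/// Hypothetical lexer: whitespace-separated words are numbers (token 0),
/// identifiers (token 1), or errors.
struct WordLexer<'input> {
    input: &'input str,
}

impl<'input> Lexer for WordLexer<'input> {
    fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = Result<Lexeme, LexError>> + 'a> {
        let base = self.input.as_ptr() as usize;
        Box::new(self.input.split_whitespace().map(move |word| {
            // Each word is a subslice of the input, so its offset can be
            // recovered from the pointer difference.
            let start = word.as_ptr() as usize - base;
            if word.chars().all(|c| c.is_ascii_digit()) {
                Ok(Lexeme { tok_id: 0, start, len: word.len() })
            } else if word.chars().all(|c| c.is_ascii_alphabetic()) {
                Ok(Lexeme { tok_id: 1, start, len: word.len() })
            } else {
                Err(LexError { idx: start })
            }
        }))
    }
}

/// Two independent passes over the same lexer, with no `&mut` needed.
fn lex_twice(lexer: &dyn Lexer) -> (usize, usize) {
    let total = lexer.iter().count();
    let errors = lexer.iter().filter(|r| r.is_err()).count();
    (total, errors)
}
```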
- Add support for zero-copying user input when parsing. A special lifetime `'input` is now available in action code and allows users to extract parts of the input without calling `to_owned()` (or equivalent). For example:

      Name -> &'input str: 'ID' { $lexer.lexeme_str(&$1.map_err(|_| ())?) } ;

  See `lrpar/examples/calc_ast/src/calc.y` for a more detailed example.
- Generated code now uses fully qualified names so that name clashes between user action code and that created by grmtools are less likely.
- Action types can now be fully qualified. In other words this:

      R -> A::B: ... ;

  means that the rule `R` now has an action type `A::B`.
- Deprecate the MF recoverer: CPCT+ is now the default and MF is now undocumented. For most people, CPCT+ is good enough, and it's quite a bit easier to understand. In the longer term, MF will probably disappear entirely.
- License as dual Apache-2.0/MIT (instead of a more complex, and little understood, triple license of Apache-2.0/MIT/UPL-1.0).
- Action code uses `$` as a way of denoting special variables. For example, the pseudo-variable `$2` is replaced with a "real" Rust variable by grmtools. However, this means that `$2` cannot appear in, say, a string without being replaced. This release uses `$$` as an escaping mechanism, so that one can write code such as `"$$1"` in action code; this is rewritten to `"$1"` by grmtools.
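The `$$` escaping rule can be sketched as a small rewriting pass. This is an illustration only: `rewrite_action` and the `__arg_N` naming scheme are hypothetical, chosen for this example, and do not claim to match the identifiers or the algorithm grmtools actually uses.

```rust
/// Rewrites `$$` to a literal `$` and `$N` to a hypothetical Rust
/// variable name `__arg_N`. Any other `$` is left untouched.
fn rewrite_action(code: &str) -> String {
    let mut out = String::with_capacity(code.len());
    let mut chars = code.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // `$$` is the escape: emit one `$` and consume the second.
            Some('$') => {
                chars.next();
                out.push('$');
            }
            // `$` followed by digits is a pseudo-variable such as `$2`.
            Some(d) if d.is_ascii_digit() => {
                out.push_str("__arg_");
                while let Some(d) = chars.peek().copied().filter(|d| d.is_ascii_digit()) {
                    out.push(d);
                    chars.next();
                }
            }
            // Anything else (e.g. `$lexer`) is not handled by this sketch.
            _ => out.push('$'),
        }
    }
    out
}
```

The escape is checked before the pseudo-variable case, so the digits following `$$` are copied through as ordinary text.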
- Newer versions of rustc produce "deprecated" warnings when trait objects are used without the `dyn` keyword. This previously caused a large number of warnings in generated grammar code from `lrpar`. This release ensures that generated grammar code uses the `dyn` keyword when needed, removing such warnings.
- `Lexeme::empty()` has been renamed to `Lexeme::inserted()`. Although rare, there are grammars with empty lexemes that aren't the result of error recovery (e.g. DEDENT tokens in Python). The previous name was misleading in such cases.
- Lexeme insertion is no longer explicitly encoded in the API for a lexeme's end and length. Previously these functions returned `None` if a lexeme had been inserted by error recovery. This has proven to be more effort than it's worth, with variants on the idiom `lexeme.end().unwrap_or_else(|| lexeme.start())` used extensively. These definitions have thus been simplified, changing from:

      pub fn end(&self) -> Option<usize>
      pub fn len(&self) -> Option<usize>

  to:

      pub fn end(&self) -> usize
      pub fn len(&self) -> usize
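A rough sketch of what the simplified signatures mean for calling code follows. The `Lexeme` struct here is a local stand-in, and the choice to give inserted lexemes a length of zero (so that `end()` equals `start()`) is an assumption made for illustration, suggested by the `unwrap_or_else` idiom above rather than confirmed by it.

```rust
/// Local stand-in for a lexeme; not the lrpar type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lexeme {
    start: usize,
    len: usize,
    inserted: bool,
}

impl Lexeme {
    /// A lexeme that was actually present in the input.
    fn new(start: usize, len: usize) -> Self {
        Lexeme { start, len, inserted: false }
    }

    /// A lexeme inserted by error recovery (assumed here to have length 0).
    fn inserted_at(start: usize) -> Self {
        Lexeme { start, len: 0, inserted: true }
    }

    fn start(&self) -> usize {
        self.start
    }

    fn len(&self) -> usize {
        self.len
    }

    /// No `Option`: inserted lexemes simply end where they start.
    fn end(&self) -> usize {
        self.start + self.len
    }

    fn inserted(&self) -> bool {
        self.inserted
    }
}

/// With plain `usize` results, callers no longer need `unwrap_or_else`.
fn covered_end(lexemes: &[Lexeme]) -> usize {
    lexemes.iter().map(Lexeme::end).max().unwrap_or(0)
}
```

Code that still needs to distinguish recovered input can ask `inserted()` explicitly instead of pattern matching on `None`.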
- A new pseudo-variable `$span` can be enabled within parser actions if `CTBuilder::span_var(true)` is called. This pseudo-variable has the type `(usize, usize)` where these represent (start, end) offsets in the input and allows users to determine how much input a rule has matched.
- Some dynamic assertions about the correct use of types have been converted to static assertions. In the unlikely event that you try to run grmtools on a platform with unexpected type sizes (which, in practice, probably only means 16 bit machines), this will lead to the problems being determined at compile-time rather than run-time.
- Document lrpar more thoroughly, in particular hiding the inner modules, whose location might change one day in the future: all useful structs (etc.) are explicitly exposed at the module level.
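As an illustration of the move from dynamic to static assertions about type sizes mentioned above, the following sketch uses a `const` item whose initialiser is an `assert!`. The particular check (that `usize` is at least as wide as `u32`) and the helper `u32_to_usize` are assumptions chosen for the example, not the checks grmtools performs; `assert!` in a `const` context needs a reasonably recent rustc.

```rust
use std::mem::size_of;

// Evaluated at compile time: on a platform where `usize` is narrower than
// `u32` (e.g. a 16 bit target) the build fails instead of panicking at run-time.
const _: () = assert!(size_of::<usize>() >= size_of::<u32>());

/// Safe to write as a plain cast because of the static assertion above.
fn u32_to_usize(x: u32) -> usize {
    x as usize
}
```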
- Have the `process_file` functions in both `LexerBuilder` and `CTParserBuilder` place output into a named file (whereas previously `CTParserBuilder` expected a directory name).
- Rename `offset_line_col` to `line_col` and have the latter return character offsets (whereas before it returned byte offsets). This makes the resulting numbers reported to humans much less confusing when multi-byte UTF-8 characters are used.
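The difference between byte and character offsets is easiest to see on input containing multi-byte UTF-8 characters. The sketch below uses two free functions, `line_col` and `byte_col`, which are hypothetical helpers written for this example (including the 1-based numbering, which is an assumption); they are not the lexer methods themselves.

```rust
/// Byte index at which the line containing byte offset `off` begins.
fn line_start(input: &str, off: usize) -> usize {
    input[..off].rfind('\n').map_or(0, |i| i + 1)
}

/// 1-based (line, column) of byte offset `off`, with the column counted in
/// characters. `off` must lie on a character boundary.
fn line_col(input: &str, off: usize) -> (usize, usize) {
    let before = &input[..off];
    let line = before.matches('\n').count() + 1;
    let col = before[line_start(input, off)..].chars().count() + 1;
    (line, col)
}

/// 1-based column of byte offset `off` counted in bytes, for comparison.
fn byte_col(input: &str, off: usize) -> usize {
    off - line_start(input, off) + 1
}
```

For the input `"é = 1\nñandú = x"`, the `=` on the second line sits at byte offset 15: counted in characters it is column 7, which is what a person looking at the text would expect, whereas counted in bytes it would be column 9.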
- Add `surrounding_line_str` helper function to lexers. This is helpful when printing out error messages to users.
- Add a comment with rule names when generating grammars at compile-time. Thus if user action code contains an error, it's much easier to relate this to the appropriate point in the `.y` file.
- Documentation fixes.
- Previously users had to specify the `YaccKind` of a grammar and then the `ActionKind` of actions. This is unnecessarily fiddly, so remove `ActionKind` entirely and instead flesh out `YaccKind` to deal with the possible variants. For example `ActionKind::CustomAction` is now, in essence, `YaccKind::Original(YaccOriginalActionKind::UserAction)`. This is a breaking change but one that will make future evolution much easier.
- The `%type` directive in grammars exposed by `YaccKind::Original(YaccOriginalActionKind::UserAction)` has been renamed to `%actiontype` to make it clear what type is being referred to. In general, most people will want to move to the `YaccKind::Grmtools` variant (see below), which doesn't require the `%actiontype` directive.
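The nesting of the action kind inside `YaccKind` can be pictured with locally defined enums. The definitions below are stand-ins that contain only the variants named in this changelog (the library's real enums may have others), and `needs_actiontype` is a hypothetical helper summarising which variants, according to the entries here, rely on the `%actiontype` directive.

```rust
/// Stand-in listing only the action kinds mentioned in this changelog.
#[derive(Clone, Copy, Debug, PartialEq)]
enum YaccOriginalActionKind {
    UserAction,
    NoAction,
}

/// Stand-in listing only the grammar kinds mentioned in this changelog.
#[derive(Clone, Copy, Debug, PartialEq)]
enum YaccKind {
    Original(YaccOriginalActionKind),
    Grmtools,
}

/// Whether a grammar of this kind declares a single `%actiontype`.
fn needs_actiontype(kind: YaccKind) -> bool {
    match kind {
        // One action type for the whole grammar, declared with `%actiontype`.
        YaccKind::Original(YaccOriginalActionKind::UserAction) => true,
        // No actions are executed, so there is no action type to declare.
        YaccKind::Original(YaccOriginalActionKind::NoAction) => false,
        // Each rule carries its own action type instead.
        YaccKind::Grmtools => false,
    }
}
```

Folding the action kind into the grammar kind means a single `match` covers every valid combination, with no way to express an action kind that does not apply to the chosen grammar kind.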
- grmtools has moved to the 2018 edition of Rust and thus needs rustc-1.31 or later to compile.
- Add `YaccKind::Grmtools` variant, allowing grammar rules to have different action types. For most practical use cases, this is much better than using `%actiontype`.
- Add `%avoid_insert` directive to bias ranking of repair sequences and make it more likely that parsing can continue.
- Add `-q` switch to `nimbleparse` to suppress printing out the stategraph and conflicts (some grammars have conflicts by design, so being continually reminded of it isn't helpful).
- Fix problem where errors which lead to vast numbers (as in hundreds of thousands) of repair sequences being found could take minutes to sort and rank.
- Add `YaccKind::Original(YaccOriginalActionKind::NoAction)` variant to generate a parser which simply tells the user where errors were found (i.e. no actions are executed, and not even a parse tree is created).
- `lrlex` no longer tries to create Rust-level identifiers for tokens whose names can't be valid Rust identifiers (which led to compile-time syntax errors in the generated Rust code).
- Fix bug where `%epp` strings containing quote marks caused a code-generation failure in compile-time mode.
First stable release.