- Write it.
- Applying a
Semantics
to aFailure
should throw an exception.
- Improve error message when calling makeGrammar() on a source that has multiple grammars defined.
- Use Grammars and not Grammar as the start rule, and throw an error if != 1 grammar is produced.
var g = ohm.grammarFromScriptElement(/* optional element */);
var ns = ohm.grammarsFromScriptElements(/* optional element or node list */);
When you don't pass in the optional element / node list argument, these methods are sugars for:
ohm.grammar(document.querySelector(... script tag w/ language = text/ohm-js ...))
and
ohm.grammars(document.querySelectorAll(... script tags w/ language = text/ohm-js ...))
respectively.
-
Get rid of the 3rd argument to
Grammar.prototype.match
. -
match
's return value will be a CST node (for successful matches) or aFailure
object (for failed matches). -
Both
Node
s andFailure
s will support anisFailure()
method. (The former will always returnfalse
, the lattertrue
.) -
Failures
will hold on to theState
object so that we can lazily re-parse the input to collect "expected" strings / perps, etc. This should be done if / when the programmer asks for this information. -
Failure objects should only doing the second parse (to generate error info) on demand.
-
(We'll need to design the API of
Failure
s. Let's try to think about this while Pat's still in town.)
var s = g.semantics()...;
var ans = g.match('...');
if (ans.isFailure()) {
// deal with it
} else {
s(ans).doIt();
}
-
The unit tests are a mess right now. They were pretty good early on, but the language has been changed a lot since then. We should spend a couple of days cleaning up the unit tests. E.g.,
- Now that we have CSTs, there's no reason to check acceptance and semantic actions separately for each kind of
PExpr
. We should just compare the result ofGrammar.prototype.match
with the expected CST. (We may have to do some work to getNode.equals(anotherNode)
to work.) - ...
- Now that we have CSTs, there's no reason to check acceptance and semantic actions separately for each kind of
-
Check for superfluous properties in action dicts, i.e., properties that do not correspond to a rule in the grammar. This will have to be done in methods like
addOperation
,addAttribute
,extendOperation
, andextendAttribute
.
-
When you declare a new grammar that doesn't explicitly inherit from another, the first rule of that grammar is its start rule.
-
Once we have start rules, you should be able to omit the 2nd argument to
Grammar.prototype.match
.
Grammar.prototype.createSemantics(parentSemantics)
creates a new instance of Semantics
that inherits all of the operations and attributes from parentSemantics
. Note that all of the inherited operations and attributes that haven't been extend
ed explicitly must go through the arity and superfluous method checks the first time any of the operations or attributes of the "child" Semantics
is used.
Semantics.prototype.extend(name, dict)
should throw an error if:
- (I've thought about this, and decided it wouldn't be very nice to use. So let's not worry about it for now. -- Alex) The operation / attribute was created with the optional second argument
{ exhaustive: true}
and it's not exhaustive, i.e., it doesn't have a case for each type of CST. - The receiver did not inherit an operation or attribute called
name
from its parent. - The operation or attribute called
name
has already beenextend
ed in the receiver.
Semantics.prototype.addOperation(name, dict)
,
Semantics.prototype.addSynthesizedAttribute(name, dict)
, and
Semantics.prototype.addInheritedAttribute(name, dict)
should throw an error if:
- The receiver already has an operation or attribute with the same name.
- One or more of
dict
's methods are superfluous (i.e., do not correspond to a rule in the receiver's grammar) or have the wrong arity. - It should also be an error to try to declare a new operation or attribute whose name is
node
(see below).
To enable extensibilty, operations and attributes should always belong to an instance of Semantics
. Here's how this will work:
var g1 = ohm.grammar(...);
var s1 = g1.semantics()
.addOperation('eval', {
AddExp_plus: function(x, _, y) {
return x.eval() + y.eval();
},
...
});
Note that Semantics.prototype.addOperation(name, dict)
returns the receiver to allow chaining. (The same goes for Semantics.prototype.addSynthesizedAttribute
and Semantics.prototype.addInheritedAttribute
.)
The Semantics
objects act as a family of operations and attributes. Recursive (even mutually-recursive) uses of operations / attributes go through "wrapper objects" that hold a reference to an instance of Semantics
as well as a CST node, which may be accessed via the wrapper's node
property. (This avoids the problems I was having with early-binding in recursive calls, which were getting in the way of extensibility.) Operations are called as methods of the wrapper objects, while synthesized attributes are accessed as properties.
- Here's how you access / use the operations and attributes defined by a
Semantics
:
var s = g.semantics()...;
var cst = g.match(...);
// Now we "wrap" the CST and access the functionality that's defined in the Semantics obj.
s(cst).value; // attribute
s(cst).doIt(); // operation
...
To extend an operation or an attribute, you create a new Semantics
object that extends the Semantics
the the operation or attribute in question belongs to. You do this by passing the Semantics
that you want to extend as an argument to your grammar's createSemantics
method. Then you call extend(operationOrAttributeName, dict)
on that. E.g.,
var g2 = ... // some grammar that extends g1
var s2 = g2.semantics(s1)
.extend("eval", {
AddExp_foo: function(x, _, y) { ... }
});
A derived Semantics
instance -- i.e., one that is created by passing an existing Semantics
to Grammar.prototype.semantics()
-- automatically inherits all of the operations and attributes from the parent Semantics
.
E.g.,
G {
ListOf<elem, sep>
= elem (sep elem)* -- some
| -- none
PriExp
= ...
| ListOf<Exp, ','>
}
Notes:
- Using
<>
s instead of()
s so that we don't have to make whitespace significant, e.g., in OMeta,foo(bar)
was a rule application andfoo (bar)
was just afoo
followed by abar
-- this was very confusing to some people. - The things inside the
<>
s -- i.e., the arguments passed to parameterized rules -- are unevaluated parsing expressions. The arity of a parameter expression must be equal to 1. They are evaluated when they're used inside the parameterized rule. So in some sense the lexical scope of the body of a parameterized rule is different than that of a non-parametrized rule in the same grammar. We could implement this by- using a specialized version of the grammar where the argument names correspond to rule that when called will evaluate those parsing expressions, or
- keeping track of the names of the arguments so that we can do something special about them in code generation.
- You have to write semantic actions / attribute definitions for parameterized rules, just like any other rules.
Maybe now the stuff that's done in src/bootstrap.js
can be done for any grammar? That would enable people to share grammar as "binaries". (A while ago I had an ohm
command, but I removed it b/c it didn't support inheritance properly. That command turned into the less general but essential src/bootstrap.js
.)
- We should be able to (optionally) specify the number of arguments that are required by an operation.
- We'll have to take these into account when doing the arity checks.
- Add support for inherited attributes.
- A couple of changes: (i) get rid of the
$idx
syntax and use arrays instead, and (ii) get rid of the attribute'sset
method, just use return values instead. E.g.,
var s = g.semantics().addInheritedAttribute('foo', {
_base: function(child) { ... },
AddExpr_plus: [
function(x) {
return this.childAfter(x).foo() + 1;
},
ohm.actions.notSupported, // this is for the + or - operator
function(y) {
return 123;
}
],
_nonterminal: function(child) { ... }
});
// Client-side:
var node = g.match(...);
s(node).foo();
-
A refactoring that makes
parent
a synthesized attribute (right now it's just a semantic action that side-effects the CST nodes) -
Pat's visualizer / omniscient debugger
-
An IDE for Ohm
- Integrate Pat's visualizer
- Built-in support for unit testing grammars (re-run unit tests while programmer changes the grammar, show coverage, etc.)
- Automatic generation of random valid inputs
-
Better support for attribute grammars
- Take another pass at the API for writing inherited attributes
- Akira's editor?
-
Using persistent data structures (ImmutableJS?) for parsing contexts, so that adding that extra key to the memo table won't be so expensive.
-
Incremental parsing ala Papa Carlo?
Implement Alex's de-spockification idea.
In the following unit tests, I've added suffixes to each rule application so that each one can be written unambiguously in the "stacks". This matters b/c we want to know what the "continuation" of a recorded failure would have been -- which means that we must know which application of a rule r
caused the failure, if there is more than one.
start = addExp#1
addExp = addExp#2 "+" mulExp#1 | mulExp#2
mulExp = mulExp#3 "*" priExp#1 | priExp#2
priExp = number
If the input is "1-"
, the failed primitives recorded at position 1
should be:
"*"
, with stackstart > addExp#1 > mulExp#2
"+"
, with stackstart > addExp#1
The expected set should be {"*"
, "+"
}.
For more info, memo @ 0
:
start
-> FAIL, w/ failures@1
:"+"
,addExp#1
"*"
,addExp#1 > mulExp#2
addExp
-> SUCCESS@1
w/ failures@1
:"+"
, (empty stack)"*"
,mulExp#2
mulExp
-> SUCCESS@1
w/ failures@1
:"*"
, (empty stack)
|::|::|::|
| subsumes | ("+"
, start > addExp#1
) | ("*"
, start > addExp#1 > mulExp#2
) |
| +
| N | N |
| *
| N | N |
| +
or *
| N | N |
start = addExp#1 ";"
addExp = addExp#2 "+" mulExp#1 | mulExp#2
mulExp = mulExp#3 "*" priExp#1 | priExp#2
priExp = number
If the input is "1"
, the failed primitives recorded at position 1
should be:
"*"
, with stackstart > addExp#1 > mulExp#2
"+"
, with stackstart > addExp#1
";"
, with stackstart
The expected set should be {";"
}.
|::|::|::|::|
| subsumes | (";"
, start
) | ("+"
, start > addExp#1
) | ("*"
, start > addExp#1 > mulExp#2
) |
| ;
| N | Y | Y |
| +
| N | N | N |
| *
| N | N | N |
| ;
, +
or *
| N | Y | Y |
start = a#1 b#1 | b#2 a#2
a = "a"
b = "b"
If the input is ""
, the failed primitives recorded at position 0
should be:
"a"
, with stackstart > a#1
"b"
, with stackstart > b#2
The expected set should be {"a"
, "b"
}.
start = a#1 b#1 | a#2 | b#2 a#3
a = "a"
b = "b"
If the input is ""
, the failed primitives recorded at position 0
should be:
"a"
, with stacksstart > a#1
andstart > #a2
"b"
, with stackstart > #b2
The expected set should be {"a"
, "b"
}.
TODO: I added this example because it's important to consider the case when a terminal can be reached in different ways (i.e., w/ different stacks). Unfortunately in this case that doesn't make a difference. I'd like to find an example where it does.
Straw man: when a terminal has more than one stack in a FailureDescriptor
, we should only exclude it from the expected set if there is one or more entry (terminal) that subsumes it, i.e., considering all of its stacks / continuations. To say it in a different way: the subsuming terminal must be on the path to acceptance considering all of the stacks.
Start = c#1 "sucks" | c#2 "stinks"
c = "C" "++"?
If the input is "C rules"
, the recorded failed primitives at position 2
should be:
"++"
, with stackstart > c#1
"sucks"
, with stackstart
"stinks"
, with stackstart
The expected set should be {"sucks"
, "stinks"
}.
The interesting thing in this example is that neither "sucks"
nor "stinks"
"subsumes" "++"
, but "++"
shouldn't be included in the expected set. This is because even if it shows up in the input, we will still (eventually) need one of the other expected strings in order to get to an accepting state, which means that "++"
is superfluous.