Top-Down Operator Precedence Parsing
Face complex precedence and associativity rules without fear using
elm/parser
.
elm install elm/parser
elm install dmy/elm-pratt-parser
- Overview
- Getting Started
- About Pratt Parsers
- Terminology
- Design and Implementation Considerations
- References
- License
Writing parsers using
elm/parser
is usually simple and fun, but handling complex operators precedence and
associativity rules in an expression parser
can be tricky,
or even hard and frustrating for more complex cases.
This library goal is to fix this by adding a single
expression
parser to elm/parser
:
expression :
{ oneOf : List (Config expr -> Parser expr)
, andThenOneOf : List (Config expr -> ( Int, expr -> Parser expr ))
, spaces : Parser ()
}
-> Parser expr
This functions is configured with smaller standard parsers, precedence values and associativity rules, thanks to a minimalist flexible API, and handles the whole expression parsing complexity using a simple but powerful algorithm inherited from the one described by Vaughan Pratt in his 1973 paper "Top Down Operator Precedence" [1].
Helpers are provided for literals, constants, prefix, infix and postfix expressions but custom ones can be defined when needed.
The library is small, has a test suite, benefits from tail-call elimination
for left-associative operations, never uses
backtrackable
by itself and allows to produce helpful error messages using
Parser.Advanced
if wanted.
Here is a quite complete calculator.
It evaluates the result during parsing, without generating an explicit
intermediate abstract syntax tree (AST), so it directly uses Float
as the
expr
type.
import Parser exposing ((|.), (|=), Parser)
import Pratt
mathExpression : Parser Float
mathExpression =
Pratt.expression
{ oneOf =
[ Pratt.literal Parser.float
, Pratt.constant (Parser.keyword "pi") pi
, Pratt.prefix 3 (Parser.symbol "-") negate
, Pratt.prefix 5 (Parser.keyword "cos") cos
, parenthesizedExpression
]
, andThenOneOf =
[ Pratt.infixLeft 1 (Parser.symbol "+") (+)
, Pratt.infixLeft 1 (Parser.symbol "-") (-)
, Pratt.infixLeft 2 (Parser.symbol "*") (*)
, Pratt.infixLeft 2 (Parser.symbol "/") (/)
, Pratt.infixRight 4 (Parser.symbol "^") (^)
, Pratt.postfix 6 (Parser.symbol "°") degrees
]
, spaces = Parser.spaces
}
parenthesizedExpression : Pratt.Config Float -> Parser Float
parenthesizedExpression config =
Parser.succeed identity
|. Parser.symbol "("
|= Pratt.subExpression 0 config
|. Parser.symbol ")"
math : Parser Float
math =
Parser.succeed identity
|= mathExpression
-- string must end after an expression:
|. Parser.end
Parser.run math "-2^2^3" --> Ok -(2^(2^3))
Parser.run math "-2^2^3" --> Ok -256
Parser.run math "3--4" --> Ok (3-(-4))
Parser.run math "3--4" --> Ok 7
Parser.run math "cos (2*pi)" --> Ok 1
Parser.run math "cos (2*180°)" --> Ok 1
Parser.run math "1 - cos 360°" --> Ok 0
Let's describe step by step each part of the example.
1.
First we configure the parsers used at the start of an expression or after an
operator. The expression parser cannot work without at least one of these
parsers succeeding as it would not be able to parse an expr
value.
These parsers include among others: literals, constants, prefix
expressions or sub-expressions parsers.
mathExpression : Parser Float
mathExpression =
Pratt.expression
{ oneOf =
[ Pratt.literal Parser.float
, Pratt.constant (Parser.keyword "pi") pi
, Pratt.prefix 3 (Parser.symbol "-") negate
, Pratt.prefix 5 (Parser.keyword "cos") cos
, parenthesizedExpression
]
Note that
literal
,
constant
and prefix
helpers all use a Parser
argument, like float
,
keyword pi
or symbol "-"
here, so you have full control on parsing and
produced error messages.
The prefix
helper first needs an Int
argument, the
precedence. The higher it is, the higher the operator precedence is.
All parsers last parameter is a Config expr
, passed automatically by the
expression
parser, that allows to call recursively
subExpression
with a custom precedence.
This is used here in the parser for sub-expressions between parentheses and
also inside prefix
and infix
helpers.
This is why the type of each oneOf
parser is
Config expr -> Parser expr
.
parenthesizedExpression : Pratt.Config Float -> Parser Float
parenthesizedExpression config =
Parser.succeed identity
|. Parser.symbol "("
|= Pratt.subExpression 0 config
|. Parser.symbol ")"
Note that expression
is equivalent to subExpression 0
, so the
expression
parser starts parsing the expression with the lowest precedence.
2.
Then we configure the parsers that use the result of the previous parser.
As they use the previously parsed expression, they have an expr
parameter.
They typically include infix and postfix expressions parsers:
, andThenOneOf =
[ Pratt.infixLeft 1 (Parser.symbol "+") (+)
, Pratt.infixLeft 1 (Parser.symbol "-") (-)
, Pratt.infixLeft 2 (Parser.symbol "*") (*)
, Pratt.infixLeft 2 (Parser.symbol "/") (/)
, Pratt.infixRight 4 (Parser.symbol "^") (^)
, postfix 6 (Parser.symbol "°") degrees
]
Like configuration oneOf
parsers, they receive a Config expr
, but return
instead a tuple (Int, expr -> Parser expr)
because they need to provide their
precedence to the algorithm, and a expr -> Parser expr
parser that will
be called with the preceding expression (the left expression).
See subExpression
documentation to better understand the algorithm.
3. We can then complete the configuration by adding a spaces : Parser ()
parser that will be used between each previously configured parser.
Parser.spaces
is used here, but succeed ()
could have been used if a general parser was not
wanted.
, spaces = Parser.spaces
}
4. At last we require the end of string after our expression by including our
parser into a top-level elm/parser
one:
math : Parser Float
math =
Parser.succeed identity
|= mathExpression
-- string must end after an expression:
|. Parser.end
That's it!
And we have used all the functions exposed by the Pratt
module.
A more complete example is included in the package source code, see
examples/Calc.elm
.
For a similar example but with an AST using a custom
type, see examples/Math.elm
instead.
Of course, you can define parsers for any expression, not only mathematical
ones. For example have a look at examples/Boole.elm
for a start with Boole's
Algebra and an example of if then else
expressions.
The parsers configured by this library use a variant of the algorithm described by Vaughan Pratt in his 1973 paper "Top Down Operator Precedence" [1]. Such parsers are usuall called "Pratt parsers", "Top-Down Operator Precedence parsers", or "TDOP" parsers.
This algorithm is used in this library because my experience comparing alternatives confirmed for the most part what Vaughan Pratt claimed:
"The approach described [...] is very simple to understand, trivial to implement, easy to use, extremely efficient in practice if not in theory, yet flexible enough to meet most reasonable syntactic needs of users [...]. Moreover, it deals nicely with error detection."
Note that the later "precedence climbing" algorithm is now often considered as a special case of the earlier Pratt parsing algorithm.
See [2] to read more about them.
At last, Douglas Crockford, who implemented a Javascript parser using a Pratt parser for JSLint [4], said:
"Another explanation is that his technique is most effective when used in a dynamic, functional programming language."
I believe that the dynamic
part has already been proven wrong
[5][6][7], and I hope that this
implementation will contribute confirming it.
The terminology used in Vaughan R. Pratt's paper is not really intuitive nor mainstream so it has been changed in this library:
- Configuration
oneOf
parsers are called in the papernud
, for "Null Denotation". Here is how they are defined in the original paper:
"We will call the code denoted by a token without a preceding expression its null denotation or nud"
- Configuration
andThenOneOf
parsers are calledled
, for "Left Denotation". Here is how they are defined in the original paper:
"We will call the code denoted by a token with a preceding expression its left denotation or led"
- The precedence is called binding power in the paper. It is used for operators precedence, and also allows to achieve right-associative operations.
See [3] and the algorithm described in
subExpression
documentation for more details.
The main differences with Pratt's design and implementation, that reflects on this library API, are:
-
Tokens are reduced to their minimal form:
- just a parser for
nud
tokens - a precedence (alias binding power) and a parser
(with an expression parameter) for
led
tokens
There is therefore no actual token anymore and it is not necessary to tokenize the expression before parsing it.
- just a parser for
-
nuds
andleds
are not stored in aDict
using the operator as the key, instead they are just two lists of parsers used withParser.oneOf
.
- Vaughan R. Pratt, "Top Down Operator Precedence", 1973.
- Andy Chu, Pratt Parsing Index and Updates, 2017 (updated regularly)
- Eli Bendersky, Top-Down operator precedence parsing, 2010
- Douglas Crockford, Top Down Operator Precedence Douglas Crockford, 2007
- Andy Chu, Pratt Parsers Can Be Statically Typed, 2016
- Matthew Manela, A Monadic Pratt Parser, 2011
- Julian Hall, Pratt Parsing in Parsec. Perfect., parsec-pratt, 2016
BSD-3-Clause