ocparse is a production-ready openCypher parser written in pure Erlang. It is closely aligned with the openCypher project and will be updated regularly as that project evolves. The openCypher project aims to deliver a full and open specification of the industry’s most widely adopted graph database query language: Cypher. The EBNF file published by the openCypher project is the basis for the definition of the LALR grammar used by ocparse.
MATCH (m:Movie)
WHERE m.title = 'The Matrix'
RETURN m
1> {ok, {ParseTree, Tokens}} = ocparse:source_to_pt("MATCH (m:Movie) WHERE m.title = 'The Matrix' RETURN m").
{ok,
{{cypher,
{statement,
{query,
{regularQuery,
{singleQuery,
[{clause,
{match,[],
{pattern,
[{patternPart,[],
{anonymousPatternPart,{patternElement,{...},...}}}]},
{where,
{expression,
{orExpression,{xorExpression,{andExpression,...},[]},[]}}}}},
{clause,
{return,[],
{returnBody,
{returnItems,[],[],[{returnItem,{...},...}]},
[],[],[]}}}]},
[]}}},
[]},
[{'MATCH',1},
{'(',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"},
{':',1},
{'UNESCAPED_SYMBOLIC_NAME',5,"Movie"},
{')',1},
{'WHERE',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"},
{'.',1},
{'UNESCAPED_SYMBOLIC_NAME',5,"title"},
{'=',1},
{'STRING_LITERAL',1,"'The Matrix'"},
{'RETURN',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"}]}}
2> ParseTree.
{cypher,
{statement,
{query,
{regularQuery,
{singleQuery,
[{clause,
{match,[],
{pattern,
[{patternPart,[],
{anonymousPatternPart,
{patternElement,
{nodePattern,{variable,...},{...},...},
[]}}}]},
{where,
{expression,
{orExpression,
{xorExpression,
{andExpression,{notExpression,{...},...},[]},
[]},
[]}}}}},
{clause,
{return,[],
{returnBody,
{returnItems,[],[],
[{returnItem,{expression,{orExpression,...}},[]}]},
[],[],[]}}}]},
[]}}},
[]}
3> Tokens.
[{'MATCH',1},
{'(',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"},
{':',1},
{'UNESCAPED_SYMBOLIC_NAME',5,"Movie"},
{')',1},
{'WHERE',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"},
{'.',1},
{'UNESCAPED_SYMBOLIC_NAME',5,"title"},
{'=',1},
{'STRING_LITERAL',1,"'The Matrix'"},
{'RETURN',1},
{'UNESCAPED_SYMBOLIC_NAME',1,"m"}]
4> ocparse:pt_to_source_td(ParseTree).
<<"match (m :Movie) where m .title = 'The Matrix' return m">>
5> ocparse:pt_to_source_bu(ParseTree).
<<"match (m :Movie) where m .title = 'The Matrix' return m">>
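The session above parses a query and then turns the parse tree back into source text. A minimal sketch of a round-trip check built on those two steps could look as follows. It uses only the calls shown above (ocparse:source_to_pt/1 and ocparse:pt_to_source_td/1); the module name ocparse_roundtrip, the function roundtrip/1 and the result tuples are hypothetical, and the shape of a failed parse is an assumption, so it is passed through unchanged.

```erlang
-module(ocparse_roundtrip).
-export([roundtrip/1]).

%% Hypothetical helper, not part of ocparse.
%% Parses Source, regenerates Cypher text from the parse tree and parses
%% that text again. Returns {ok, Regenerated} when the second parse yields
%% the same parse tree as the first one.
roundtrip(Source) ->
    case ocparse:source_to_pt(Source) of
        {ok, {ParseTree, _Tokens}} ->
            %% pt_to_source_td/1 returns a binary, as in the session above.
            Regenerated = ocparse:pt_to_source_td(ParseTree),
            %% Assumption: the regenerated text can be passed back as a
            %% plain character list.
            case ocparse:source_to_pt(binary_to_list(Regenerated)) of
                {ok, {ParseTree, _}} ->
                    %% ParseTree is already bound, so this clause only
                    %% matches an identical tree.
                    {ok, Regenerated};
                Unexpected ->
                    {mismatch, Unexpected}
            end;
        Failure ->
            %% The failure format is not described here; return it as is.
            {parse_failed, Failure}
    end.
```

With the ocparse application on the code path, the helper can be called from the shell as ocparse_roundtrip:roundtrip("MATCH (m:Movie) RETURN m").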
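The Erlang shell abbreviates deeply nested terms when it prints a return value, which is why the parse tree above is shortened; calling rp(ParseTree) in the shell prints the term in full. The complete parse tree of the example code looks as follows: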
{cypher,
{statement,
{query,
{regularQuery,
{singleQuery,
[{clause,
{match,[],
{pattern,
[{patternPart,[],
{anonymousPatternPart,
{patternElement,
{nodePattern,
{variable,{symbolicName,"m"}},
{nodeLabels,
[{nodeLabel,
{labelName,
{schemaName,{symbolicName,"Movie"}}}}]},
[]},
[]}}}]},
{where,
{expression,
{orExpression,
{xorExpression,
{andExpression,
{notExpression,
{comparisonExpression,
{addOrSubtractExpression,
{multiplyDivideModuloExpression,
{powerOfExpression,
{unaryAddOrSubtractExpression,
{stringListNullOperatorExpression,
{propertyOrLabelsExpression,
{atom,{variable,{symbolicName,"m"}}},
[{propertyLookup,
{propertyKeyName,
{schemaName,{symbolicName,"title"}}}}]},
[]},
[]},
[]},
[]},
[]},
[{partialComparisonExpression,
{addOrSubtractExpression,
{multiplyDivideModuloExpression,
{powerOfExpression,
{unaryAddOrSubtractExpression,
{stringListNullOperatorExpression,
{propertyOrLabelsExpression,
{atom,
{literal,{stringLiteral,"'The Matrix'"}}},
[]},
[]},
[]},
[]},
[]},
[]},
"="}]},
[]},
[]},
[]},
[]}}}}},
{clause,
{return,[],
{returnBody,
{returnItems,[],[],
[{returnItem,
{expression,
{orExpression,
{xorExpression,
{andExpression,
{notExpression,
{comparisonExpression,
{addOrSubtractExpression,
{multiplyDivideModuloExpression,
{powerOfExpression,
{unaryAddOrSubtractExpression,
{stringListNullOperatorExpression,
{propertyOrLabelsExpression,
{atom,{variable,{symbolicName,"m"}}},
[]},
[]},
[]},
[]},
[]},
[]},
[]},
[]},
[]},
[]},
[]}},
[]}]},
[],[],[]}}}]},
[]}}},
[]}
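Because the parse tree is an ordinary Erlang term made of tuples, lists and strings, it can be inspected with plain recursive functions. The following sketch collects every name wrapped in a {symbolicName, Name} tuple; the module name pt_names and the function symbolic_names/1 are hypothetical and not part of ocparse, and the code relies only on the tree shape shown above.

```erlang
-module(pt_names).
-export([symbolic_names/1]).

%% Hypothetical helper, not part of ocparse.
%% Collects every Name found in a {symbolicName, Name} tuple, in
%% depth-first, left-to-right order.
symbolic_names({symbolicName, Name}) ->
    [Name];
symbolic_names(Tuple) when is_tuple(Tuple) ->
    %% Any other tuple: visit its elements in order.
    symbolic_names(tuple_to_list(Tuple));
symbolic_names([Head | Tail]) ->
    %% Lists (including strings such as "=") are walked element by element.
    symbolic_names(Head) ++ symbolic_names(Tail);
symbolic_names(_Other) ->
    %% Atoms, characters and the empty list carry no symbolic names.
    [].
```

Applied to the complete parse tree above, pt_names:symbolic_names(ParseTree) returns ["m","Movie","m","title","m"].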
The documentation for ocparse is available in the project Wiki.
The number of block comments (/* ... */) is limited to one per line.
The rule Properties has a higher precedence than the rule Literal.
The following tokens may not be used as SymbolicName:
ALL AND ANY AS ASC ASCENDING BY CONTAINS COUNT CREATE DECIMAL_INTEGER DELETE
DESC DESCENDING DETACH DISTINCT ENDS ESCAPED_SYMBOLIC_NAME EXPONENT_DECIMAL_REAL
EXTRACT FALSE FILTER HEX_INTEGER IN IS LIMIT MATCH MERGE NONE NOT NULL
OCTAL_INTEGER ON OPTIONAL OR ORDER REGULAR_DECIMAL_REAL REMOVE RETURN SET
SINGLE SKIP STARTS STRING_LITERAL TRUE UNESCAPED_SYMBOLIC_NAME UNION UNWIND
WHERE WITH XOR
An exception is the use of the token COUNT as FunctionName.
Unicode is not supported with Dash, LeftArrowHead, RightArrowHead or UnescapedSymbolicName. Hence Dash is limited to the hyphen (-), LeftArrowHead is limited to '<' and RightArrowHead is limited to '>'.
In the scripts test\gen_test.bat and test\gen_test_and_run.bat, the heap size has been changed to speed up test data generation. If necessary, you are welcome to make suitable adjustments for your purposes.
No test data is generated for the following rules:
FunctionInvocation = FunctionName, [SP], '(', [SP], (D,I,S,T,I,N,C,T), ')' ;
Instead of the rule
MultiPartQuery = (ReadPart | (UpdatingStartClause, [SP], UpdatingPart)), With, [SP], { ReadPart, UpdatingPart, With, [SP] }, SinglePartQuery ;
test data is generated only for the reduced form, which omits UpdatingPart inside the repetition:
MultiPartQuery = (ReadPart | (UpdatingStartClause, [SP], UpdatingPart)), With, [SP], { ReadPart, With, [SP] }, SinglePartQuery ;
SchemaName = ... | ReservedWord ;
SymbolicName = ... | (C,O,U,N,T) | (F,I,L,T,E,R) | (E,X,T,R,A,C,T) | (A,N,Y) | (N,O,N,E) | (S,I,N,G,L,E) ;
This project was inspired by the sqlparse project of the company K2 Informatics GmbH.