-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add context free proto file grammar #2
base: master
Are you sure you want to change the base?
Add context free proto file grammar #2
Conversation
d1e588f
to
46b9c66
Compare
46b9c66
to
bb3e77f
Compare
spec/grammar.ebnf
Outdated
|
||
end_statement = ";" ; | ||
|
||
aggregate_literal = "{" , { identifier , ":" , literal } , "}" ; (* whitespace and or comments seperate each field value *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a little tricky. Aggregate values referenced by an option must start with {
. But what goes inside the braces is actually the protobuf text format, which is much more lenient than what you have here.
In particular:
- You can use
<
and>
around nested message values (not just{
and}
) - The colon is optional for nested messages:
foo { a: 1 }
is valid - Repeated fields can use array-like notation:
foos: ["a", "b", "c"]
is valid - The identifier is not just a simple identifier -- it is a fully-qualified identifier and can optionally be surrounded by brackets or parentheses (
[name]
or(name)
) to indicate that it is the name of an extension, nor a regular field.
spec/grammar.ebnf
Outdated
import = "import" , ["weak" | "public"] , string_literal , end_statement ; | ||
package = "package" , full_identifier , end_statement ; | ||
|
||
option_name = (identifier | "(" , full_identifier , ")") , {"." , identifier } ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing portions are defined here only as identifier
, but they are also allowed to be "(", full_identifier, ")"
, to indicate extension fields inside of a nested message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After checking the parser source code, it appears that an easier way to represent it would be to have an at-least one name_part
rule like so.
name_part = identifier | ("(" , full_identifier , ")") ;
option_name = name_part , { "." , name_part } ;
If I'm correct then this allows for everything the parser allows, including
foo // simple identifiers
foo.bar // full identifiers
(foo).bar // extensions
(foo.bar).baz // extensions with full identifiers
(foo.bar).(baz) // extensions on extensions
(foo).(bar).(baz) // deep extensions on extensions
; | ||
|
||
identifier = letter , { letter | decimal_digit | "_" } ; | ||
full_identifier = identifier , { "." , identifier } ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most places that accept a full_identifier
do in fact support a preceding ".", to explicitly indicate an absolute fully-qualified name (e.g. not relative to the current context). This includes references to custom option names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this works for custom option names (at least anymore). I have a rule for those cases however where you can have a preceding dot called type_name
and it's used in areas where preceding dot is allowed.
spec/grammar.ebnf
Outdated
extend = "extend" , identifier , "{" , { field | group | end_statement } , "}" ; | ||
|
||
rpc = "rpc" , identifier , "(" , ["stream"] , type_name , ")" , "returns" , "(" , ["stream"] , type_name , ")" , (("{" , { option , end_statement } , "}") | end_statement) ; | ||
stream = "stream" , identifier , "(" , type_name , "," , type_name , ")" , (("{" , { option , end_statement } , "}") | end_statement) ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is indicated in one of the grammars listed on the main docs site, protoc does not actually accept it. So it should be removed (from this grammar and from the docs site). I'm pretty sure this was transcribed from an internal version of protoc to support earlier versions of streaming stubby (the internal RPC framework at Google, pre-cursor to gRPC).
Streams are instead defined only with rpc
keyword and the use of stream
next to request or response type (or both(.
* Allow seperated string literals * Expand upon aggregate literals format * Fixed incorrect option name rule * Removed stream service method type
Was there a reason this never got merged? |
+1, Was there a reason this never got merged? |
ObsidianMinor, we were going to turn down this repository when we discovered this outstanding PR. Sorry for the losing track of this! Would you mind redirecting the location for the file to https://github.com/protocolbuffers/protobuf/blob/master/docs/grammar.ebnf? I'll work to get the doc reviewed by our engineering team in the meantime, in case any changes are needed before we can accept the PR. |
FWIW, here's another alternative that describes the language, also in EBNF. Unlike this document, it separates lexical analysis from the rest of the grammar productions in order to describe the nuance in tokenization relating to handling of whitespace and comments. It also has a slightly different way to interpret numeric literals, in an attempt to codify some of protoc's behavior. For example, in the face of "1to1000", the grammar in this PR suggests a tokenization of "1", "to", "1000", but protoc considers this a syntax error. I have high confidence in the grammar as it is based on a yacc grammar that powers what I think might be the closest implementation to protoc itself. |
Joshua, if you'd like to submit a PR for the same target location (so we can get all of the CLAs and such covered), we'd love to take a look at adding it to our docs. I'm still coming up-to-speed on EBNF, so I'll need to get some SWE eyes on it once the PR comes in if you do submit one. |
This repository is looking lacking, so I thought I'd get it started. I also plan to write specs for proto2 and proto3, but in the meantime I decided to write this context free grammar to document a grammar that works for every proto file regardless of whether the syntax is valid in context (like incorrect syntax versions or invalid values for certain options)