Add context free proto file grammar #2

ObsidianMinor · 2019-08-24T11:09:11Z

This repository is looking a bit sparse, so I thought I'd get it started. I also plan to write specs for proto2 and proto3, but in the meantime I wrote this context-free grammar to document a syntax that accepts every proto file, regardless of whether the file is valid in context (for example, incorrect syntax versions or invalid values for certain options).

jhump · 2019-08-28T14:30:38Z

spec/grammar.ebnf

+
+end_statement = ";" ;
+
+aggregate_literal = "{" , { identifier , ":" , literal } , "}" ; (* whitespace and or comments seperate each field value *)


This one is a little tricky. Aggregate values referenced by an option must start with {. But what goes inside the braces is actually the protobuf text format, which is much more lenient than what you have here.

In particular:

* You can use < and > around nested message values (not just { and }).
* The colon is optional for nested messages: foo { a: 1 } is valid.
* Repeated fields can use array-like notation: foos: ["a", "b", "c"] is valid.
* The identifier is not just a simple identifier. It is a fully-qualified identifier and can optionally be surrounded by brackets or parentheses ([name] or (name)) to indicate that it is the name of an extension, not a regular field.
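To make the four points above concrete, here is a minimal recognizer sketch in Python. It is not protoc's parser and not the grammar from this PR: the token set, the `Parser` class and `is_aggregate_literal` are hypothetical names, and the sketch assumes that a colon is required before scalar and list values. It only covers the leniencies listed in the comment.

```python
import re

# Token kinds for this sketch only; the real text format has more (escapes, comments, ...).
TOKEN = re.compile(r"""\s*(?:
    (?P<ident>[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)
  | (?P<number>-?[0-9]+(?:\.[0-9]+)?)
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<punct>[{}<>\[\]():,;])
)""", re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, text=None, kind=None):
        tok_kind, tok_text = self.peek()
        if (tok_kind is None or (text is not None and tok_text != text)
                or (kind is not None and tok_kind != kind)):
            raise ValueError(f"unexpected token {tok_text!r}")
        self.pos += 1
        return tok_text

    def message(self):
        # Nested messages may be wrapped in { } or < >.
        opener = self.take()
        closer = "}" if opener == "{" else ">"
        while self.peek()[1] != closer:
            self.field()
        self.take(closer)

    def field(self):
        # Field names may be plain, or wrapped in [ ] or ( ) for extensions.
        opener = self.peek()[1]
        if opener in ("[", "("):
            self.take()
            self.take(kind="ident")
            self.take("]" if opener == "[" else ")")
        else:
            self.take(kind="ident")
        if self.peek()[1] == ":":
            self.take(":")
            self.value()
        elif self.peek()[1] in ("{", "<"):
            self.message()  # the colon is optional before a nested message
        else:
            raise ValueError("expected ':' or a nested message")
        if self.peek()[1] in (",", ";"):
            self.take()  # optional separator between fields

    def value(self):
        kind, text = self.peek()
        if text in ("{", "<"):
            self.message()
        elif text == "[":
            # Array-like notation for repeated fields.
            self.take("[")
            if self.peek()[1] != "]":
                self.value()
                while self.peek()[1] == ",":
                    self.take(",")
                    self.value()
            self.take("]")
        else:
            if kind not in ("ident", "number", "string"):
                raise ValueError(f"unexpected token {text!r}")
            self.take()


def is_aggregate_literal(text):
    """An option's aggregate value must start with '{'; the inside is lenient."""
    try:
        parser = Parser(tokenize(text))
        if parser.peek()[1] != "{":
            return False
        parser.message()
        return parser.pos == len(parser.tokens)
    except ValueError:
        return False
```

With these assumptions, `is_aggregate_literal('{ foo < a: 1 > foos: ["a", "b"] [my.ext] { b: 2 } }')` accepts the input, while `'< a: 1 >'` at the top level is rejected because the outermost delimiter must be a brace.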

jhump · 2019-08-28T14:32:18Z

spec/grammar.ebnf

+import = "import" , ["weak" | "public"] , string_literal , end_statement ;
+package = "package" , full_identifier , end_statement ;
+
+option_name = (identifier | "(" , full_identifier , ")") , {"." , identifier } ;


The trailing portions are defined here only as identifier, but they are also allowed to be "(", full_identifier, ")", to indicate extension fields inside of a nested message.

ObsidianMinor · 2019-08-28

After checking the parser source code, it appears that an easier way to represent it would be a name_part rule that must occur at least once, like so:

    name_part = identifier | ("(" , full_identifier , ")") ;
    option_name = name_part , { "." , name_part } ;

If I'm correct, this allows for everything the parser allows, including:

    foo                  // simple identifiers
    foo.bar              // full identifiers
    (foo).bar            // extensions
    (foo.bar).baz        // extensions with full identifiers
    (foo.bar).(baz)      // extensions on extensions
    (foo).(bar).(baz)    // deep extensions on extensions
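As a quick sanity check of the name_part idea, the rule can be transcribed into a regular expression. This is only a sketch: the constant and function names are made up for illustration, `letter` is assumed to mean ASCII letters, and whitespace or comments between tokens are ignored. The original rule from the diff is transcribed alongside it for comparison.

```python
import re

# identifier      = letter , { letter | decimal_digit | "_" } ;
# full_identifier = identifier , { "." , identifier } ;
IDENT = r"[A-Za-z][A-Za-z0-9_]*"
FULL_IDENT = rf"{IDENT}(?:\.{IDENT})*"

# name_part   = identifier | ( "(" , full_identifier , ")" ) ;
# option_name = name_part , { "." , name_part } ;
NAME_PART = rf"(?:{IDENT}|\({FULL_IDENT}\))"
OPTION_NAME = re.compile(rf"{NAME_PART}(?:\.{NAME_PART})*")

# The rule as first proposed in the diff: only the leading part may be parenthesized.
ORIGINAL_OPTION_NAME = re.compile(rf"(?:{IDENT}|\({FULL_IDENT}\))(?:\.{IDENT})*")


def is_option_name(text: str) -> bool:
    """True if text matches the revised option_name rule."""
    return OPTION_NAME.fullmatch(text) is not None


def matched_by_original_rule(text: str) -> bool:
    """True if text matches the option_name rule from the first version of the diff."""
    return ORIGINAL_OPTION_NAME.fullmatch(text) is not None


EXAMPLES = [
    "foo",
    "foo.bar",
    "(foo).bar",
    "(foo.bar).baz",
    "(foo.bar).(baz)",
    "(foo).(bar).(baz)",
]
```

Every entry in `EXAMPLES` passes `is_option_name`, whereas `matched_by_original_rule` rejects the last two, which is the gap the review comment points out.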

jhump · 2019-08-28T14:35:40Z

spec/grammar.ebnf

+;
+
+identifier = letter , { letter | decimal_digit | "_" } ;
+full_identifier = identifier , { "." , identifier } ;


Most places that accept a full_identifier do in fact support a preceding ".", to explicitly indicate an absolute fully-qualified name (e.g. not relative to the current context). This includes references to custom option names.

ObsidianMinor · 2019-08-28

I don't believe this works for custom option names (at least not anymore). However, for the cases where a preceding dot is allowed, I have a rule called type_name, and it is used wherever the leading dot is accepted.
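For illustration, the difference between full_identifier and a dot-prefixed type_name can be sketched as follows. The exact definition of type_name in the PR is not quoted in this thread, so the pattern below (an optional leading dot followed by a full_identifier) is an assumption, as are the function names.

```python
import re

IDENT = r"[A-Za-z][A-Za-z0-9_]*"
FULL_IDENT = re.compile(rf"{IDENT}(?:\.{IDENT})*")
# Assumption: type_name = [ "." ] , full_identifier ;
TYPE_NAME = re.compile(rf"\.?{IDENT}(?:\.{IDENT})*")


def is_full_identifier(text: str) -> bool:
    return FULL_IDENT.fullmatch(text) is not None


def is_type_name(text: str) -> bool:
    return TYPE_NAME.fullmatch(text) is not None


def is_absolute(text: str) -> bool:
    """A leading dot marks the name as fully qualified rather than relative."""
    return is_type_name(text) and text.startswith(".")
```

Under this reading, `.foo.Bar` is a type_name but not a full_identifier, while `foo.Bar` is both.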

jhump · 2019-08-28T14:42:19Z

spec/grammar.ebnf

+extend = "extend" , identifier , "{" , { field | group | end_statement } , "}" ;
+
+rpc = "rpc" , identifier , "(" , ["stream"] , type_name , ")" , "returns" , "(" , ["stream"] , type_name , ")" , (("{" , { option , end_statement } , "}") | end_statement) ;
+stream = "stream" , identifier , "(" , type_name , "," , type_name , ")" , (("{" , { option , end_statement } , "}") | end_statement) ;


While this is indicated in one of the grammars listed on the main docs site, protoc does not actually accept it. So it should be removed (from this grammar and from the docs site). I'm pretty sure this was transcribed from an internal version of protoc to support earlier versions of streaming stubby (the internal RPC framework at Google, precursor to gRPC).

Streams are instead defined only with the rpc keyword and the use of stream next to the request or response type (or both).
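A small sketch of how the streaming mode can be read off an rpc declaration, following the rpc rule in the diff above. The regular expression and the `streaming_mode` helper are illustrative only; the sketch handles just the form terminated by a semicolon (not the variant with an option body) and does not deal with comments.

```python
import re

IDENT = r"[A-Za-z][A-Za-z0-9_]*"
TYPE_NAME = rf"\.?{IDENT}(?:\.{IDENT})*"  # assumed: optional leading dot

# rpc = "rpc" , identifier , "(" , ["stream"] , type_name , ")" ,
#       "returns" , "(" , ["stream"] , type_name , ")" , end_statement ;
RPC = re.compile(
    rf"rpc\s+(?P<name>{IDENT})\s*"
    rf"\(\s*(?P<client_stream>stream\s+)?(?P<request>{TYPE_NAME})\s*\)\s*"
    rf"returns\s*"
    rf"\(\s*(?P<server_stream>stream\s+)?(?P<response>{TYPE_NAME})\s*\)\s*;"
)


def streaming_mode(declaration: str):
    """Return (client_streaming, server_streaming) for an rpc declaration."""
    m = RPC.fullmatch(declaration.strip())
    if m is None:
        raise ValueError("not an rpc declaration this sketch understands")
    return (m.group("client_stream") is not None,
            m.group("server_stream") is not None)
```

For example, `streaming_mode("rpc Chat(stream Msg) returns (stream Msg);")` reports streaming on both sides, and a declaration using the removed `stream` statement form raises `ValueError`.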


derekperkins · 2020-02-20T02:06:56Z

Was there a reason this never got merged?

teijeong · 2020-07-17T02:31:46Z

+1, Was there a reason this never got merged?

Logofile · 2022-02-01T18:31:36Z

ObsidianMinor, we were going to turn down this repository when we discovered this outstanding PR. Sorry for losing track of this!

Would you mind redirecting the location for the file to https://github.com/protocolbuffers/protobuf/blob/master/docs/grammar.ebnf? I'll work to get the doc reviewed by our engineering team in the meantime, in case any changes are needed before we can accept the PR.

jhump · 2022-02-01T19:21:34Z

FWIW, here's another alternative that describes the language, also in EBNF.
https://github.com/jhump/protocompile/blob/master/grammar/README.md

Unlike this document, it separates lexical analysis from the rest of the grammar productions in order to describe the nuance in tokenization relating to handling of whitespace and comments. It also has a slightly different way to interpret numeric literals, in an attempt to codify some of protoc's behavior. For example, in the face of "1to1000", the grammar in this PR suggests a tokenization of "1", "to", "1000", but protoc considers this a syntax error.
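The "1to1000" point can be illustrated with two toy lexers. Neither is protoc's tokenizer nor the one in protocompile; both are hypothetical sketches that only know decimal integers and identifiers. The first matches the longest number and carries on, so it splits the input into several tokens. The second lets a token that starts with a digit swallow any letters, digits and underscores that follow, then rejects it if the result is not a well-formed number, which is one possible way to get the syntax error described above.

```python
import re

# Naive: take the longest run of digits as a number, then continue lexing.
NAIVE = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*")

# Strict: anything starting with a digit greedily absorbs trailing word characters.
STRICT = re.compile(r"[0-9][A-Za-z0-9_]*|[A-Za-z_][A-Za-z0-9_]*")
DECIMAL = re.compile(r"[0-9]+")


def naive_tokens(text):
    return NAIVE.findall(text)


def strict_tokens(text):
    tokens = STRICT.findall(text)
    for tok in tokens:
        # Only decimal integers are modelled here; hex, octal and floats are out of scope.
        if tok[0] in "0123456789" and DECIMAL.fullmatch(tok) is None:
            raise SyntaxError(f"invalid numeric literal: {tok!r}")
    return tokens
```

Here `naive_tokens("1to1000")` happily returns more than one token starting with "1", whereas `strict_tokens("1to1000")` raises `SyntaxError`; both agree on the whitespace-separated input "1 to 1000".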

I have high confidence in the grammar as it is based on a yacc grammar that powers what I think might be the closest implementation to protoc itself.

Logofile · 2022-02-01T20:27:45Z

Joshua, if you'd like to submit a PR for the same target location (so we can get all of the CLAs and such covered), we'd love to take a look at adding it to our docs. I'm still coming up to speed on EBNF, so I'll need to get some SWE eyes on it once the PR comes in, if you do submit one.

ObsidianMinor force-pushed the spec/cf-grammar branch from d1e588f to 46b9c66 Compare August 24, 2019 22:50

Added initial version of proto file grammar

bb3e77f

ObsidianMinor force-pushed the spec/cf-grammar branch from 46b9c66 to bb3e77f Compare August 25, 2019 00:00

ObsidianMinor added 4 commits August 25, 2019 07:13

Fix typo in octal_literal definition

39d6cd3

Remove group from field definition

f957bca

Add optional sign to enum_value int value

0063e9d

Add reserved declaration to enum

Remove misplaced equals sign in import

0bc288b

jhump reviewed Aug 28, 2019

ObsidianMinor added 3 commits August 28, 2019 15:51

Fix issues pointed out by @jhump

01e6468

* Allow seperated string literals
* Expand upon aggregate literals format
* Fixed incorrect option name rule
* Removed stream service method type

Allow strings in aggregate text literal

5d9a700

Add Any support to text format grammar

f7613f6

jhump mentioned this pull request Feb 28, 2022

protoc: strange inconsistency in message literals in custom options for when a colon is needed protocolbuffers/protobuf#9551

Closed

westred1978 approved these changes Mar 6, 2022
