-
Notifications
You must be signed in to change notification settings - Fork 856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow integers in keys #592
Comments
Base keys can not contain As for the alternative notations, I don't think it's really that big an issue. |
Sorry, I used the term key as a bag for base key and dot key. So
hm, I don't understand which alternatives you are referring to? 😅 I'm just pointing a double issue:
|
Interesting. I have recently implemented a Toml lexer & parser but did not notice this edge case. Perhaps it can be resolved by lexer states that are independent of the parser state?
This obviously is naive and simplified (no handling of inline-tables) but it is only meant to demonstrate the concept. btw if we can differentiate between TOP Level context (key) and value mode |
That's a bit playing with the concepts! 😅 As it's basically bringing parser knowledge into the lexer... and also as you said, inline tables are going to require more knowledge to correctly do it. In my case, my lexer has something like this: if (State == LexerSate.Key)
{
NextTokenForKey();
}
else
{
NextTokenForValue();
} and the parser is feeding the lexer with the correct state, and I'm "fine" with this solution. But this is annoying, because typically, I cannot test my lexer separately on a full TOML document and it is even barely unit testable on slices, as it would require my unit tests to have this parser knowledge... But I'm worried that my original issue dragged the discussion to some "implementation details" while I should have put an emphasis on the user experience ambiguity first (e.g |
Back to the high level concepts.I agree that from a simplicity perspective that it would be best to decouple the lexing and parsing state completely. If only due to the fact some language tools have completely separate lexing and parsing Some References to a similar (but more complex) scenario:
There is another concern that when creating a fault tolerant parser, |
Yes, and specially, in the case of TOML, we are lucky that the grammar doesn't require token previews |
So, I just thought I'd add my two cents, having already taken a stab at implementing a toml 0.5 parser. Basically, if it's lexed as number, it needs to be a base 10 integer, otherwise it'll reject it. I've actually just taken a glance at the spec, and I realise I've actually done some things incorrectly. However, floats an issue does come up. Reading the actual spec, I would 100% convert |
Ok, just went and changed it: 10 = "asd"
1.0 = "asd"
1.00.0 = "asd"
0x10 = "asd"
0o10 = "asd"
0b10 = "asd"
[1.00.00]
value = "asd" is now parsed as:
|
And that's another case to remove these integers in keys in TOML, as I'm pretty sure that many implementations got actually that wrong. That's why you can't use a bare lexer, you need to maintain a state, otherwise going through "oh I just parsed a float but it is actually not " is pretty fragile/clunky to recover from (specially with the dot, as the parser is usually in a very different state when it handles a dotted key) |
Yes. This is not going to change in TOML 1.0. We could revisit this post TOML 1.0, if someone's motivated enough to bring this up then. :)
|
Personally, I think that integer keys are a bit weird anyway, but I might just not have the use-case for it. The best action might be to clarify that It's implicit in the spec, but as this issue is showing, a couple of people already missed it, and it's not even version 1.0. |
Happy to accept a PR for this. :) |
Is this because you don't want to break compatibility (while I thought that 0.x could allow this) or you think that the cost of the ambiguity is worth the price that allowing integers in keys is bringing? I could run some stats on all TOML documents on github to find out how many are using integers in their keys if that's something that could balance a decision....
Yes, and I'm going to send a PR to https://github.com/BurntSushi/toml-test to add these edge cases... pretty sure that it is going to break a few libraries that were assuming full 0.5 compatibility. |
This -- the README has a note about 0.5.0 being "extremely stable". |
You are going finicky. 😛 My turn: you have omitted to mention the following phrase:
So the README doesn't say that it should stay stable at all cost 😉 But fair enough, I will find the time to send a PR to make the specs more explicit. |
@xoofx I'm not sure what you mean by finicky here. I assume you meant it in good faith. :) |
I was just teasing a bit, as I found the responses a bit terse regarding the problem I'm bringing here 😊 |
Reading the spec, if would think, that integers (!= string with digits) are not allowed as keys and [table]
1.0 = 2.0
0x155 = 0x155 is parsed as [table]
"1"."0" = 2.0
"0x155" = 0x155 and 1.0 = 2.0
"1"."0" = "whatever" is forbidden due to the duplicate key. (github output looks strange) |
Indeed. The point of the discussion here was just to highlight that "integers like" keys can be actually confusing and it makes the lexer dependent on the state of the parser, which is quite annoying. |
maybe github only support toml earlier version which not support dotted keys... |
(slight OT) @xoofx I usually try not to be terse or come across as dismissive but apologies if it came across as such. Sarcasm isn't something that gets communicated well unless folks share the same context which I didn't in this case. The ship has sailed for this. For now, it's better to not break compatibility with TOML 0.5 -> 1.0. In terms of actionable items here, I am splitting this ticket into 2.
|
Please excuse my ignorance, but doesn't this also affect keywords? Even with the proposed rule
I don't think there is a simple way to avoid this ambiguity. |
Devil's advocate says: It's not hard to tell what's on the left side of the equals sign and what's on the right side. Yes you can confuse yourself by naming your keys the same as Boolean values. But there's no good reason to do that. |
Regarding the post-1.0 discussion, I'd argue that integer-like bare keys should not be forbidden since they might actually be useful in some cases. Maybe someone has a file [December]
3 = "myself"
9 = ["Kirk Douglas", "John Malkovich"]
24 = "Jesus" That's just an example, but use cases like this should not be ruled out merely to make the lives of parser writers easier. Nor do I see a good reason to force people to use quotation marks in such cases. |
I was open to ideas/reasons to change the behavior. @ChristianSi's comment from the future (see end of post) has convinced me that it's not worth changing this behavior. There are valid use cases for this and we explicitly call this behavior allowed in the current specification. We won't be disallowing integers in keys in TOML. A clarification will be added to explain the dotted keys edge case (this is being tracked #616). Thanks for filing this issue @xoofx and to everyone who's contributed to this discussion! :) Hi @ChristianSi from the future! :P |
Well, post-1.0 is still a long way of, so the future seemed the right place to make that comment 😁 Also, this is the first time someone has reacted to a comment of mine before I even made it. Cheers, @pradyunsg ! |
Hey,
While implementing the lexer for TOML, I have found that allowing integers in a key makes the lexer parser state dependent. It is pretty uncommon to have this in simple languages as it happens usually in languages that have complicated context dependent parsing (e.g preprocessor macros in a C like language). Imo, it's an unfortunate choice for a simple language like TOML to incur contextual lexers.
Another weird side effect of this is that you can have in a key something that looks like a float, or an integer hexa
I don't know if this is something that could be revisited before 1.0 and change a key to
[a-zA-Z\-_][a-zA-Z\-_0-9]*
?The text was updated successfully, but these errors were encountered: