Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: several format string fixes and improvements #6703

Merged
merged 15 commits into from
Dec 6, 2024

Conversation

asterite
Copy link
Collaborator

@asterite asterite commented Dec 4, 2024

Description

Problem

Resolves #4642

Summary

This PR includes several fixes and improvements to format strings:

  1. Format strings (f"..."), as tokens, are now represented as fragments which can either be strings or interpolated pieces. The interpolated pieces also have a Span to know where they are. The advantage of this is that we don't need to use a regex to find the interpolations and we can provide better error messages when, for example, the interpolated variable doesn't exist. That said, the print oracle gets the raw string and needs to do the interpolation, and that still uses regex (but at least compile-time code won't need to do regex matching each time it finds a fmtstr).
  2. We now disallow non-identifiers in interpolations. This actually uncovered a few "bugs" in the standard library where we were using interpolations in a wrong way.
  3. Escape sequences like \n and \t are now allowed inside format strings.
  4. We also now allow escaping { and } and we do that just like in Rust: {{ to mean {, }} to mean }.
  5. Like in Rust, if you write } but there was no single { before it (we were not in an interpolation) you get an error that says you must use }} to write }.

Because {{ now means to write {, I had to slightly change the print oracle to understand this.

Additional Context

I'm marking this PR as a breaking change because it could break someone's code if they had incorrect interpolations, or if they had \n inside a format string and wanted to actually print "\" and "n" separately.

I left LSP features as a follow-up PR, mainly because I realized autocompletion doesn't automatically kick in when you are typing inside a string literal, so implementing completion there doesn't seem very useful. What we could do is at least implement "hover" in these pieces (go-to-definition already works).

A question: do print oracles exist somewhere else? For example in a web context (what I'm saying might not make sense, sorry). I'm just wondering if the print oracle logic of handling interpolations must be updated somewhere else too.

Documentation

This PR might need to document format string escape sequences... but I couldn't find this explained for strings so maybe all of that could be done in one go in a separate PR (if really needed now).

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@jfecher
Copy link
Contributor

jfecher commented Dec 4, 2024

However, if you write f"{foo.bar}" then I think that should error but it doesn't (you get "{foo.bar}" printed) so given that we don't handle such errors I didn't want to handle any error at all.

We should definitely error here eventually, it's quite an annoyance while writing format strings

@asterite
Copy link
Collaborator Author

asterite commented Dec 4, 2024

Awesome! I'll tackle that next, it should be relatively easy to do.

I also noticed that if you write f"Hello {x}" and x gives an error (say, it doesn't exist) then the error is on the entire format string. I'll try to improve that too (in yet another PR).

Copy link
Contributor

github-actions bot commented Dec 4, 2024

Peak Memory Sample

Program Peak Memory %
keccak256 81.50M 0%
workspace 121.95M -1%
regression_4709 333.71M 0%
ram_blowup_regression 2.55G 0%

@asterite
Copy link
Collaborator Author

asterite commented Dec 4, 2024

Oooh... a test is failing because it has this:

        let s1 = f"x is {x}, fake interpolation: \{y}, y is {y}";

I guess \{ was used to escape that { (though that \ is still printed). In Rust to do that you write {{. Which way do we want to use?

@jfecher
Copy link
Contributor

jfecher commented Dec 4, 2024

Let's stick with Rust in this case

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@asterite asterite changed the title fix: honor escape characters in format strings fix: several format string fixes and improvements Dec 5, 2024
@asterite asterite changed the title fix: several format string fixes and improvements fix!: several format string fixes and improvements Dec 5, 2024
Copy link
Contributor

github-actions bot commented Dec 5, 2024

Changes to Brillig bytecode sizes

Generated at commit: 0d357023f20128a5e906f14121c0b72b34fc1175, compared to commit: 5ccde8196400a9b99e5c3c4d9af0e3136e72b4cb

🧾 Summary (10% most significant diffs)

Program Brillig opcodes (+/-) %
uhashmap +26 ❌ +0.18%
hashmap +34 ❌ +0.16%

Full diff report 👇
Program Brillig opcodes (+/-) %
uhashmap 14,178 (+26) +0.18%
hashmap 21,899 (+34) +0.16%

Copy link
Contributor

github-actions bot commented Dec 5, 2024

Changes to number of Brillig opcodes executed

Generated at commit: 0d357023f20128a5e906f14121c0b72b34fc1175, compared to commit: 5ccde8196400a9b99e5c3c4d9af0e3136e72b4cb

🧾 Summary (10% most significant diffs)

Program Brillig opcodes (+/-) %
hashmap -20 ✅ -0.04%
uhashmap -348 ✅ -0.24%

Full diff report 👇
Program Brillig opcodes (+/-) %
hashmap 55,540 (-20) -0.04%
uhashmap 146,948 (-348) -0.24%

@asterite
Copy link
Collaborator Author

asterite commented Dec 5, 2024

Asking for a re-review because this PR now does a lot more than what I originally planned.

Copy link
Contributor

@aakoshh aakoshh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@asterite asterite added this pull request to the merge queue Dec 6, 2024
Merged via the queue into master with commit b70daf4 Dec 6, 2024
50 checks passed
@asterite asterite deleted the ab/fmt-string-escape-chars branch December 6, 2024 15:12
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Dec 7, 2024
…#6650)

fix: git dependency trailing slash (noir-lang/noir#6725)
chore: optimise older opcodes in reverse order (noir-lang/noir#6476)
chore: add script to check for critical libraries supporting a given Noir version (noir-lang/noir#6697)
fix!: several format string fixes and improvements (noir-lang/noir#6703)
fix: print ssa blocks without recursion (noir-lang/noir#6715)
chore: redo typo PR by Madmaxs2 (noir-lang/noir#6721)
chore: add a few regression tests for #6674 (noir-lang/noir#6687)
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Dec 7, 2024
fix: git dependency trailing slash (noir-lang/noir#6725)
chore: optimise older opcodes in reverse order (noir-lang/noir#6476)
chore: add script to check for critical libraries supporting a given Noir version (noir-lang/noir#6697)
fix!: several format string fixes and improvements (noir-lang/noir#6703)
fix: print ssa blocks without recursion (noir-lang/noir#6715)
chore: redo typo PR by Madmaxs2 (noir-lang/noir#6721)
chore: add a few regression tests for #6674 (noir-lang/noir#6687)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Format and raw strings ignore escape characters
4 participants