Feature: deserialize CairoPie from ZIP archives #1533
b696acb to f960d45
Hi @odesenfans ! |
Hi @pefontana ! We’re currently working on porting the Starknet bootloader to the Rust VM, and part of our test suite runs PIEs generated with the Python VM, including Starknet OS PIEs. Furthermore, there’s no other method to load PIEs from files afaik.
If it's for the test suite, can't a test helper handle the unzipping and loading? I find it especially odd that we load the memory dump when the VM is supposed to be producing it. Besides that, zip files need to be handled with care in general due to the risk of zip bombs.
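To make the zip-bomb concern concrete, here is a minimal sketch (not code from this PR) of capping how many bytes are read out of a decompressed archive entry. The `read_capped` helper and its limit are hypothetical; the function only relies on `std::io::Read`, so it is assumed to work with whatever decompressing reader the ZIP library hands back.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads the whole stream, but fails if it holds
/// more than `limit` bytes instead of growing the buffer without bound.
pub fn read_capped<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the limit so an oversized stream can be detected.
    let read = reader
        .take(limit.saturating_add(1))
        .read_to_end(&mut buf)?;
    if read as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "decompressed entry exceeds size limit",
        ));
    }
    Ok(buf)
}
```

A caller would pick the limit per file kind, for example a larger one for the binary memory file than for the JSON metadata.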
@Oppen :
I'm not sure that's necessarily true. Requirements and design choices can change, especially in reimplementations.
I see. Still, I think the ZIP handling can be done downstream by the library user, as long as you have a mechanism to pass the memory to the bootloader hints. Related, does this mean non-determinism must always produce deterministic values? Otherwise, re-executing with a different VM may lead to unfair rejection if it looks for differences. And if we go by assuming you use the same VM, then what Python does here does not matter. What am I missing? On a different issue: please don't mix code relocation with a feature PR. It makes it harder to review. Specifically I mean the code moved from |
Thanks for the comment. Regarding the ZIP file, my point is mostly "There are Cairo PIEs generated as ZIP files in the Cairo ecosystem so the For the rest, indeed our use case is a temporary one (we will use PIEs generated with the Rust VM eventually), but "temporary" is always such a vague term that I believe there is merit in stabilizing this. The bootloader is a deterministic program, we're not yet there but I don't see any reason why it should generate a different output on different VMs as long as the hints have the same side effects. We're still working on that. As for moving the code, I'll move the code relocation to a different branch then. |
I guess it may make sense. Let's get @pefontana and @lferrigno in the loop to see if we reach some agreement.
But the result of running the bootloader is the result of running a block of transactions that may or may not be deterministic, right? I mean, nondeterminism is a feature in Cairo, so is it still correct to assume two executions of possibly different VMs will always produce the same traces? Maybe we can consider it best effort and put some kind of disclaimer about it?
Thanks! |
Yes. I need to ask more details about this, but anyway being able to check deterministic programs is already pretty good. |
b265e17 to b17ca2b
I moved the code back and fixed rebase conflicts. Waiting for additional reviews then :) |
Hi @odesenfans !
Fixed it, the CI passes on our fork. I did have to add quite a few |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1533      +/-   ##
==========================================
+ Coverage   96.56%   96.60%   +0.04%
==========================================
  Files          96       95       -1
  Lines       38494    38798     +304
==========================================
+ Hits        37170    37482     +312
+ Misses       1324     1316       -8
461e1ba to 3a30789
Rebased on top of main + added a |
Great @odesenfans ! And I think we need another rebase with main and we will be good to go! |
0402e81 to 1db8a29
In the method `from_zip_archive`, the `felt_bytes` value is built from the parsed metadata's prime. This value is then used by `read_memory_file` to read the value off of each memory cell, which is then parsed by the method `maybe_relocatable_from_le_bytes`, which assumes that the byte slice has at least 8 elements (or else line 114 would fail). If the parsed prime is not long enough then this will panic. For example, if we replace the prime in the metadata file of the fibonacci example with 7, the method `CairoPie::from_file` will panic with 'range end index 8 out of range for slice of length 1'.
Alternatively, we can use a constant for the address size instead of checking the byte length of the prime and instead return an error if the parsed prime is not CAIRO_PRIME. This would share the same behaviour as the program json file parsing.
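As a rough illustration of the failure mode, the sketch below reads a little-endian `u64` from the start of a byte slice and returns an error when the slice is shorter than 8 bytes, rather than indexing `[..8]` directly and panicking. The names `u64_from_le_prefix` and `CellParseError` are hypothetical and are not taken from the PR.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CellParseError {
    TooShort { expected: usize, got: usize },
}

/// Reads a little-endian u64 from the first 8 bytes of `bytes`.
/// A short slice yields an error instead of an out-of-range panic.
pub fn u64_from_le_prefix(bytes: &[u8]) -> Result<u64, CellParseError> {
    if bytes.len() < 8 {
        return Err(CellParseError::TooShort {
            expected: 8,
            got: bytes.len(),
        });
    }
    let mut prefix = [0u8; 8];
    // Safe: the length check above guarantees at least 8 bytes.
    prefix.copy_from_slice(&bytes[..8]);
    Ok(u64::from_le_bytes(prefix))
}
```

With a one-byte cell, as in the prime-set-to-7 case, this returns `TooShort { expected: 8, got: 1 }` and the caller can surface it as a deserialization error.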
This should be tested with files that use builtins to check that the parsing of `BuiltinAdditionalData` doesn't fail. I tried to deserialize the Cairo PIE obtained from running cairo_programs/common_signature.cairo (which uses the signature builtin) using `CairoPie::from_file` and was met with the following error:
Parse(Error("data did not match any variant of untagged enum BuiltinAdditionalData", line: 1, column: 182))
f8894e9 to 1c27203
@juanbono I just pushed a fix 🤞 |
Problem: the Python VM generates Cairo PIEs as ZIP archives containing several JSON files and the memory as a binary file. We do not have a solution yet to deserialize these files into CairoPie objects. Solution: add a new `CairoPie::from_file(path)` method that reads the ZIP file and extracts its contents.
Added a `from_bytes` associated function to build a `CairoPie` object, in addition to the existing `from_file` method.
Problem: deserializing the PIE additional data as a hashmap of `BuiltinAdditionalData` enums fails because of an issue with deserializing untagged enums in `serde` (see serde-rs/json#1103). Solution: add a new `AdditionalData` struct with explicit fields for each builtin, circumventing the untagged enum issue. This solution has the advantage of always associating the correct data type with each builtin (it is no longer possible to associate a builtin with a different data type), but it requires modifications whenever a new builtin is added.
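The trade-off can be sketched without `serde`, using deliberately simplified types. Everything below (`OutputData`, `HashData`, `BuiltinData`, `LooseAdditionalData`, `StrictAdditionalData`, `strict_from_loose`) is a hypothetical stand-in for illustration, not the PR's actual definitions: the map-of-enums shape accepts any payload under any builtin name, while the explicit-field shape ties each builtin to one payload type.

```rust
use std::collections::HashMap;

// Hypothetical, simplified payloads; the real ones carry more data.
#[derive(Debug, Clone, PartialEq)]
pub struct OutputData { pub pages: Vec<u64> }
#[derive(Debug, Clone, PartialEq)]
pub struct HashData { pub cells: Vec<(u64, u64)> }

// Map-of-enums shape: nothing stops "output" from holding hash data.
#[derive(Debug, Clone, PartialEq)]
pub enum BuiltinData { Output(OutputData), Hash(HashData) }
pub type LooseAdditionalData = HashMap<String, BuiltinData>;

// Explicit-field shape: each builtin can only hold its own payload type,
// but a new builtin means adding a new field here.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct StrictAdditionalData {
    pub output: Option<OutputData>,
    pub pedersen: Option<HashData>,
}

/// Converts the loose shape into the strict one, rejecting mismatches.
pub fn strict_from_loose(
    loose: &LooseAdditionalData,
) -> Result<StrictAdditionalData, String> {
    let mut strict = StrictAdditionalData::default();
    for (name, data) in loose {
        match (name.as_str(), data) {
            ("output", BuiltinData::Output(d)) => strict.output = Some(d.clone()),
            ("pedersen", BuiltinData::Hash(d)) => strict.pedersen = Some(d.clone()),
            (other, _) => {
                return Err(format!("unexpected data for builtin `{}`", other))
            }
        }
    }
    Ok(strict)
}
```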
Problem: the ECDSA/signature builtin additional data is stored internally as a hashmap, but the Python VM stores it as a vector of tuples. Solution: Add a `SignatureBuiltinAdditionalData` struct and implement a custom deserializer for it that can take either a hashmap or a vector.
Problem: the ECDSA data felts are serialized as numbers and not strings. Solution: call `deserialize_felt_from_number` in the implementation of the visitor for sequences.
+ additional tests for coverage + removed the deserialization from hashmap for the signature builtin which was broken anyway.
Problem: the new implementation of `CairoPie` using the `CairoPieAdditionalData` struct makes it hard to reproduce the exact same behaviour as `cairo-lang` when handling builtins with no data. While `cairo-lang` will generate a null value and include it in the JSON file, we can only (easily) generate a null value for each builtin or for none of them. Solution: make the comparator script more flexible by filtering out null values from JSON contents.
c10101b to 7773780
I pushed a fix for the latest CI failure, which was complaining about I think this added complexity makes little sense so I modified the PIE comparison script to just ignore null values from the comparison. Let me know what you think. |
Outputting the same serialized cairo pies as |
I think there is, although it's a bit more work. I just wanted to check with you if it was worth the extra effort for null values. I underline that the only difference is that |
Added a compatibility method to `CairoPieAdditionalData` to use an intermediate hashmap representation and insert null values for builtins used by the program that do not have additional data at the end of the execution. Reverted changes to cairo_pie_comparator.py.
@fmoletta I fixed the issue. The solution I chose was to use an intermediate transformation into a hashmap when creating the ZIP archive. Null values are added there as needed. Let me know if this works for you. |
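A minimal sketch of that intermediate step, under simplifying assumptions: payloads are represented as already-serialized strings, and `with_null_entries` is a hypothetical name rather than the method added in the PR. Every builtin used by the program gets a key, and builtins without additional data map to `None`, which a JSON serializer would write as null.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds the map written to the archive.
/// Builtins that have data keep it; used builtins without data get `None`.
pub fn with_null_entries(
    used_builtins: &[&str],
    data: &BTreeMap<String, String>,
) -> BTreeMap<String, Option<String>> {
    let mut out: BTreeMap<String, Option<String>> = data
        .iter()
        .map(|(name, value)| (name.clone(), Some(value.clone())))
        .collect();
    for name in used_builtins {
        // Only inserts when the builtin has no entry yet.
        out.entry((*name).to_string()).or_insert(None);
    }
    out
}
```

A `BTreeMap` is used here only to keep key order stable in the output; the description above mentions a hashmap.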
This solution involves converting the HashMap representation into this new |
It is definitely redundant. The problem is that I need the Another possibility is to implement What do you think? Any suggestion? |
  let felt = deserialize_scientific_notation(n);
- if felt.is_some() {
-     return Ok(felt);
+ if let Some(x) = felt {
+     return Some(x);
  }

- Err(de::Error::custom(String::from(
-     "felt_from_number parse error",
- )))
+ None
This is equivalent to just:
deserialize_scientific_notation(n)
Thank you for your work! But we decided to merge #1729 which solves the issue. |
Feature: deserialize CairoPie from ZIP archives
Description
Problem: the Python VM generates Cairo PIEs as ZIP archives containing several JSON files and the memory as a binary file. We do not have a solution yet to deserialize these files into CairoPie objects.
Solution: add a new `CairoPie::from_file(path)` method that reads the ZIP file and extracts its contents.