Feature: deserialize CairoPie from ZIP archives #1533

odesenfans · 2024-01-03T11:06:17Z

Feature: deserialize CairoPie from ZIP archives

Description

Problem: the Python VM generates Cairo PIEs as ZIP archives containing several JSON files and the memory as a binary file. We do not have a solution yet to deserialize these files into CairoPie objects.

Solution: add a new CairoPie::from_file(path) method that reads the ZIP file and extracts its contents.

Checklist

Linked to Github Issue
Unit tests added
Integration tests added.
This change requires new documentation.
- Documentation has been added/updated.
- CHANGELOG has been updated.

pefontana · 2024-01-03T20:33:25Z

Hi @odesenfans !
I don't get, why we need this
Why you should need to read the files generated by the Python vm?

odesenfans · 2024-01-03T22:06:49Z

Hi @pefontana ! We’re currently working on porting the Starknet bootloader on the Rust VM, and a part of our test suite is to run PIEs generated with the Python VM, including Starknet OS PIEs. Furthermore, there’s no other method to load PIEs from files afaik.

Oppen · 2024-01-04T14:38:07Z

If it's for the test suite, can't a test helper handle the unzipping and loading? I find specially odd the fact that we load the memory dump when the VM is supposed to be producing it. Besides that, zip files need to be handled with care in general due to the risk of zip bombs.

odesenfans · 2024-01-04T14:45:47Z

@Oppen :

The Python VM produces a ZIP file, so IMO the loader needs to take a ZIP file as input. Sure, it's a problem, but I guess we have to live with that design choice.
The bootloader needs the memory of the PIE when reexecuting it, I guess to detect differences between the initial execution and the one on top of the bootloader. That's just a guess though. What I'm sure of is that I need to relocate the PIE memory in the bootloader hints, so I need to load it from the ZIP file.

Oppen · 2024-01-05T19:03:18Z

The Python VM produces a ZIP file, so IMO the loader needs to take a ZIP file as input. Sure, it's a problem, but I guess we have to live with that design choice.

I'm not sure that's necessarily true. Requirements and design choices can change, specially in reimplementations.

2. The bootloader needs the memory of the PIE when reexecuting it, I guess to detect differences between the initial execution and the one on top of the bootloader. That's just a guess though. What I'm sure of is that I need to relocate the PIE memory in the bootloader hints, so I need to load it from the ZIP file.

I see. Still, I think the ZIP handling can be done downstream by the library user, as long as you have a mechanism to pass the memory to the bootloader hints.

Related, does this mean non-determinism must always produce deterministic values? Otherwise, re-executing with a different VM may lead to unfair rejection if it looks for differences. And if we go by assuming you use the same VM, then what Python does here does not matter. What am I missing?

On a different issue: please don't mix code relocation with a feature PR. It makes it harder to review. Specifically I mean the code moved from deserialize_program.rs to deserialize_utils.rs.

odesenfans · 2024-01-08T21:32:18Z

Thanks for the comment.

Regarding the ZIP file, my point is mostly "There are Cairo PIEs generated as ZIP files in the Cairo ecosystem so the CairoPie struct should have a compatible loader". We can document that it should only be used with sanitized inputs, hide it behind a feature flag (ex: python-vm-compat), but I think it makes sense to have it here.

For the rest, indeed our use case is a temporary one (we will use PIEs generated with the Rust VM eventually), but "temporary" is always such a vague term that I believe there is merit in stabilizing this. The bootloader is a deterministic program, we're not yet there but I don't see any reason why it should generate a different output on different VMs as long as the hints have the same side effects. We're still working on that.

As for moving the code, I'll move the code relocation to a different branch then.

Oppen · 2024-01-08T22:26:28Z

Regarding the ZIP file, my point is mostly "There are Cairo PIEs generated as ZIP files in the Cairo ecosystem so the CairoPie struct should have a compatible loader". We can document that it should only be used with sanitized inputs, hide it behind a feature flag (ex: python-vm-compat), but I think it makes sense to have it here.

I guess it may make sense. Let's get @pefontana and @lferrigno in the loop to see if we reach some agreement.

For the rest, indeed our use case is a temporary one (we will use PIEs generated with the Rust VM eventually), but "temporary" is always such a vague term that I believe there is merit in stabilizing this. The bootloader is a deterministic program, we're not yet there but I don't see any reason why it should generate a different output on different VMs as long as the hints have the same side effects. We're still working on that.

But the result of running the bootloader is the result of running a block of transactions that may or may not be deterministic, right? I mean, nondeterminism is a feature in Cairo, so is it still correct to assume two executions of possibly different VMs will always produce the same traces? Maybe we can consider it best effort and put some kind of disclaimer about it?

As for moving the code, I'll move the code relocation to a different branch then.

Thanks!

odesenfans · 2024-01-08T23:11:50Z

But the result of running the bootloader is the result of running a block of transactions that may or may not be deterministic, right? I mean, nondeterminism is a feature in Cairo, so is it still correct to assume two executions of possibly different VMs will always produce the same traces? Maybe we can consider it best effort and put some kind of disclaimer about it?

Yes. I need to ask more details about this, but anyway being able to check deterministic programs is already pretty good.

odesenfans · 2024-01-09T22:37:19Z

I moved the code back and fixed rebase conflicts. Waiting for additional reviews then :)

pefontana · 2024-01-15T22:46:26Z

Hi @odesenfans !
Thanks for the PR, sorry for the late response, some people of the team were at holidays
I think it is okay to merged it, just:

Can you add a little documentation in the CairoPie::from_file(ZIP) method, documenting that it is used for bootloader execution
Also, some CI tests are failing, can you fix them? If you have any trouble we can help you with this

odesenfans · 2024-01-18T10:50:12Z

Fixed it, the CI passes on our fork. I did have to add quite a few cfg(feature = "std") though (for wasm/no-std builds), and I'm wondering whether adding a distinct feature flag for this would make sense. Let me know.

codecov · 2024-01-18T23:23:31Z

Codecov Report

Attention: Patch coverage is 95.86957% with 19 lines in your changes are missing coverage. Please review.

Project coverage is 96.60%. Comparing base (c692b75) to head (c10101b).
Report is 1 commits behind head on main.

❗ Current head c10101b differs from pull request most recent head aa646b2. Consider uploading reports for the commit aa646b2 to get more accurate results

Files	Patch %	Lines
vm/src/vm/runners/cairo_pie.rs	95.58%	18 Missing ⚠️
vm/src/serde/deserialize_program.rs	97.72%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1533      +/-   ##
==========================================
+ Coverage   96.56%   96.60%   +0.04%     
==========================================
  Files          96       95       -1     
  Lines       38494    38798     +304     
==========================================
+ Hits        37170    37482     +312     
+ Misses       1324     1316       -8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

odesenfans · 2024-01-23T16:09:02Z

Rebased on top of main + added a from_bytes() method, would appreciate it if someone could approve the CI run.

pefontana · 2024-01-24T21:38:44Z

Great @odesenfans !
Can you add a line to the Changelog.md describing the feature added?
Like here
https://github.com/lambdaclass/cairo-vm/pull/1559/files#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4ed

And I think we need another rebase with main and we will be good to go!

fmoletta

In the method from_zip_archive the felt_bytes value is built from the parsed metadata's prime. This value is then used by read_memory_file to read the value off of each memory cell, which is then parsed by the method maybe_relocatable_from_le_bytes, which asumes that the byte slice has at least 8 elements (or else line 114 would fail). If the parsed prime is not long enough then this will panic. For example, if we replace the prime in the metadata file of the fibonacci example with 7 the method CairoPie::from_file will panic with 'range end index 8 out of range for slice of length 1’’.

Alternatively, we can use a constant for the address size instead of checking the byte length of the prime and instead return an error if the parsed prime is not CAIRO_PRIME. This would share the same behaviour as the program json file parsing.

fmoletta

This should be tested with files that use builtins to check that the parsing of BuitlinAdditionalData doesn't fail. I tried to deserialize the cairo pie obtained from running cairo_programs/common_signature.cairo (which uses the signature builtin) using CairoPie::from_file and was met with the following error:

Parse(Error("data did not match any variant of untagged enum BuiltinAdditionalData", line: 1, column: 182))

odesenfans · 2024-04-04T15:26:21Z

@juanbono I just pushed a fix 🤞

Problem: the Python VM generates Cairo PIEs as ZIP archives containing several JSON files and the memory as a binary file. We do not have a solution yet to deserialize these files into CairoPie objects. Solution: add a new `CairoPie::from_file(path)` method that reads the ZIP file and extracts its contents.

Added a `from_bytes` class method to build a `CairoPie` method in addition to the existing `from_file` method.

Problem: Deserializing the PIE additional data as a hashmap of `BuiltinAdditionalData` enums because of an issue with deserializing untagged unions in `serde` (see serde-rs/json#1103). Solution: add a new `AdditionalData` struct with explicit fields for each builtin, circumventing the untagged union issue. This solution has the advantage of always associating the correct data type for each builtin (it's not possible anymore to associate a builtin with a different data type), but requires modifications if a new builtin is added.

Problem: the ECDSA/signature builtin additional data is stored internally as a hashmap, but the Python VM stores it as a vector of tuples. Solution: Add a `SignatureBuiltinAdditionalData` struct and implement a custom deserializer for it that can take either a hashmap or a vector.

Problem: the ECDSA data felts are serialized as numbers and not strings. Solution: call `deserialize_felt_from_number` in the implementation of the visitor for sequences.

+ additional tests for coverage + removed the deserialization from hashmap for the signature builtin which was broken anyway.

Problem: the new implementation of `CairoPie` using the `CairoPieAdditionalData` struct makes it hard to reproduce the exact same behaviour as `cairo-lang` when handling builtins with no data. While `cairo-lang` will generate a null value and include it in the JSON file, we can only (easily) generate a null value for each builtin or for none of them. Solution: make the comparator script more flexible by filtering out null values from JSON contents.

odesenfans · 2024-04-07T22:37:54Z

I pushed a fix for the latest CI failure, which was complaining about null values missing from additional_data.json in the PIEs generated with this PR. The problem is simple: this PR introduces the use of CairoPieAdditionalData for the additional_data field instead of a hashmap. This makes it harder to reproduce the behavior of cairo-lang that will output a null for each builtin used by the program and nothing for the others. Implementing the same behaviour in Rust would require tracking the builtins used by the program directly in CairoPieAdditionalData with a non-serialized field and output nulls accordingly.

I think this added complexity makes little sense so I modified the PIE comparison script to just ignore null values from the comparison. Let me know what you think.

fmoletta · 2024-04-08T14:57:22Z

I pushed a fix for the latest CI failure, which was complaining about null values missing from additional_data.json in the PIEs generated with this PR. The problem is simple: this PR introduces the use of CairoPieAdditionalData for the additional_data field instead of a hashmap. This makes it harder to reproduce the behavior of cairo-lang that will output a null for each builtin used by the program and nothing for the others. Implementing the same behaviour in Rust would require tracking the builtins used by the program directly in CairoPieAdditionalData with a non-serialized field and output nulls accordingly.

I think this added complexity makes little sense so I modified the PIE comparison script to just ignore null values from the comparison. Let me know what you think.

Outputting the same serialized cairo pies as cairo_lang is vital for the cairo-vm. I don't think a PR adding Deserialization should disturb this, is there another way to fix this?

odesenfans · 2024-04-08T15:04:50Z

Outputting the same serialized cairo pies as cairo_lang is vital for the cairo-vm. I don't think a PR adding Deserialization should disturb this, is there another way to fix this?

I think there is, although it's a bit more work. I just wanted to check with you if it was worth the extra effort for null values. I underline that the only difference is that null values are omitted from additional_data.json, while they are included in cairo-lang for all builtins used by the program. The rest is unchanged. If you think that these null values should still be there then I'll find a way to avoid changing that.

Added a compatibility method to `CairoPieAdditionalData` to use an intermediate hashmap representation and insert null values for builtins used by the program that do not have additional data at the end of the execution. Reverted changes to cairo_pie_comparator.py.

odesenfans · 2024-04-08T21:27:54Z

@fmoletta I fixed the issue. The solution I chose was to use an intermediate transformation into a hashmap when creating the ZIP archive. Null values are added there as needed. Let me know if this works for you.

fmoletta · 2024-04-09T14:52:05Z

This solution involves converting the HashMap representation into this new CairoPieAdditonalData struct and then back to a HashMap in order to serialize, this seems quite redundant, can remove the intermediary?

odesenfans · 2024-04-09T15:09:14Z

This solution involves converting the HashMap representation into this new CairoPieAdditonalData struct and then back to a HashMap in order to serialize, this seems quite redundant, can remove the intermediary?

It is definitely redundant. The problem is that I need the CairoPieAdditionalData struct when deserializing; deserialization was broken because deserialization as an untagged enum just would not work. One solution could be to keep the hashmap in CairoPie and use CairoPieAdditionalData only when deserializing. But IMO having a typed struct instead of a hashmap in CairoPie is better.

Another possibility is to implement FromIterator on CairoPieAdditionalData to avoid the first hashmap conversion.

What do you think? Any suggestion?

Oppen · 2024-04-15T23:22:03Z

vm/src/serde/deserialize_program.rs

            let felt = deserialize_scientific_notation(n);
-            if felt.is_some() {
-                return Ok(felt);
+            if let Some(x) = felt {
+                return Some(x);
            }

-            Err(de::Error::custom(String::from(
-                "felt_from_number parse error",
-            )))
+            None


This is equivalent to just:

deserialize_scientific_notation(n)

fmoletta · 2024-05-07T15:22:13Z

Thank you for your work! But we decided to merge #1729 which solves the issue.
Feel free to open an issue/PR if there is something about this solution that doesn't work for you!

odesenfans requested review from igaray, Oppen, fmoletta, entropidelic, juanbono and pefontana as code owners January 3, 2024 11:06

odesenfans force-pushed the ml/deserialize-cairo-pie branch from b696acb to f960d45 Compare January 3, 2024 11:21

odesenfans force-pushed the ml/deserialize-cairo-pie branch from b265e17 to b17ca2b Compare January 9, 2024 22:36

odesenfans force-pushed the ml/deserialize-cairo-pie branch 2 times, most recently from 461e1ba to 3a30789 Compare January 23, 2024 16:07

pefontana previously approved these changes Jan 24, 2024

View reviewed changes

odesenfans dismissed pefontana’s stale review via 0402e81 January 25, 2024 10:22

odesenfans force-pushed the ml/deserialize-cairo-pie branch from 0402e81 to 1db8a29 Compare January 25, 2024 10:23

pefontana previously approved these changes Jan 25, 2024

View reviewed changes

fmoletta reviewed Jan 25, 2024

View reviewed changes

fmoletta requested changes Jan 25, 2024

View reviewed changes

odesenfans force-pushed the ml/deserialize-cairo-pie branch from f8894e9 to 1c27203 Compare April 4, 2024 15:25

Olivier Desenfans and others added 17 commits April 8, 2024 00:34

Feature: deserialize Cairo PIE from bytes

041fce4

Added a `from_bytes` class method to build a `CairoPie` method in addition to the existing `from_file` method.

fix clippy + revert Cargo.toml

0200a4c

changelog

41038e9

improve test coverage

ace9ad8

fail deserialization if prime is not Cairo prime

1d99f4f

newline

93b9604

get wasm tests to compile

2c7f8c2

rename AdditionalData -> CairoPieAdditionalData

b1340c5

relax constraints on feature 'std'

b4fe922

revert Cargo.lock

58176c2

Fix: deserialize signature builtin additional data

30926c4

Problem: the ECDSA data felts are serialized as numbers and not strings. Solution: call `deserialize_felt_from_number` in the implementation of the visitor for sequences.

Fix: serialize ECDSA builtin data properly for Cairo PIE

4422b24

+ additional tests for coverage + removed the deserialization from hashmap for the signature builtin which was broken anyway.

Fix: revert Cargo.lock

fdb955b

odesenfans force-pushed the ml/deserialize-cairo-pie branch from c10101b to 7773780 Compare April 7, 2024 22:34

Oppen reviewed Apr 15, 2024

View reviewed changes

fmoletta closed this May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: deserialize CairoPie from ZIP archives #1533

Feature: deserialize CairoPie from ZIP archives #1533

odesenfans commented Jan 3, 2024

pefontana commented Jan 3, 2024

odesenfans commented Jan 3, 2024

Oppen commented Jan 4, 2024

odesenfans commented Jan 4, 2024

Oppen commented Jan 5, 2024

odesenfans commented Jan 8, 2024

Oppen commented Jan 8, 2024

odesenfans commented Jan 8, 2024

odesenfans commented Jan 9, 2024

pefontana commented Jan 15, 2024

odesenfans commented Jan 18, 2024

codecov bot commented Jan 18, 2024 •

edited

Loading

odesenfans commented Jan 23, 2024

pefontana commented Jan 24, 2024

fmoletta left a comment

fmoletta left a comment

odesenfans commented Apr 4, 2024

odesenfans commented Apr 7, 2024

fmoletta commented Apr 8, 2024

odesenfans commented Apr 8, 2024

odesenfans commented Apr 8, 2024

fmoletta commented Apr 9, 2024

odesenfans commented Apr 9, 2024

Oppen Apr 15, 2024

fmoletta commented May 7, 2024 •

edited

Loading

Feature: deserialize CairoPie from ZIP archives #1533

Feature: deserialize CairoPie from ZIP archives #1533

Conversation

odesenfans commented Jan 3, 2024

Feature: deserialize CairoPie from ZIP archives

Description

Checklist

pefontana commented Jan 3, 2024

odesenfans commented Jan 3, 2024

Oppen commented Jan 4, 2024

odesenfans commented Jan 4, 2024

Oppen commented Jan 5, 2024

odesenfans commented Jan 8, 2024

Oppen commented Jan 8, 2024

odesenfans commented Jan 8, 2024

odesenfans commented Jan 9, 2024

pefontana commented Jan 15, 2024

odesenfans commented Jan 18, 2024

codecov bot commented Jan 18, 2024 • edited Loading

Codecov Report

odesenfans commented Jan 23, 2024

pefontana commented Jan 24, 2024

fmoletta left a comment

Choose a reason for hiding this comment

fmoletta left a comment

Choose a reason for hiding this comment

odesenfans commented Apr 4, 2024

odesenfans commented Apr 7, 2024

fmoletta commented Apr 8, 2024

odesenfans commented Apr 8, 2024

odesenfans commented Apr 8, 2024

fmoletta commented Apr 9, 2024

odesenfans commented Apr 9, 2024

Oppen Apr 15, 2024

Choose a reason for hiding this comment

fmoletta commented May 7, 2024 • edited Loading

codecov bot commented Jan 18, 2024 •

edited

Loading

fmoletta commented May 7, 2024 •

edited

Loading