Add eth_getRequiredBlockState method #455
This sounds really cool, and it would be awesome to run traces at arbitrary history without running a local archive node!
However, there is no guarantee to be able to compute a new block state root for this post-execution
state. For example, with the aim to check against the state root in the block header of that block
and thereby audit the state changes that were applied.

This is because the state changes may involve an arbitrary number of state deletions. State
deletions may change the structure of the merkle trie in a way that requires knowledge of
internal nodes that are not present in the proofs obtained by `eth_getProof` JSON-RPC method.
Hence, while the complete post-block trie can sometimes be created, it is not guaranteed.
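A toy sketch of why a deletion can need a node body that the proofs do not carry. It is not the real Merkle Patricia Trie: the `Branch`, `Leaf`, `HashOnly` and `delete_child` names are hypothetical, extension nodes are left out, and keys are plain nibble strings. The only point illustrated is that a branch left with a single child must collapse into that child, which requires the child's contents rather than just its hash.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class HashOnly:
    """A child known only by its hash; its body was not in any proof."""
    digest: str


@dataclass
class Leaf:
    path: str   # remaining key nibbles below the parent
    value: str


@dataclass
class Branch:
    children: Dict[str, Union["Leaf", "Branch", "HashOnly"]]


class MissingNode(Exception):
    """Raised when a node body is needed but only its hash is known."""


def delete_child(branch: Branch, nibble: str) -> Union[Branch, Leaf]:
    """Remove one child and restore the trie shape (toy model)."""
    remaining = {k: v for k, v in branch.children.items() if k != nibble}
    if len(remaining) >= 2:
        # The branch survives; siblings can stay as hash references.
        return Branch(remaining)
    if len(remaining) == 0:
        raise ValueError("toy model expects a branch with at least two children")
    # A branch with one child must collapse into that child, so the
    # child's body (not just its hash) is required.
    (only_nibble, only_child), = remaining.items()
    if isinstance(only_child, HashOnly):
        raise MissingNode(only_child.digest)
    if isinstance(only_child, Leaf):
        return Leaf(only_nibble + only_child.path, only_child.value)
    raise NotImplementedError("extension nodes are omitted from this toy model")


# The proof for key "a1" reveals the leaf under "a" but only the hash under "b".
proof_view = Branch({"a": Leaf("1", "deleted-account"), "b": HashOnly("0xsibling")})
try:
    delete_child(proof_view, "a")
except MissingNode as err:
    print("node body required for", err)
```

With three or more children the deletion succeeds using hashes alone, which matches the observation that the post-block trie can sometimes, but not always, be built.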
Yup, I was worried about this.
Being able to consistently verify the post-state hash doesn't really feel optional to me. I can't trust the result of a local block execution that doesn't also prove that it got the same result as the network did. In the context of the Portal Network, nodes must validate data and only gossip data that we can prove locally.
EVM execution engines should be able to handle the case of missing trie nodes at the refactor, and returning the information needed to collect the data. They must handle the case, if they want to run against a partial trie database, like if they're running Beam Sync.
I am only familiar enough with py-evm to give the example:
https://github.com/ethereum/py-evm/blob/d751dc8c9c8199a16043a483b19c9f4d7a592202/eth/db/account.py#L605-L661
The MissingTrieNode exceptions are the relevant moments when the EVM realizes that it's missing some intermediate nodes that are required to prove the final state root, if you're only running with the state proof as defined in the current spec.
As a prototype, I suppose you could literally run py-evm with the unverified proofs, then retrieve the missing trie nodes over devp2p one by one (there usually aren't too many, from what I've seen).
.. then retrieve the missing trie nodes over devp2p one by one
For the missing trie node(s), I don't see a clear mechanism to obtain that node. For removed nodes that require knowledge of a sibling node, how could that sibling node be obtained? As it is mid-traversal, the terminal keys of the affected sibling are not trivially known. So eth_getProof(block_number, key_involving_sibling_node) cannot be used. A new method get_trie_node_at_block_height(block_number, trie_node_hash) could be written for a node, but this is nontrivial. More context here:
.. nodes must validate data and only gossip data that we can prove locally.
This is preserved (gossiped data is validated). The gossiped data is the merkle proofs of the state, so this is validated, and the block is also validated.
So the source of error here is a bug in the EVM implementation.
flowchart TD
Block[Block, secured by header: Program to run] -- gossiped --> Pre
State[State, secured by merkle proofs: Input to program] -- gossiped --> Pre
Pre[Program with inputs] -- Load into EVM environment --> EVM[EVM executes]
EVM -- bug here --> Post[Post block state]
Post -.-> PostRoot[Post-block state root useful for these bugs]
EVM -- bug here --> Trace[debug_traceBlock output]
I agree EVM bugs are possible and having the post-state root would be nice.
However:
- EVM impls are often shared across different client types, reducing bug risk.
- Even having the post-block state does not guarantee that debug_traceBlock output is correct. It contains many details that are not covered by a post-block state root. So one still needs to check that the EVM doesn't have bugs.
- Test suites can be run to compare debug_traceBlock against the same result from an archive node (or a different EVM implementation hooked into the portal network). This protects against EVM errors that result in bad state, and errors that result in bad traces.
Yup, I was worried about this.
Being able to consistently verify the post-state hash doesn't really feel optional to me. I can't trust the result of a local block execution that doesn't also prove that it got the same result as the network did. In the context of the Portal Network, nodes must validate data and only gossip data that we can prove locally.
EVM execution engines should be able to handle the case of missing trie nodes at the refactor, and returning the information needed to collect the data. They must handle the case, if they want to run against a partial trie database, like if they're running Beam Sync.
I am only familiar enough with py-evm to give the example: https://github.com/ethereum/py-evm/blob/d751dc8c9c8199a16043a483b19c9f4d7a592202/eth/db/account.py#L605-L661
The MissingTrieNode exceptions are the relevant moments when the EVM realizes that it's missing some intermediate nodes that are required to prove the final state root, if you're only running with the state proof as defined in the current spec.
As a prototype, I suppose you could literally run py-evm with the unverified proofs, then retrieve the missing trie nodes over devp2p one by one (there usually aren't too many, from what I've seen).
I've proposed an idea here and believe our proposal on ZK proofs of the last state can indeed help address the challenge mentioned. ZK proofs, specifically Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (ZK-SNARKs), have the potential to provide proofs of complex computations and statements: sogolmalek/EIP-x#6
... our proposal on ZK proofs of the last state can indeed can help address the challenge mentioned
Let me see if I understand your proposal. The data that a peer receives could consist of the:
- (existing) The block pre-state (state required for the block, secured by merkle root in prior block and a header accumulator)
- (existing) The block (to allow the user to replay the block for any purpose)
- (new) A proof of block execution (ZK-EVM) consisting of a ZK-SNARK proof for the set of block post-state values (values that were accessed by the block).
The user re-executes the block and arrives at post-block state values. Those are compared to the values in the ZK proof. If they are the same, it is unlikely that the EVM and the ZK-EVM both have a bug, and the state transition is likely sound.
So the ZK proof is equivalent to replaying the block using a different EVM implementation. That is, the same as getting the post-block state from a Rust implementation and comparing to the state produced by a Go implementation.
In this respect, the presence of the ZK proof does not seem to introduce additional safety. Perhaps I overlooked a component?
Thanks for your detailed arguments. I'm just trying to make sure I understand the points well, and I hope I didn't go far wrong with the following arguments. I'd love to learn more here.
Our approach centers around the principle of minimizing data reliance while ensuring the accuracy and reliability of state transitions. While your understanding of our proposal is mostly accurate, there are a few points that could be considered when evaluating the introduced ZK proofs:
The process of re-executing the block on a different EVM implementation (Rust vs. Go, as you mentioned) can be cumbersome and resource-intensive.
Another point is attack vectors and determinism: while replaying the block using different EVM implementations is conceptually similar, it introduces the potential for inconsistencies between different implementations due to subtle differences in execution logic or determinism. Furthermore, I believe ZK can ensure much greater efficiency and scaling. Re-executing the block using different EVM implementations might be feasible on a small scale, but it becomes much more challenging as the network scales and the number of transactions increases. ZK proofs, on the other hand, can be generated and verified more efficiently, making them more suitable for large-scale systems.
As you've mentioned, the problem with key deletions is that sometimes sibling nodes in the trie are required to reconstruct the trie structure for verification purposes. These sibling nodes might not be available through the eth_getProof data, leading to incomplete trie reconstructions.
I think a ZK proof of the last state, which involves generating a cryptographic proof that certifies the correctness of the entire state transition process, including deletions and modifications, can help. This proof can be designed to include information about sibling nodes that are required for verification. In essence, the ZK proof encapsulates the entire state transition, and by design, it must account for all the necessary data, including sibling nodes, to be valid.
Also, by obtaining and validating a ZK proof of the last state, we can ensure that all required data for verifying the state transition, including missing sibling nodes, is included in the proof. This provides a complete and comprehensive validation mechanism that mitigates the challenge posed by missing nodes.
Some additional context is that eth_getRequiredBlockState is designed to provide the minimum information required to re-execute an old block. The goal is to be able to download individual blocks and re-execute them in order to inspect every EVM step in that block. That is, the goal is to run the EVM (to gain historical insight).
With a ZK EVM, one can demonstrate to a peer that the block post-state is valid with respect to a block. This means that they do not have to re-execute the EVM themselves. This is beneficial for a light client that wants to trustlessly keep up to date with the latest Ethereum state. That is, the goal is to not run the EVM (to save resources).
For any key that has an inclusion proof in n - 1, and an exclusion proof in block n, retain the proof for that key.
4. Store these additional exclusion proofs in the RequiredBlockState data structure.
I think the idea is a good direction, but after discussing with @gballet I have the feeling that sending the exclusion proofs themselves is not enough for the user to perform deletion locally, because the sibling of the node being deleted could be a subtrie. So the server should use this approach (or another) to give the full prestate that is required for execution of the block; this means figuring out which extra nodes need to be returned for such a deletion.
Geth has the prestateTracer which I noticed suffers from the same issue after reading up on this ticket. I'd be keen to fix it. Nevermind, the prestateTracer doesn't actually return intermediary nodes, only accounts.
At present this spec contains sufficient data to execute the block, but not update the proofs to get the post-block root.
For context, the main issue is that sometimes the trie depth is changed, which impacts cousin-nodes. More on this here: https://github.com/perama-v/archors/blob/main/crates/multiproof/README.md
To enable that, I have been prototyping a mechanism that does this using only available JSON-RPC methods (eth_getProof) without hacking an execution client. This method calls get_proof on the subsequent block, gets these edge-case nodes and tacks them into RequiredBlockState as "oracle data". The data allows the trie update to complete and the post-block root to be computed, verifying that the oracle data was in fact correct.
I learned that @s1na has taken a different approach, which is to modify Geth to record all trie nodes touched during a block execution (including post-root computation). That method is sufficient to get trie nodes required to compute the post-block root (but requires modifying the execution client). It is a nice approach.
Essentially the oracle approach says: Look, some changes to the surrounding trie will happen, and they may be complex/extensive. One can "look into the future" at this exact point in the traversal and just get the nodes for the affected part (leafy-end) of the traversal. The details of the surrounding trie don't need to be computed and aren't important because we can check that what we got from the future was in fact correct.
Bit of a hacky approach. I can see that this might be difficult to implement if not using eth_getProof.
After discussion with @s1na I have moved the trie nodes from all account and storage proofs to a single bag-of-nodes. This is sufficient for navigation (root hash plus path). This makes it easier for implementers because the proof structure does not need to be known to create the data structure.
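A minimal sketch of the bag-of-nodes idea, under loud assumptions: nodes are plain dicts rather than RLP, hashing is SHA-256 over JSON as a stand-in for keccak256, and `node_hash`, `build_bag` and `lookup` are hypothetical names rather than anything in the spec. It only shows that a hash-keyed set of nodes plus a root hash and a key path is enough to navigate, without knowing which proof each node came from.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Optional


def node_hash(node: dict) -> str:
    # Stand-in hash: SHA-256 over canonical JSON. The real trie uses
    # keccak256 over RLP-encoded nodes.
    return hashlib.sha256(json.dumps(node, sort_keys=True).encode()).hexdigest()


def build_bag(proofs: Iterable[List[dict]]) -> Dict[str, dict]:
    """Merge nodes from many proofs into one hash-keyed bag (deduplicated)."""
    return {node_hash(node): node for proof in proofs for node in proof}


def lookup(bag: Dict[str, dict], root: str, path: str) -> Optional[str]:
    """Walk from the root hash along `path`; None means the key is absent."""
    current, remaining = root, path
    while True:
        node = bag[current]  # KeyError: a needed node is not in the bag
        if node["type"] == "leaf":
            return node["value"] if node["path"] == remaining else None
        if not remaining:
            return None
        child = node["children"].get(remaining[0])
        if child is None:
            return None
        current, remaining = child, remaining[1:]


# Two proofs that share the same root node collapse into three unique nodes.
leaf_1 = {"type": "leaf", "path": "bc", "value": "account-1"}
leaf_2 = {"type": "leaf", "path": "de", "value": "account-2"}
root = {"type": "branch", "children": {"a": node_hash(leaf_1), "f": node_hash(leaf_2)}}
bag = build_bag([[root, leaf_1], [root, leaf_2]])
print(len(bag), lookup(bag, node_hash(root), "abc"))
```

Deduplication falls out of keying by hash, so shared upper-trie nodes are stored once regardless of how many account or storage proofs contained them.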
A block hash accessed by the "BLOCKHASH" opcode.
```python
class RecentBlockHash(Container):
    ...
```
This can be made verifiable by including the whole header here.
Could you expand on what you mean here?
The assumption here is that a user has a mechanism to verify the canonicality of blockhashes.
More notes here: https://github.com/perama-v/archors#blockhash-opcode
One design goal is to keep RequiredBlockState as small as possible. This is because a static collection of all RequiredBlockState's would allow for a distributed archive node. The current estimate puts the size at 30TB.
So my point addresses the following fault case you describe in your notes:
A block hash is wrong: the portal node can audit all block hashes against its master accumulator prior to tracing.
If we want to solve it at the RPC level: instead of sending only block number and block hash, send the whole block header. Then the verifier can hash the header (which contains the number) and be sure about the hash.
I understand the size concern. However, if the clients have to audit an external source anyway, it might be worth it. Otherwise, if in most use-cases the client has easy access to the latest 256 blocks, then we can leave out this info from eth_getRequiredBlockState.
A "wrong" hash refers to a hash that is not canonical. So, the wrong hash could be a correctly computed hash of a noncanonical block. E.g. a fabricated block or an uncle block. Having the whole block included doesn't get one closer to verifying canonicality.
Description
Introduces a method eth_getRequiredBlockState that returns all state required to execute a single historical block.
Changes made
- eth_getRequiredBlockState method that returns a RequiredBlockState data type as bytes.
- RequiredBlockState
  - Includes sections for motivation, ssz data format, algorithms and security.
Overview
An overview is provided below. Please see the more detailed sections in the PR.
Re-execution of historical transactions is important for accounting. A block is a program that was applied to the Ethereum
state at a particular moment in time. In retrospect, one can see which states were involved in that program.
These states can be inventoried, and accompanied by a merkle proof for that moment in time. The aggregation of all
the states that the block required, results in data that is both necessary and sufficient to re-execute that block.
This data is termed RequiredBlockState, and a specification is included in this PR.
Motivation
An archive node can be trustlessly distributed in a peer to peer network. Nodes that choose to implement eth_getRequiredBlockState provide a mechanism to "export a distributable archive node".
A side benefit is the potential for node providers to save bandwidth costs to serve debug_traceTransaction functionality to users (2-4 orders of magnitude). The provider can serve eth_getRequiredBlockState, and the user re-executes the block locally. This may also help bootstrap a distributed content delivery network.
Data format
SSZ encoded structure with snappy compression, as is seen in the Ethereum consensus specification.
The data format has been tested and is approximately 167 KB/Mgas. If state for every historical block was created in this format, the total size is estimated to be ~30TB. Individuals would store a subset of this quantity. See more analysis here: https://github.com/perama-v/archors.
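As a rough worked example of the 167 KB/Mgas figure, the sketch below scales it by a block's gas usage. The 15 Mgas block is a hypothetical input chosen for illustration, `estimated_size_kb` is not part of the PR, and the result is only as accurate as the quoted average.

```python
KB_PER_MGAS = 167  # approximate figure quoted in the PR description


def estimated_size_kb(gas_used: int) -> float:
    """Estimate the RequiredBlockState size for a block from its gas usage."""
    return KB_PER_MGAS * gas_used / 1_000_000


# Hypothetical block that used 15 Mgas: 167 * 15 = 2505 KB, about 2.5 MB.
print(estimated_size_kb(15_000_000))
```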
Algorithms
Descriptions for the creation and use of the data are included.
Test cases
The specification has been implemented as a library and CLI application that can call a node and construct RequiredBlockState for blocks. This was used to generate the test cases in this PR.
Test case generator: https://github.com/perama-v/archors/tree/main/bin/stator.
The archors library also contains examples showing the re-execution of a historical block using revm, the block and the RequiredBlockState.
Security - trustlessness
The cornerstone of this method is the assumption that users (recipients of RequiredBlockState) can verify the canonicality of blockhashes. This may be achieved in two ways
RequiredBlockState enables a non-archive node to selectively be an archive node for arbitrary specific blocks.
Security - node
An execution client is not required to implement eth_getRequiredBlockState to participate in the Ethereum protocol.
A node that does implement the method may choose to support the method for a subset of blocks (e.g., a non-archive node may support the method for the same range of blocks that it supports debug_traceBlock for).
An archive node that implements the method and supports all blocks must have access to the merkle trie at arbitrary heights. This is equivalent to supporting eth_getProof at arbitrary heights.