-
Notifications
You must be signed in to change notification settings - Fork 59
feat(hash_builder): add full-branch-updates feature for storing all branch nodes
#124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(hash_builder): add full-branch-updates feature for storing all branch nodes
#124
Conversation
CodSpeed Performance ReportMerging this PR will degrade performance by 12.58%Comparing Summary
Performance Changes
|
1ff2437 to
37cb96c
Compare
37cb96c to
8de5f1e
Compare
7f64ea9 to
3fc8c1e
Compare
2e91a3d to
153c561
Compare
Previously, branch nodes were only stored in `updated_branch_nodes` if they had children that were themselves branches (non-empty tree_masks/hash_masks). This caused branch nodes with only leaf children to be silently omitted from updates, breaking incremental trie sync for: - Small contracts with few storage slots - Sparse trie regions where all children are leaves - Fresh tries built without existing database data The fix adds a third condition: store the branch if its parent references it as a hashed child (which we just set in the parent's hash_mask). This ensures all branch nodes that would be looked up during trie traversal are present. Changes: - Root node (len == 0) is always stored - Non-root nodes are stored if they have branch children OR if the parent references this child in its hash_mask - Updated test expectations for tree_mask (now correctly marks stored children) - Added regression tests for the bug scenarios - Un-ignored tests that now pass with the fix Closes: #25 Co-authored-by: Amp <[email protected]> Amp-Thread-ID: https://ampcode.com/threads/T-019bf64d-c82f-7682-9933-a37c563468a6
153c561 to
6e6270a
Compare
|
cc @Rjected + @mediocregopher + @shekhirin for visibility PR builds on test suite expansion in #123 |
|
@zerosnacks not storing the root node or branch nodes with only leaf children are deliberate design decions; storing a branch on disk requires write IO, and uses more space, so there's a tradeoff in terms of how much read IO it saves us later. If you can show benchmarks which challenge this design then we could be open to this change, but as it is I would say this behavior is intentional. EDIT: I guess the above applies to reth specifically, which uses the HashBuilder heavily. If other projects want to use the HashBuilder differently we can also talk about making this behavior optional. |
complete-updates feature for storing all branch nodes
f5917ba to
c496a04
Compare
Makes the behavior of storing branch nodes with only leaf children opt-in via the `complete-updates` feature flag. This addresses feedback that the change would introduce unnecessary write I/O overhead for reth. Without this feature (default): branch nodes with only leaf children are omitted from `updated_branch_nodes` as an optimization to reduce disk writes. With this feature: all branch nodes referenced by hash are stored, which is required for complete incremental trie sync where consumers need the full set of nodes to reconstruct the trie. The feature uses compile-time `#[cfg]` checks, so there is zero runtime overhead for the default path. Co-authored-by: Amp <[email protected]> Amp-Thread-ID: https://ampcode.com/threads/T-019bfb37-3ca2-7078-8ab0-776518763f83
c973d95 to
16b7b5d
Compare
complete-updates feature for storing all branch nodesfull-branch-updates feature for storing all branch nodes
5371f61 to
3e40407
Compare
…odes Makes the behavior of storing branch nodes with only leaf children opt-in via the `full-branch-updates` feature flag. This addresses feedback that the change would introduce unnecessary write I/O overhead for reth. Without this feature (default): branch nodes with only leaf children are omitted from `updated_branch_nodes` as an optimization to reduce disk writes. With this feature: all branch nodes referenced by hash are stored, which is required for complete incremental trie sync where consumers need the full set of nodes to reconstruct the trie. The feature uses compile-time `#[cfg]` checks, so there is zero runtime overhead for the default path. Co-authored-by: Amp <[email protected]> Amp-Thread-ID: https://ampcode.com/threads/T-019bfb37-3ca2-7078-8ab0-776518763f83
3e40407 to
c5ce39f
Compare
|
@mediocregopher I've updated the PR to add a new opt-in feature flag The CodSpeed regression appears to be flakiness on their side, I am unable to reproduce it locally. |
| let store_in_db_trie = store_in_db_trie || len == 0 || { | ||
| let parent_index = len - 1; | ||
| let flag = TrieMask::from_nibble(current.get_unchecked(parent_index)); | ||
| (self.hash_masks[parent_index] & flag) != TrieMask::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zerosnacks just to make sure you're aware, this is not handling cases of branch nodes which are children of extension nodes. We never store the extension's hash in its parent branch, and therefore this wouldn't set store_in_db_trie for the child branch (also because I think parent_index isn't handling the extension case either)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I'll close this PR for now as I lack sufficient context and this is sensitive code
Context
Builds on top of #123 to address issue reported in #25.
Summary
Adds an opt-in
full-branch-updatesfeature flag that stores all branch nodes referenced by hash inupdated_branch_nodes, including those with only leaf children. This is required for complete incremental trie sync.Supersedes #25.
The Problem
The current
store_branch_nodeonly stores a branch if it has branch children:This means branch nodes whose children are all leaves are omitted from updates. This is intentional as a write I/O optimization for reth, but breaks incremental sync for other consumers.
Affected scenarios:
The Solution
Add a
full-branch-updatesfeature flag that enables a third condition: store the branch if its parent references it as a hashed child.Key design decisions:
#[cfg]checks - zero runtime overhead for the default pathfeatures = ["full-branch-updates"]Tests
Tests requiring the new behavior are gated behind the
full-branch-updatesfeature:test_updates_root- Verifies root branch is stored for simple two-leaf trietest_trie_updates_branch_nodes- Verifies expected number of branch updatestest_small_storage_trie_updates- Tests the exact failing scenario from fix root updates bug in hash_builder #25test_deep_divergence_empty_values- Edge case with empty leaf valuestest_mask_propagation_across_depths- Multi-depth mask propagationtest_mask_isolation_successive_subtrees- Verifies mask isolation across subtreesarbitrary_branch_updates_complete- Proptest verifying completeness (assertion only with feature)Impact
updated_branch_nodes)