
Conversation

Contributor

@pthmas pthmas commented Jan 16, 2026

Overview

  • Adds a height-based pruning mechanism to the ev-node store:
    • Prunes /h/{height}, /d/{height}, /c/{height}, /i/{hash} up to a target height.
    • Also prunes per-height DA metadata (height → DA height mapping) for pruned heights.
    • Leaves /s/{height} (state), /t (current height), and global metadata keys untouched.
    • Tracks progress via a last-pruned-block-height metadata key so pruning is monotonic and idempotent.
  • Integrates pruning into block production:
    • After a block is committed, at configurable intervals, the executor computes a prune target and calls store.PruneBlocks synchronously (see the sketch after this list).
  • Wires pruning into the go-header store:
    • Sets the go-header pruning window to PruningKeepRecent * BlockTime so header/data retention matches ev-node’s window.
  • Adds pruning for EVM execution metadata:
    • EVMStore exposes PruneExecMeta to delete per-height ExecMeta entries up to a target height.
    • The EVM executor implements an optional ExecMetaPruner interface, and the block executor calls it from the same pruning hook.
    • Non-EVM (e.g. ABCI) executors don’t implement this interface, so they are unaffected.
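
For illustration, here is a minimal sketch of how the pruning hook and the store-level pruner fit together. The interface name matches the Pruner in pkg/store/types.go, but the exact signature and the target computation (current height minus the keep-recent window) are illustrative assumptions, not the literal code in this PR:

import "context"

// Sketch only: the real Pruner interface is defined in pkg/store/types.go and may differ.
type Pruner interface {
    // PruneBlocks removes headers, data, signatures, hash indexes and per-height
    // DA metadata up to and including targetHeight, tracking progress under a
    // last-pruned-block-height metadata key so repeated calls are idempotent.
    PruneBlocks(ctx context.Context, targetHeight uint64) error
}

// pruneIfDue is a hypothetical helper showing the gating described above:
// prune only on interval boundaries and keep the most recent keepRecent heights.
// The target computation (height - keepRecent) is an assumption for this sketch.
func pruneIfDue(ctx context.Context, p Pruner, height, keepRecent, interval uint64) error {
    if keepRecent == 0 || interval == 0 || height%interval != 0 || height <= keepRecent {
        return nil
    }
    return p.PruneBlocks(ctx, height-keepRecent)
}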

Config

New node config fields / flags:

  • --evnode.node.pruning_enabled
  • --evnode.node.pruning_keep_recent
  • --evnode.node.pruning_interval

Pruning actually runs only when all three are set to non-trivial values (enabled, keep_recent > 0, interval > 0).
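
For example (assuming the prune target is simply the current height minus pruning_keep_recent): with pruning_keep_recent=1000 and pruning_interval=100, a node reaching height 5000 would prune block data up to height 4000, and would prune again every 100 blocks thereafter.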

Design trade-offs

  • Runs in the critical path:
    Pruning is called synchronously in produceBlock. On pruning heights, block production will incur extra latency proportional to the amount of data pruned and underlying storage performance.

  • Best-effort pruning:
    If pruning fails, we log the error but do not fail the block. This avoids halting the node but means disk usage can grow silently if pruning keeps failing; operators will need to watch logs/metrics.

  • Scope:
    This PR prunes ev-node’s own store, the go-header store, and EVM ExecMeta. Pruning of ABCI state (ev-abci) is out of scope and will be added in a follow-up once ev-abci exposes its own pruning API.

Contributor

github-actions bot commented Jan 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Jan 28, 2026, 3:42 PM

Contributor

claude bot commented Jan 16, 2026

Claude finished @pthmas's task in 2m 20s


Comprehensive Code Review

I've completed a thorough analysis of the block pruning feature across all focus areas. Here's my detailed review:


✅ Strengths

Architecture & Design

  • Clean separation of concerns: Pruning is well-separated from rollback logic, with clear interfaces (Pruner in pkg/store/types.go:85-94)
  • Idempotent design: The last-pruned-block-height tracking ensures pruning can be safely retried
  • Modular pruning: EVM execution metadata pruning via optional ExecMetaPruner interface keeps non-EVM executors unaffected
  • Good test coverage: Both unit tests and edge case coverage for store pruning and EVM metadata pruning

Configuration

  • Sensible defaults: Pruning disabled by default (archive mode) in pkg/config/defaults.go:72-74
  • Clear validation logic: Config validation prevents contradictory settings (pkg/config/config.go:389-401)

🔴 Critical Issues

1. Blocking Synchronous Pruning in Critical Path ⚠️ HIGH PRIORITY

Location: block/internal/executing/executor.go:553-575

The pruning runs synchronously during block production, which can cause significant latency spikes:

if e.config.Node.PruningEnabled && ... {
    if newHeight%e.config.Node.PruningInterval == 0 {
        if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
            e.logger.Error().Err(err).Msg("failed to prune old block data")
        }
        // EVM metadata pruning also blocks here
    }
}

Problems:

  • A single pruning run can cover thousands of entries, causing multi-second pauses
  • Block production latency becomes unpredictable on pruning heights
  • Under heavy load, this could cause consensus timeouts or missed block slots
  • The PR description acknowledges this as a "trade-off" but doesn't justify why asynchronous pruning wasn't chosen

Recommendation: Run pruning asynchronously in a background goroutine with proper synchronization:

go func() {
    // NOTE: a real implementation should also prevent overlapping pruning runs
    // (e.g. with an atomic "pruning in progress" flag) and use a context that
    // outlives the block-production call.
    if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
        e.logger.Error().Err(err).Msg("failed to prune old block data")
    }
}()



2. Silent Failure Mode ⚠️ HIGH PRIORITY

Location: block/internal/executing/executor.go:560 and 570

Pruning errors are only logged, not surfaced to operators:

if err := e.store.PruneBlocks(e.ctx, targetHeight); err != nil {
    e.logger.Error().Err(err).Uint64("target_height", targetHeight).Msg("failed to prune old block data")
    // Continues without any alert or metric
}

Problems:

  • Operators won't know pruning is failing until disk fills up
  • No metrics exposed for pruning success/failure rates
  • No alerting mechanism for persistent failures
  • "Best-effort" approach mentioned in PR description is insufficient for production

Recommendations:

  1. Add Prometheus metrics for pruning operations (a sketch follows this list):
    • evnode_pruning_runs_total{status="success|failure"}
    • evnode_pruning_duration_seconds
    • evnode_pruning_last_success_timestamp
    • evnode_pruning_target_height
  2. Consider failing block production after N consecutive pruning failures
  3. Add a health check endpoint that exposes pruning status
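
A minimal sketch of what such instrumentation could look like with prometheus/client_golang, using the metric names suggested above; registration (e.g. prometheus.MustRegister) and wiring into ev-node's existing metrics setup are omitted and left as assumptions:

import (
    "time"

    "github.com/prometheus/client_golang/prometheus"
)

var (
    pruningRuns = prometheus.NewCounterVec(prometheus.CounterOpts{
        Name: "evnode_pruning_runs_total",
        Help: "Total pruning runs, labelled by outcome.",
    }, []string{"status"})
    pruningDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "evnode_pruning_duration_seconds",
        Help: "Wall-clock duration of pruning runs.",
    })
    pruningLastSuccess = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "evnode_pruning_last_success_timestamp",
        Help: "Unix timestamp of the last successful pruning run.",
    })
    pruningTargetHeight = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "evnode_pruning_target_height",
        Help: "Most recent pruning target height.",
    })
)

// observePruning would be called from the pruning hook around PruneBlocks.
// The collectors above still need to be registered with prometheus.MustRegister.
func observePruning(target uint64, start time.Time, err error) {
    pruningDuration.Observe(time.Since(start).Seconds())
    pruningTargetHeight.Set(float64(target))
    if err != nil {
        pruningRuns.WithLabelValues("failure").Inc()
        return
    }
    pruningRuns.WithLabelValues("success").Inc()
    pruningLastSuccess.SetToCurrentTime()
}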



3. Missing Concurrency Control

Location: pkg/store/store.go:296-381

The PruneBlocks method has no mutex protection despite being called during active block production:

func (s *DefaultStore) PruneBlocks(ctx context.Context, height uint64) error {
    // No mutex lock here - concurrent reads/writes possible
    batch, err := s.db.Batch(ctx)
    // ...
}

Problems:

  • Concurrent GetBlockData() calls during pruning could return inconsistent results
  • Race conditions possible if pruning runs while syncing is active
  • No documentation of thread-safety guarantees

Recommendation: Add mutex protection or document that pruning must be externally synchronized.
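
One possible shape for the first option, purely as a sketch (it assumes a sync.RWMutex field on DefaultStore and that readers take the corresponding read lock, neither of which is claimed by the PR today):

// Hypothetical: serialize pruning against readers with a store-level lock.
func (s *DefaultStore) PruneBlocks(ctx context.Context, height uint64) error {
    s.mu.Lock() // assumed field: mu sync.RWMutex
    defer s.mu.Unlock()
    // ... existing batch-deletion logic from pkg/store/store.go ...
    return nil
}

// Readers such as GetBlockData would correspondingly take s.mu.RLock().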



4. Inconsistent Height-Based Endianness ⚠️ MEDIUM PRIORITY

Location: pkg/store/store.go:385-388 vs execution/evm/store.go:103-105

Two different byte encodings are used for height serialization:

// pkg/store/store.go uses LittleEndian
func encodeHeight(height uint64) []byte {
    heightBytes := make([]byte, heightLength)
    binary.LittleEndian.PutUint64(heightBytes, height)
    return heightBytes
}

// execution/evm/store.go uses BigEndian
func execMetaKey(height uint64) ds.Key {
    heightBytes := make([]byte, 8)
    binary.BigEndian.PutUint64(heightBytes, height)
    return ds.NewKey(evmStorePrefix + "execmeta/" + string(heightBytes))
}

Problems:

  • Inconsistent patterns make the codebase harder to maintain
  • Could cause bugs if code is copied between modules
  • No clear rationale for different encodings

Recommendation: Standardize on one encoding (recommend LittleEndian for consistency with existing store code).



🟡 High-Priority Improvements

5. Missing go-header Store Pruning Configuration

Location: pkg/sync/sync_service.go

The PR description claims "Wires pruning into the go-header store" but I don't see where goheaderstore.Store is configured with pruning parameters. The go-header library supports pruning via StoreParams.StoreCacheSize and StoreParams.IndexCacheSize, but these aren't set based on the new pruning config.

Recommendation: Add explicit go-header pruning configuration:

storeParams := goheaderstore.DefaultParameters()
if conf.Node.PruningEnabled {
    // Convert ev-node's time-based window to go-header's count-based window
    pruningWindow := conf.Node.PruningKeepRecent * uint64(conf.Node.BlockTime.Duration)
    storeParams.PruningInterval = time.Duration(conf.Node.PruningInterval) * conf.Node.BlockTime.Duration
    storeParams.PruningWindow = time.Duration(pruningWindow)
}



6. Incomplete Metadata Pruning

Location: pkg/store/store.go:352-361

Only DA-related metadata is pruned. Other metadata keys may also need pruning:

// Currently pruned:
GetHeightToDAHeightHeaderKey(h)
GetHeightToDAHeightDataKey(h)

// Potentially unpruned:
- LastBatchDataKey (?)
- Other per-height metadata keys not listed in keys.go

Recommendation: Audit all metadata keys and document which should be pruned vs. retained.


7. Configuration Comment Inconsistency

Location: pkg/config/config.go:269

The comment says "Must be > 0 when pruning is enabled" but the validation at line 399 checks == 0:

// Line 269:
PruningKeepRecent uint64 `... comment:"... Must be > 0 when pruning is enabled..."`

// Line 399:
if c.Node.PruningKeepRecent == 0 {
    return fmt.Errorf("pruning_keep_recent must be > 0 when pruning is enabled")
}

This is correct but the comment could be clearer: "Number of most recent blocks to retain when pruning is enabled. Must be > 0 when pruning is enabled; set pruning_enabled=false to keep all blocks (archive mode)."

The same applies to line 270 for PruningInterval.


🟢 Code Quality Observations

8. Error Handling

Good:

  • Proper error wrapping with context: fmt.Errorf("failed to prune...: %w", err)
  • ErrNotFound handling in pruning loop (lines 327-330, 334-349)

Could improve:

  • Add structured logging fields consistently (some logs have Uint64("height", h), others don't)

9. Test Coverage

Good:

  • TestPruneBlocks_RemovesOldBlockDataOnly verifies state snapshots aren't touched
  • TestPruneExecMeta_TracksLastPrunedHeight tests idempotency
  • Both positive and negative test cases

Missing:

  • No integration test for pruning during active block production
  • No test for concurrent pruning + block production race conditions
  • No test for pruning with very large height ranges (performance test)

Recommendation: Add integration test:

func TestPruningDuringBlockProduction(t *testing.T) {
    // 1. Produce blocks 1-100
    // 2. Enable pruning with interval=10, keep_recent=20
    // 3. Produce blocks 101-120 (triggers pruning at 110, 120)
    // 4. Verify blocks 1-80 are pruned, 81-120 remain
    // 5. Verify no data corruption or race conditions
}

10. Documentation

Good:

  • Clear function documentation for PruneBlocks and PruneExecMeta
  • Config field comments explain behavior

Missing:

  • No CLAUDE.md update explaining pruning feature
  • No operator guide for choosing appropriate pruning_keep_recent values
  • No migration guide for existing nodes enabling pruning

🔵 Performance Considerations

11. Batch Deletion Efficiency

The current implementation deletes entries one-by-one in a loop:

for h := lastPruned + 1; h <= height; h++ {
    // 6 separate Delete() calls per height
    batch.Delete(ctx, ds.NewKey(getHeaderKey(h)))
    batch.Delete(ctx, ds.NewKey(getDataKey(h)))
    batch.Delete(ctx, ds.NewKey(getSignatureKey(h)))
    batch.Delete(ctx, ds.NewKey(getMetaKey(...)))
    batch.Delete(ctx, ds.NewKey(getMetaKey(...)))
    batch.Delete(ctx, ds.NewKey(getIndexKey(...)))
}

For PruningKeepRecent=1000 and PruningInterval=100, this deletes 600 entries (100 heights × 6 keys) per pruning run.

Potential optimization: Use prefix-based deletion if the underlying datastore supports it (e.g., BadgerDB's DropPrefix). This is especially beneficial for large pruning ranges.


12. Missing Pruning Progress Logging

For large pruning ranges, there's no progress indication:

for h := lastPruned + 1; h <= height; h++ {
    // No progress log every N heights
}

Recommendation: Add periodic progress logging:

for h := lastPruned + 1; h <= height; h++ {
    // ... pruning logic ...
    if (h-lastPruned)%1000 == 0 {
        e.logger.Info().Uint64("pruned_up_to", h).Msg("pruning progress")
    }
}

📋 Summary

Must Fix Before Merge:

  1. ⚠️ Make pruning asynchronous to avoid blocking block production
  2. ⚠️ Add pruning metrics for observability
  3. Configure go-header pruning as claimed in PR description
  4. Add mutex protection or document thread-safety requirements

Should Fix:

  1. Standardize height encoding (BigEndian vs LittleEndian)
  2. Audit and document all metadata pruning
  3. Fix config comment for PruningKeepRecent
  4. Add integration tests for concurrent scenarios

Nice to Have:

  1. Add pruning progress logging for large ranges
  2. Update CLAUDE.md with operator guidance
  3. Consider prefix-based deletion optimization

Overall Assessment

This is a well-structured implementation with good test coverage and clean separation of concerns. The main concerns are:

  • Production readiness: Synchronous pruning and silent failures make this risky for production
  • Observability: Need metrics to detect pruning issues
  • Concurrency: Need to address potential race conditions

With the critical issues addressed, this will be a solid foundation for disk space management.


Member

@julienrbrt julienrbrt Jan 16, 2026

Given we have a lot of different stores and keys, are we planning to prune those as well:

(and eventually the ev-abci store)?

It will be inconsistent if we don't allow height pruning of all those stores (at once).

Contributor Author


I've updated this PR to prune metadata, the go-header store and the EVM store ✅
I'm happy to work on pruning ev-abci next, but I would probably do it in another PR to prevent this one from getting too big, if that's ok.

@pthmas pthmas changed the title from Block Prunning to feat: block Prunning on Jan 26, 2026
@pthmas pthmas changed the title from feat: block Prunning to feat: block Pruning on Jan 26, 2026
@pthmas pthmas force-pushed the pierrick/prunning branch from 3e1e8e0 to ed66fe6 on January 26, 2026 15:01
@pthmas pthmas force-pushed the pierrick/prunning branch from 7708393 to a5110b1 on January 28, 2026 10:03
@pthmas pthmas force-pushed the pierrick/prunning branch 5 times, most recently from 369a149 to 26628f5 on January 28, 2026 15:34
@pthmas pthmas force-pushed the pierrick/prunning branch from 26628f5 to 4d65b1e on January 28, 2026 15:41

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 20.73171% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.37%. Comparing base (4252154) to head (4d65b1e).

Files with missing lines | Patch % | Lines
pkg/store/store.go | 20.00% | 24 Missing and 12 partials ⚠️
pkg/store/tracing.go | 0.00% | 11 Missing ⚠️
block/internal/executing/executor.go | 0.00% | 8 Missing and 1 partial ⚠️
pkg/config/config.go | 37.50% | 4 Missing and 1 partial ⚠️
pkg/sync/sync_service.go | 33.33% | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2984      +/-   ##
==========================================
- Coverage   55.55%   55.37%   -0.18%     
==========================================
  Files         116      116              
  Lines       11477    11557      +80     
==========================================
+ Hits         6376     6400      +24     
- Misses       4401     4439      +38     
- Partials      700      718      +18     
Flag | Coverage Δ
combined | 55.37% <20.73%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
