Skip to content

refactor(v2): prune performance #1034

@kocubinski

Description

@kocubinski

A pruning event deletes all tree nodes invalidated (orphaned) at an arbitrary prune version, less than the tree's current version. Given the earliest version of the tree on disk m, and prune version n, v2 calculates the set difference all_nodes(m -> n) - orphaned_at(n) and, importantly, writes the set to disk in a new shard. For a B+Tree, inserting is markedly faster than deleting. After completion the old shard(s) are dropped, and complete version of the tree at version n is in shard n.

There are some opportunities in the current implementation for performance improvements.

Rate Limiting

The current rate limit mechanism is global pruning limit. This works but we can probably do better.

Right now pruning begins at prune_height + keep_versions even though a new shard is created at prune_height+1. Pruning could be started at prune_height and rate limited (slowed way down) to reduce I/O pressure. This is probably as simple as smaller batches and a configurable pause between to keep to a target insert rate.

SQLite JOIN

The current set difference calculation is CPU and memory intensive. It naively loads all orphans into a map and filters branch and leaf nodes against that map. The reason for this design is to keep leaf and branch orphan writes fast, there is no index on orphan tables. If there were a SQLite JOIN statement could be used to populate the new shard, which is probably vastly more efficient.

Pruning could build the orphan indexes on (version, sequence) then execute the JOIN statements. Of course this still costs CPU and I/O, so it's not clear where the infection point (on tree size) is for it to be more efficient, but it probably is in some cases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions