Database backend hot-swapping methods #209

hinto-janai · 2024-07-01T16:52:54Z

What

Currently, cuprate_database's backend can be changed by compiling it with different feature flags, i.e.:

cargo build --features {heed,redb}

This issue is for discussing various methods cuprate_database could use to hot-swap backends at runtime.

Why

This would allow end-users to choose a backend at runtime, e.g. via config or CLI.

Method 1: `dyn Env`

The concrete object that represents the database environment is cuprate_database::ConcreteEnv.

This is a non-generic object; it's just a struct with some internals that switch depending on the backend feature flag.

This struct implements trait Env, the database environment trait, from where all other database operations can occur.

Passing around a dyn Env that all backends implement would allow this, but there are a few problems:

trait Env is not object-safe because...
It uses associated types/constants because...
It must specify certain concrete types (e.g. the transaction type) because...
Envs are only compatible with their own types

For example, even though heed::RoTxn and redb::ReadTransaction both implement trait TxRo, you cannot pass a heed::RoTxn to redb and expect it to work. This means it cannot be object safe, and that types are not compatible with each other.
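A minimal sketch of the associated-type half of this problem, using stand-in types rather than the real cuprate_database, heed, or redb items (every name below is hypothetical). It only shows that a trait object has to pin the transaction type, so one `dyn Env` cannot cover both backends. The real trait also has associated constants, which rule out `dyn` on their own.

```rust
// Stand-in traits and types; the real `cuprate_database` traits are larger.
trait TxRo {
    fn backend_name(&self) -> &'static str;
}

trait Env {
    // Each backend must name its own concrete transaction type.
    type TxRo: TxRo;
    fn tx_ro(&self) -> Self::TxRo;
}

struct HeedEnv;
struct HeedTx;
struct RedbEnv;
struct RedbTx;

impl TxRo for HeedTx {
    fn backend_name(&self) -> &'static str { "heed" }
}
impl TxRo for RedbTx {
    fn backend_name(&self) -> &'static str { "redb" }
}

impl Env for HeedEnv {
    type TxRo = HeedTx;
    fn tx_ro(&self) -> HeedTx { HeedTx }
}
impl Env for RedbEnv {
    type TxRo = RedbTx;
    fn tx_ro(&self) -> RedbTx { RedbTx }
}

// A trait object has to pin the associated type, so this box can only ever
// hold an environment whose read transactions are `HeedTx`.
fn boxed_heed() -> Box<dyn Env<TxRo = HeedTx>> {
    Box::new(HeedEnv)
}

// Neither of these compiles:
// fn any_backend() -> Box<dyn Env> { Box::new(HeedEnv) }
//     (the associated type `TxRo` must be specified)
// fn wrong() -> Box<dyn Env<TxRo = HeedTx>> { Box::new(RedbEnv) }
//     (`RedbEnv` has `TxRo = RedbTx`, not `HeedTx`)
```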

Another problem is performance: dyn dispatches dynamically at runtime on every call, and this compounds because the other traits (TxRo, DatabaseRo, etc.) will probably have to be behind dyn as well.

Pros

Uses the type system
Most maintainable

Cons

Slowest method
Probably not possible without large changes

Method 2: `enum` for each `trait`

This is the same idea as dyn, except there is a concrete enum that defines all backends.

There would have to be an enum for each trait and the backend's specific type, e.g.:

enum EnvEnum {
	Heed(heed::Env),
	Redb(redb::Database),
}

enum TxRoEnum<'a> {
	Heed(heed::RoTxn<'a>),
	Redb(redb::ReadTransaction),
}

/* continue for each trait */

and cuprate_database would expose EnvEnum where users would have to match at every layer.
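To illustrate what "match at every layer" could look like, here is a rough sketch with stand-in backend types (all hypothetical; nothing here is the real heed or redb API). Whether the matches live inside cuprate_database or in user code, every method on every layer needs one arm per backend.

```rust
// Stand-in backend types; in the proposal these would wrap `heed` and `redb` types.
struct HeedEnv;
struct HeedTx;
struct RedbEnv;
struct RedbTx;

impl HeedEnv {
    fn tx_ro(&self) -> HeedTx { HeedTx }
}
impl RedbEnv {
    fn tx_ro(&self) -> RedbTx { RedbTx }
}
impl HeedTx {
    fn backend_name(&self) -> &'static str { "heed" }
}
impl RedbTx {
    fn backend_name(&self) -> &'static str { "redb" }
}

enum EnvEnum {
    Heed(HeedEnv),
    Redb(RedbEnv),
}

enum TxRoEnum {
    Heed(HeedTx),
    Redb(RedbTx),
}

impl EnvEnum {
    // Layer 1: opening a transaction needs one arm per backend.
    fn tx_ro(&self) -> TxRoEnum {
        match self {
            EnvEnum::Heed(env) => TxRoEnum::Heed(env.tx_ro()),
            EnvEnum::Redb(env) => TxRoEnum::Redb(env.tx_ro()),
        }
    }
}

impl TxRoEnum {
    // Layer 2: using the transaction needs the same match again.
    fn backend_name(&self) -> &'static str {
        match self {
            TxRoEnum::Heed(tx) => tx.backend_name(),
            TxRoEnum::Redb(tx) => tx.backend_name(),
        }
    }
}
```

Adding a third backend would mean touching every one of these matches, which is the maintainability cost listed below.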

Pros

Faster than dyn
Doesn't run into the object safety problem

Cons

Terrible maintainability
Terrible usability

Method 3: Branching at the high level

Another method is shifting the responsibility for "hot-swapping" upwards, i.e. instead of making cuprate_database hot-swap, the crates building on top of it will do so.

This comes with the pro that the "branch" to determine which backend is used only needs to be done once.

The con is that each crate building on top must take on this responsibility (although there are only 2 currently: cuprate-blockchain and cuprate-txpool).

For example, cuprate_blockchain::service could look something like this:

// storage/blockchain/src/service/free.rs
pub fn init(config: Config) -> Result<(DatabaseReadHandle, DatabaseWriteHandle), InitError> {
    let db = if config.backend == Backend::Heed {
        /* init heed backend */
    } else {
        /* init redb backend */
    };

    /* spawn threadpool with backend */
}

// storage/blockchain/src/service/read.rs
pub struct DatabaseReadHandle {
    // old field
    // env: Arc<ConcreteEnv>,

    // new field
    spawn_fn: fn(BCReadRequest) -> InfallibleOneshotReceiver,
}

The blockchain read/write handle now only holds a function pointer that spawns some work to be done inside the rayon threadpool, instead of owning the Env itself.
Each handler function would have to take in <E: Env> instead of ConcreteEnv.
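As a rough sketch of the handler side, assuming a much smaller stand-in `Env` trait than the real one (the trait, the backend structs, `handle_read`, and `run` are all hypothetical), a handler generic over `E: Env` is compiled once per backend and dispatched statically. The only runtime branch is the one in `init`.

```rust
// Stand-in for the real `Env` trait.
trait Env {
    fn backend_name(&self) -> &'static str;
}

struct HeedEnv;
struct RedbEnv;

impl Env for HeedEnv {
    fn backend_name(&self) -> &'static str { "heed" }
}
impl Env for RedbEnv {
    fn backend_name(&self) -> &'static str { "redb" }
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Backend {
    Heed,
    Redb,
}

// Generic handler: takes `<E: Env>` instead of a single `ConcreteEnv`.
fn handle_read<E: Env>(env: &E, height: u64) -> String {
    format!("{} read block {height}", env.backend_name())
}

// Everything past the branch is generic, so it is monomorphized per backend.
fn run<E: Env>(env: E) -> String {
    handle_read(&env, 0)
}

// The only place the backend is chosen at runtime.
fn init(backend: Backend) -> String {
    match backend {
        Backend::Heed => run(HeedEnv),
        Backend::Redb => run(RedbEnv),
    }
}
```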

Pros

Fastest method (one branch at init())

Problems

Who owns the Arc<Env> now? rayon doesn't have custom storage. Recreating handler logic for rayon threads, instead of spawning as needed, means we lose (or have to re-create) rayon's work-stealing logic.


Boog900 · 2024-07-06T00:22:30Z

Who owns the Arc<Env> now? rayon doesn't have custom storage, recreating handler logic for rayon threads instead of as-needed spawning means we lose (or have to re-create) rayon work stealing logic

If we were instead to make the type:

// storage/blockchain/src/service/read.rs
pub struct DatabaseReadHandle {
    // old field
    // env: Arc<ConcreteEnv>,

    // new field
    spawn_fn: Box<dyn Fn(BCReadRequest) -> InfallibleOneshotReceiver>,
}

Then we could make the closure hold the Arc<Env>.
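A small sketch of that idea, under stated assumptions: `ReadRequest` and `Response` are stand-ins for `BCReadRequest` and `InfallibleOneshotReceiver`, the `Env` trait is a toy version, and the closure runs the work inline where the real one would spawn it on the rayon threadpool. The `Send + Sync` bounds are an addition here so the handle could be shared across threads.

```rust
use std::sync::Arc;

// Toy stand-in for the real `Env` trait.
trait Env: Send + Sync + 'static {
    fn backend_name(&self) -> &'static str;
}

struct HeedEnv;
struct RedbEnv;

impl Env for HeedEnv {
    fn backend_name(&self) -> &'static str { "heed" }
}
impl Env for RedbEnv {
    fn backend_name(&self) -> &'static str { "redb" }
}

// Stand-ins for `BCReadRequest` and `InfallibleOneshotReceiver`.
struct ReadRequest(u64);
type Response = String;

struct DatabaseReadHandle {
    // The handle itself is not generic: the backend type is erased by the closure.
    spawn_fn: Box<dyn Fn(ReadRequest) -> Response + Send + Sync>,
}

fn handle_for<E: Env>(env: Arc<E>) -> DatabaseReadHandle {
    DatabaseReadHandle {
        // `move` makes the closure own the `Arc<E>`, which answers "who owns the env".
        spawn_fn: Box::new(move |req: ReadRequest| {
            format!("{}: request {}", env.backend_name(), req.0)
        }),
    }
}
```

The cost is one dynamic call per request through the boxed closure; everything inside the closure is still statically dispatched on `E`.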

jomuel · 2024-07-06T08:32:46Z

As discussed with @Boog900, I'd like to pick this up. I'll review the code today or tomorrow and then discuss further details with you in the Cuprate group chat.

Boog900 · 2024-07-09T14:01:48Z

I had another idea that is pretty much an extension of method 3, but solves the issue of binary bloat.

Remove ConcreteEnv and replace all usages with <E: Env>.
Expose the heed and redb backends' Env types in cuprate_database, but under feature flags that are not enabled by default.
cuprate_blockchain would also have feature flags for different DB backends. However, these feature flags only give you the ability to swap to that DB; they do not force its usage.
cuprate_blockchain would then start the chosen backend as described by hinto. If a backend is chosen that hasn't been enabled by the feature flags, the init function should panic.
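The last step could be sketched roughly as follows. The feature names `heed` and `redb`, the `Backend` enum, and the string return value are assumptions for illustration; a real `init` would start the service for the selected backend instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Heed,
    Redb,
}

/// Hypothetical stand-in for starting the service with the chosen backend.
pub fn init(backend: Backend) -> &'static str {
    match backend {
        // Each arm only exists if its feature flag was enabled at compile time.
        #[cfg(feature = "heed")]
        Backend::Heed => "started heed",
        #[cfg(feature = "redb")]
        Backend::Redb => "started redb",
        // Reached only for a backend that was not compiled in.
        #[allow(unreachable_patterns)]
        other => panic!("backend {other:?} was not compiled in; enable its feature flag"),
    }
}
```

With neither feature enabled, every call panics; with both enabled, the fallback arm is never reached and the choice is purely a runtime one.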

hinto-janai added A-storage Related to storage. C-discussion General discussion or questions. labels Jul 1, 2024

Boog900 assigned jomuel Jul 6, 2024

Boog900 mentioned this issue Jul 26, 2024: Storage: split the DB service abstraction #237 (Merged)

jomuel mentioned this issue Aug 1, 2024: Database hotswap #242 (Draft)
