automated testing for action invariants #31

davepacheco · 2022-07-18T16:50:55Z

Steno expects action implementations to meet a few invariants (these come from distributed sagas in general):

actions are idempotent
undo actions are idempotent
"undo" should be commutative with the action. That is, if an undo action is executed followed by a re-execution of the action itself, the net result should be as though the undo action ran last.
what others?
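To make these invariants concrete, here is a minimal sketch against an in-memory stand-in for external state. `World`, `SagaId`, `create_instance`, and `undo_create_instance` are hypothetical names used only for illustration; none of them are part of Steno. The tombstone set is one possible way (an assumption, not the only option) to get the commutativity property: once the undo has run for a saga, a late re-execution of that saga's action is a no-op.

```rust
use std::collections::BTreeSet;

/// Hypothetical saga identifier, used to key tombstones.
pub type SagaId = u64;

/// Hypothetical stand-in for the external state of the world.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct World {
    /// Names of instances that currently exist.
    pub instances: BTreeSet<String>,
    /// Tombstones: sagas whose create action has been undone.
    pub undone: BTreeSet<SagaId>,
}

/// Action: ensure the instance exists. Idempotent because inserting into a
/// set twice has the same effect as inserting once. If this saga's undo has
/// already run, the action does nothing, so the undo "wins".
pub fn create_instance(world: &mut World, saga: SagaId, name: &str) {
    if world.undone.contains(&saga) {
        return;
    }
    world.instances.insert(name.to_string());
}

/// Undo action: ensure the instance is gone and record a tombstone.
/// Idempotent because both operations are no-ops the second time.
pub fn undo_create_instance(world: &mut World, saga: SagaId, name: &str) {
    world.instances.remove(name);
    world.undone.insert(saga);
}
```

With these definitions, running the action twice, the undo twice, or the action after the undo each leaves `World` equal to what it was after the first call in that group.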

We cannot test that actions are correct or even that they always obey these invariants. But we could empirically test a lot of different cases, like:

after an action runs to completion, run it again. It should succeed again and the resulting external state should be equivalent. (Steno can't know the second part but the consumer can help here.)
after an undo action runs to completion, run it again. It should succeed again and the external state should be equivalent.
run the action again after the undo action completes. It should succeed again and the external state should be equivalent.
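The three cases above could be sketched as a single checker for one action/undo pair. This is a synchronous simplification that assumes the consumer supplies a `snapshot` function whose results can be compared with `==`; `check_invariants` and `describe` are hypothetical names, not Steno APIs.

```rust
use std::fmt::Debug;

/// Formats an error returned by the action or undo action under test.
fn describe<E: Debug>(step: &str, e: E) -> String {
    format!("{step} returned an error: {e:?}")
}

/// Runs action, action, undo, undo, action against `world` and compares
/// consumer-provided snapshots. Returns the first check that fails.
pub fn check_invariants<W, S, E>(
    world: &mut W,
    snapshot: impl Fn(&W) -> S,
    action: impl Fn(&mut W) -> Result<(), E>,
    undo: impl Fn(&mut W) -> Result<(), E>,
) -> Result<(), String>
where
    S: PartialEq,
    E: Debug,
{
    // Case 1: the action succeeds twice and the second run changes nothing.
    action(&mut *world).map_err(|e| describe("action", e))?;
    let after_action = snapshot(&*world);
    action(&mut *world).map_err(|e| describe("repeated action", e))?;
    if snapshot(&*world) != after_action {
        return Err("action is not idempotent".to_string());
    }

    // Case 2: the undo action succeeds twice and the second run changes nothing.
    undo(&mut *world).map_err(|e| describe("undo", e))?;
    let after_undo = snapshot(&*world);
    undo(&mut *world).map_err(|e| describe("repeated undo", e))?;
    if snapshot(&*world) != after_undo {
        return Err("undo is not idempotent".to_string());
    }

    // Case 3: re-running the action after the undo succeeds and leaves the
    // state as the undo left it.
    action(&mut *world).map_err(|e| describe("action after undo", e))?;
    if snapshot(&*world) != after_undo {
        return Err("action after undo changed state".to_string());
    }
    Ok(())
}
```

A passing result only says these particular sequences behaved; it does not prove the invariants hold in general.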

If we could inject a retryable error into either the action or the undo action, then we could also test that each one can fail (multiple times in a row) and then, once the injected error is removed, run again and succeed.
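A rough sketch of what that injection could look like, using only the standard library. `StepError`, `FaultInjector`, and `run_with_retries` are hypothetical names, and the retry loop is an assumption about how a test driver might behave, not a description of Steno's executor.

```rust
use std::cell::Cell;

/// Hypothetical error type distinguishing retryable from permanent failures.
#[derive(Debug, PartialEq, Eq)]
pub enum StepError {
    Retryable(String),
    Permanent(String),
}

/// Fails a fixed number of times with a retryable error, then stops failing.
pub struct FaultInjector {
    remaining: Cell<u32>,
}

impl FaultInjector {
    pub fn new(failures: u32) -> Self {
        FaultInjector { remaining: Cell::new(failures) }
    }

    /// Returns an injected error while failures remain, `Ok(())` afterwards.
    pub fn check(&self) -> Result<(), StepError> {
        let left = self.remaining.get();
        if left > 0 {
            self.remaining.set(left - 1);
            Err(StepError::Retryable("injected".to_string()))
        } else {
            Ok(())
        }
    }
}

/// Calls `step` at least once and retries on retryable errors until
/// `max_attempts` is reached. Returns the number of attempts on success.
pub fn run_with_retries(
    max_attempts: u32,
    mut step: impl FnMut() -> Result<(), StepError>,
) -> Result<u32, StepError> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match step() {
            Ok(()) => return Ok(attempt),
            Err(StepError::Retryable(_)) if attempt < max_attempts => continue,
            Err(e) => return Err(e),
        }
    }
}
```

An action under test would call `injector.check()?` before touching external state, so that failed attempts leave the world unchanged and the eventual success can be compared against a single clean run.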

All of this requires that the consumer be able to tell us whether the external states of the world are equivalent (or, maybe equivalently, whether they've changed).

davepacheco · 2022-07-18T17:04:52Z

Here's a very rough napkin sketch of an idea:

/// Provides helpers used by Steno for automated testing of action invariants
trait ActionContextTestable {
    type Snapshot: TestableSnapshot;
    async fn snapshot(&self) -> Self::Snapshot;

    type InjectedError; // probably needs to be convertible to errors similar to what we do for Actions
    async fn inject_retryable_error(&self, error: Self::InjectedError) -> Result<(), SomeKindOfError>;
}

trait TestableSnapshot {
    type Error; // probably needs to be convertible to Steno's error, similar to what we do for Actions
    async fn has_changed_since(&self, snapshot: Self) -> Result<bool, Self::Error>;
}

This is basically letting the consumer plug in a way to tell us whether the external state of the world has changed and also to inject retryable errors.
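For a sense of what a consumer-side implementation might look like, here is a sketch for an in-memory context. It uses synchronous analogues of the two traits so that it needs nothing beyond the standard library. `SyncSnapshot`, `SyncTestable`, `InMemoryContext`, and `MapSnapshot` are all hypothetical names, and a real consumer would snapshot a database or other external service instead of a map.

```rust
use std::collections::BTreeMap;
use std::convert::Infallible;
use std::sync::{Arc, Mutex};

/// Synchronous analogue of the `TestableSnapshot` sketch (assumption).
pub trait SyncSnapshot: Sized {
    type Error;
    /// Reports whether `self` differs from an earlier snapshot.
    fn has_changed_since(&self, earlier: Self) -> Result<bool, Self::Error>;
}

/// Synchronous analogue of the snapshot half of `ActionContextTestable`.
pub trait SyncTestable {
    type Snapshot: SyncSnapshot;
    fn snapshot(&self) -> Self::Snapshot;
}

/// Hypothetical user context whose "external state" is a shared map.
#[derive(Clone, Default)]
pub struct InMemoryContext {
    pub records: Arc<Mutex<BTreeMap<String, String>>>,
}

/// A snapshot is a full copy of the map at one point in time.
pub struct MapSnapshot(BTreeMap<String, String>);

impl SyncSnapshot for MapSnapshot {
    // Comparing two in-memory maps cannot fail.
    type Error = Infallible;

    fn has_changed_since(&self, earlier: Self) -> Result<bool, Self::Error> {
        Ok(self.0 != earlier.0)
    }
}

impl SyncTestable for InMemoryContext {
    type Snapshot = MapSnapshot;

    fn snapshot(&self) -> MapSnapshot {
        MapSnapshot(self.records.lock().unwrap().clone())
    }
}
```

Copying all state is only practical for small test fixtures; a consumer with a large datastore would more likely record a version counter or checksum in the snapshot.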

With this, it feels like we should be able to write a function with a signature like this:

fn test_saga_actions<T, S>(
    // probably some SEC-related argument,
    saga_template: SagaTemplate<S>, // will be a DAG post-#29
    saga_params: S::ParamsType,
    user_context: T,
)
where
    T: ActionContextTestable,
    // the type parameter is named `S` so it does not shadow the `SagaType` trait
    S: SagaType<ContextType = T>,
{}

The idea would be that we run through the saga and, for each action, take a snapshot, run the action, take another snapshot, run the action again, take another snapshot, and so on, then compare the various snapshots to see whether the external state has changed.

The details here are a little tricky. If we add an async fn rollback(&self, snapshot: Self::Snapshot) to ActionContextTestable, then Steno has a lot of flexibility in how to implement this. But implementing rollback might be both complicated and error-prone. It'd be interesting to think about whether we could sequence the testing such that we never need to roll back. Maybe implementing rollback_to_initial_snapshot() would be simpler and easier.
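One sequencing that avoids rollback entirely, sketched under simplifying assumptions: the saga is a linear list of nodes, the world can be cloned and compared directly (standing in for consumer snapshots), and actions cannot fail. `Node` and `walk_without_rollback` are hypothetical names. Each action is run twice on the way forward and each undo action twice on the way back, so the saga's own unwind does the job a rollback would have done.

```rust
/// Hypothetical saga node: a name plus an action and its undo action.
pub struct Node<W> {
    pub name: &'static str,
    pub action: fn(&mut W),
    pub undo: fn(&mut W),
}

/// Walks the nodes forward and then backward, repeating each step once and
/// collecting a message for every repetition that changed the world.
pub fn walk_without_rollback<W: Clone + PartialEq>(
    world: &mut W,
    nodes: &[Node<W>],
) -> Vec<String> {
    let mut violations = Vec::new();

    // Forward pass: run each action, snapshot, run it again, compare.
    for node in nodes {
        (node.action)(world);
        let first = world.clone();
        (node.action)(world);
        if *world != first {
            violations.push(format!("{}: action not idempotent", node.name));
        }
    }

    // Unwind pass in reverse order: same check for each undo action.
    for node in nodes.iter().rev() {
        (node.undo)(world);
        let first = world.clone();
        (node.undo)(world);
        if *world != first {
            violations.push(format!("{}: undo not idempotent", node.name));
        }
    }

    violations
}
```

This ordering does not cover re-running an action after its undo; that check would need either a second forward pass or some form of rollback to the initial snapshot.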

davepacheco · 2022-07-18T17:46:44Z

Talking with @bnaecker it occurred to me that when we first automate testing for this stuff in Omicron, we may find a number of actions that violate some of the expected invariants. It probably won't be tenable to block integration of these tests on fixing all the existing actions. Instead, I think we'll want to document the invariants we're testing and what specific edge cases they're needed for -- that is, what's the impact of violating them? That can help us prioritize addressing whatever failures we see.

Provides a simple API for instructing a node to "execute twice". This provides a "bare-minimum" helper API for testing idempotency within a saga. When combined with #67 - which was used to test unwind safety - it should be possible to test that all actions / undo actions within a saga are idempotent, at least across being called twice. Part of #31

davepacheco mentioned this issue Oct 10, 2022

"Testing sagas" is not easy in Omicron, but it should be (oxidecomputer/omicron#1799)

Closed

smklein mentioned this issue Dec 26, 2022

Add API for injecting 'repetitions' into saga executor #88

Merged
