automated testing for action invariants #31

Open
davepacheco opened this issue Jul 18, 2022 · 2 comments

Comments

@davepacheco
Collaborator

Steno expects action implementations to meet a few invariants (these come from distributed sagas in general):

  • actions are idempotent
  • undo actions are idempotent
  • "undo" should commute with the action. That is, if the undo action runs and the action is then re-executed, the net result should be as though the undo action ran last.
  • what others?
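To make the three invariants concrete, here is a minimal synchronous sketch in Rust. Everything in it is hypothetical: World and Record stand in for external state, and the undo action leaves a tombstone so that a late re-run of the action becomes a no-op. Tombstones are one common way to get the commutativity property, not necessarily how Steno consumers do it.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for external state touched by a saga action,
/// e.g. rows in a database. Not a Steno type.
#[derive(Clone, Debug, PartialEq)]
pub enum Record {
    Present(u32), // instance exists with this size
    Deleted,      // tombstone left behind by the undo action
}

pub type World = BTreeMap<String, Record>;

/// Action: ensure instance `id` exists. Re-running it is a no-op, and it also
/// does nothing if the undo action already left a tombstone.
pub fn create_instance(world: &mut World, id: &str, size: u32) {
    world.entry(id.to_string()).or_insert(Record::Present(size));
}

/// Contrasting action: idempotent, but it overwrites the tombstone and so
/// does NOT commute with the undo action.
pub fn create_instance_naive(world: &mut World, id: &str, size: u32) {
    world.insert(id.to_string(), Record::Present(size));
}

/// Undo action: writing the same tombstone twice leaves the same state.
pub fn delete_instance(world: &mut World, id: &str) {
    world.insert(id.to_string(), Record::Deleted);
}

/// Invariants 1 and 2: running `f` twice leaves the same state as running it once.
pub fn is_idempotent(f: impl Fn(&mut World), start: &World) -> bool {
    let mut once = start.clone();
    f(&mut once);
    let mut twice = once.clone();
    f(&mut twice);
    once == twice
}

/// Invariant 3: undo followed by a re-run of the action ends in the same
/// state as if the undo had run last.
pub fn undo_commutes(action: impl Fn(&mut World), undo: impl Fn(&mut World), start: &World) -> bool {
    let mut after_undo = start.clone();
    undo(&mut after_undo);
    let mut then_action = after_undo.clone();
    action(&mut then_action);
    after_undo == then_action
}
```

Here create_instance_naive passes the idempotency check but fails the commutativity check, because re-running it after the undo resurrects the instance.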

We cannot test that actions are correct or even that they always obey these invariants. But we could empirically test a lot of different cases, like:

  • after an action runs to completion, run it again. It should succeed again, and the resulting external state should be equivalent. (Steno can't know the second part, but the consumer can help here.)
  • after an undo action runs to completion, run it again. It should succeed again and the external state should be equivalent.
  • run the action again after the undo action completes. It should succeed again and the external state should be equivalent.
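The three empirical checks above could look roughly like this. This is a hypothetical synchronous harness, not a Steno API: the consumer supplies snapshot, and PartialEq on the snapshot type stands in for "equivalent". For the third check, the sketch assumes "equivalent" means equivalent to the state right after the undo action, which is what the commutativity invariant implies.

```rust
/// Hypothetical harness (not a Steno API). `snapshot` is the consumer's
/// view of external state, and `PartialEq` on it stands in for "equivalent".
pub fn check_action<W, S: PartialEq>(
    world: &mut W,
    snapshot: impl Fn(&W) -> S,
    action: impl Fn(&mut W) -> Result<(), String>,
    undo: impl Fn(&mut W) -> Result<(), String>,
) -> Result<(), String> {
    // Case 1: run the action to completion, then run it again.
    action(world)?;
    let after_action = snapshot(world);
    action(world).map_err(|e| format!("action failed on re-run: {e}"))?;
    if snapshot(world) != after_action {
        return Err("re-running the action changed external state".to_string());
    }

    // Case 2: run the undo action to completion, then run it again.
    undo(world)?;
    let after_undo = snapshot(world);
    undo(world).map_err(|e| format!("undo failed on re-run: {e}"))?;
    if snapshot(world) != after_undo {
        return Err("re-running the undo action changed external state".to_string());
    }

    // Case 3: run the action again after the undo action has completed.
    // Assumption: "equivalent" means equivalent to the post-undo state.
    action(world).map_err(|e| format!("action failed after undo: {e}"))?;
    if snapshot(world) != after_undo {
        return Err("running the action after undo changed external state".to_string());
    }
    Ok(())
}
```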

If we could inject a retryable error into either the action or the undo action, then we could also test that they can fail (multiple times in a row) and that, once the injected error is removed, running them again succeeds.
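A sketch of the injected-error idea, under the assumption that an injector is just a countdown of failures: while it is nonzero, the action fails with a retryable error partway through its work; once it reaches zero (the injected error is "removed"), the retried action must succeed and leave the same state as a clean run. ActionError, Injector, provision, and run_with_retries are all hypothetical names.

```rust
use std::cell::Cell;
use std::collections::BTreeSet;

/// Hypothetical error type: injected failures are always retryable.
#[derive(Debug, PartialEq)]
pub enum ActionError {
    Retryable(String),
    Permanent(String),
}

/// Hypothetical fault injector: the next `remaining` checks fail.
/// Once `remaining` reaches zero, the injected error is effectively removed.
pub struct Injector {
    remaining: Cell<u32>,
}

impl Injector {
    pub fn new(failures: u32) -> Self {
        Injector { remaining: Cell::new(failures) }
    }

    pub fn check(&self) -> Result<(), ActionError> {
        let n = self.remaining.get();
        if n == 0 {
            return Ok(());
        }
        self.remaining.set(n - 1);
        Err(ActionError::Retryable("injected failure".to_string()))
    }
}

/// Example action with two side effects and an injection point between them,
/// so each injected failure leaves partial state behind for the retry to handle.
pub fn provision(world: &mut BTreeSet<String>, injector: &Injector) -> Result<(), ActionError> {
    world.insert("disk".to_string());
    injector.check()?;
    world.insert("instance".to_string());
    Ok(())
}

/// Retries `action` while it fails with a retryable error, up to `max_attempts`.
/// Returns the attempt number that succeeded.
pub fn run_with_retries<W>(
    world: &mut W,
    max_attempts: u32,
    action: impl Fn(&mut W) -> Result<(), ActionError>,
) -> Result<u32, ActionError> {
    for attempt in 1..=max_attempts {
        match action(world) {
            Ok(()) => return Ok(attempt),
            Err(ActionError::Retryable(_)) => continue,
            Err(e) => return Err(e),
        }
    }
    Err(ActionError::Permanent(format!("still failing after {max_attempts} attempts")))
}
```

With three injected failures, provision succeeds on the fourth attempt and leaves the same set as a run with no injected failures.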

All of this requires that the consumer be able to tell us whether the external states of the world are equivalent (or, maybe equivalently, whether they've changed).
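Equivalence is a judgment Steno can't make on its own. As a hypothetical example, a re-run of an action might legitimately bump a modification timestamp, so byte-for-byte equality of snapshots would report a false violation. A consumer-supplied comparison can ignore such fields; InstanceRow and equivalent are illustrative names only.

```rust
/// Hypothetical consumer-side snapshot of one database row.
#[derive(Clone, Debug)]
pub struct InstanceRow {
    pub id: String,
    pub state: String,
    /// Bumped on every write, even a no-op re-write; not semantically relevant.
    pub time_modified: u64,
}

/// Consumer-defined equivalence: compare the fields that matter and
/// deliberately ignore `time_modified`. Assumes both snapshots list rows
/// in the same order (e.g. sorted by `id`).
pub fn equivalent(a: &[InstanceRow], b: &[InstanceRow]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| x.id == y.id && x.state == y.state)
}
```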

@davepacheco
Collaborator Author

Here's a very rough napkin sketch of an idea:

/// Provides helpers used by Steno for automated testing of action invariants
trait ActionContextTestable {
    type Snapshot: TestableSnapshot;
    async fn snapshot(&self) -> Self::Snapshot;

    type InjectedError; // probably needs to be convertible to errors similar to what we do for Actions
    async fn inject_retryable_error(&self, error: Self::InjectedError) -> Result<(), SomeKindOfError>;
}

trait TestableSnapshot {
    type Error; // probably needs to be convertible to Steno's error, similar to what we do for Actions
    async fn has_changed_since(&self, snapshot: Self) -> Result<bool, Self::Error>;
}

This lets the consumer plug in a way to tell us whether the external state of the world has changed, and also a way to inject retryable errors.
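To check that the sketched traits are implementable, here is a hypothetical in-memory consumer in Rust. It is synchronous (the sketch uses async fn), has_changed_since takes the earlier snapshot by reference instead of by value, and String stands in for SomeKindOfError; all three are assumptions made to keep the example self-contained. FakeContext and FakeSnapshot are illustrative names.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;
use std::convert::Infallible;

/// Synchronous, simplified versions of the two sketched traits.
pub trait TestableSnapshot {
    type Error;
    fn has_changed_since(&self, earlier: &Self) -> Result<bool, Self::Error>;
}

pub trait ActionContextTestable {
    type Snapshot: TestableSnapshot;
    type InjectedError;
    fn snapshot(&self) -> Self::Snapshot;
    fn inject_retryable_error(&self, error: Self::InjectedError) -> Result<(), String>;
}

/// Hypothetical consumer context: an in-memory set of provisioned resources
/// plus at most one pending injected error.
pub struct FakeContext {
    resources: RefCell<BTreeSet<String>>,
    pending_error: RefCell<Option<String>>,
}

pub struct FakeSnapshot(BTreeSet<String>);

impl TestableSnapshot for FakeSnapshot {
    type Error = Infallible;
    fn has_changed_since(&self, earlier: &Self) -> Result<bool, Infallible> {
        Ok(self.0 != earlier.0)
    }
}

impl ActionContextTestable for FakeContext {
    type Snapshot = FakeSnapshot;
    type InjectedError = String;

    fn snapshot(&self) -> FakeSnapshot {
        FakeSnapshot(self.resources.borrow().clone())
    }

    fn inject_retryable_error(&self, error: String) -> Result<(), String> {
        let mut slot = self.pending_error.borrow_mut();
        if slot.is_some() {
            return Err("an injected error is already pending".to_string());
        }
        *slot = Some(error);
        Ok(())
    }
}

impl FakeContext {
    pub fn new() -> Self {
        FakeContext { resources: RefCell::new(BTreeSet::new()), pending_error: RefCell::new(None) }
    }

    /// Example action: fails once if an injected error is pending, otherwise
    /// idempotently records `name` as provisioned.
    pub fn create(&self, name: &str) -> Result<(), String> {
        if let Some(err) = self.pending_error.borrow_mut().take() {
            return Err(err);
        }
        self.resources.borrow_mut().insert(name.to_string());
        Ok(())
    }
}
```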

With this, it feels like we should be able to write a function with a signature like this:

// The generic is named S rather than SagaType: a type parameter with the
// same name as the trait would shadow it in the `where` clause.
fn test_saga_actions<T, S>(
    // probably some SEC-related argument,
    saga_template: SagaTemplate<S>, // will be a DAG post-#29
    saga_params: S::ParamsType,
    user_context: T,
)
    where
        T: ActionContextTestable,
        S: SagaType<ContextType = T>
{}

and the idea would be that we run through the saga and, for each action, take a snapshot, run the action, take another snapshot, run the action again, take another snapshot, and so on, comparing snapshots to see whether the external state has changed. The details here are a little tricky. If we add an async fn rollback(&self, snapshot: Self::Snapshot) to ActionContextTestable, then Steno has a lot of flexibility in how to implement this. But implementing rollback might be both complicated and error-prone. It'd be interesting to think about whether we could sequence the testing so that we never need to roll back. Maybe implementing only rollback_to_initial_snapshot() would be simpler.
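One possible sequencing that never needs rollback, sketched under simplifying assumptions (a linear list of nodes instead of a DAG, synchronous actions, and PartialEq standing in for has_changed_since): a forward pass runs each action twice and checks that the second run changed nothing, then a reverse pass runs each undo twice and re-runs the action, checking that neither the second undo nor the re-run changed anything. Because each checked step is expected to leave state unchanged, the test never has to restore an earlier snapshot. Node and this simplified test_saga_actions are hypothetical, not Steno's API.

```rust
/// Hypothetical, linear stand-in for a saga node; real Steno sagas are DAGs.
pub struct Node<W> {
    pub name: &'static str,
    pub action: fn(&mut W),
    pub undo: fn(&mut W),
}

/// Simplified, synchronous sketch (not Steno's API). Every step after the
/// first run of an action is expected to leave external state unchanged,
/// so the test never needs to roll back to an earlier snapshot.
pub fn test_saga_actions<W, S: PartialEq>(
    world: &mut W,
    snapshot: impl Fn(&W) -> S,
    nodes: &[Node<W>],
) -> Result<(), String> {
    // Forward pass: run each action, then run it again; the re-run must be a no-op.
    for node in nodes {
        (node.action)(world);
        let once = snapshot(world);
        (node.action)(world);
        if snapshot(world) != once {
            return Err(format!("{}: action is not idempotent", node.name));
        }
    }
    // Reverse pass (like unwinding): run each undo twice, then re-run the
    // action; neither the second undo nor the re-run may change anything.
    for node in nodes.iter().rev() {
        (node.undo)(world);
        let undone = snapshot(world);
        (node.undo)(world);
        if snapshot(world) != undone {
            return Err(format!("{}: undo action is not idempotent", node.name));
        }
        (node.action)(world);
        if snapshot(world) != undone {
            return Err(format!("{}: action does not commute with undo", node.name));
        }
    }
    Ok(())
}
```

This exercises only a single ordering of the saga's nodes, so it complements rather than replaces a rollback-based approach.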

@davepacheco
Collaborator Author

Talking with @bnaecker, it occurred to me that when we first automate testing for this stuff in Omicron, we may find a number of actions that violate some of the expected invariants. It probably won't be tenable to block integration of these tests on fixing all the existing actions. Instead, I think we'll want to document the invariants we're testing and what specific edge cases they're needed for -- that is, what's the impact of violating them? That can help us prioritize addressing whatever failures we see.

smklein added a commit that referenced this issue Dec 28, 2022
Provides a simple API for instructing a node to "execute twice".

This provides a "bare-minimum" helper API for testing idempotency within a saga. When combined with #67, which was used to test unwind safety, it should be possible to test that all actions and undo actions within a saga are idempotent, at least across being called twice.

Part of #31
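The commit's actual API isn't shown here. As a hypothetical sketch of the idea, an executor could consult a per-node repeat count and run that node's action more than once; combined with a snapshot comparison, this is enough to detect a non-idempotent action. execute_nodes, Action, and bump are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical action type for this sketch (not Steno's).
pub type Action<W> = fn(&mut W) -> Result<(), String>;

/// Hypothetical executor: runs each named node in order, executing a node's
/// action as many times as `repeat` requests (default: once).
pub fn execute_nodes<W>(
    world: &mut W,
    nodes: &[(&str, Action<W>)],
    repeat: &HashMap<&str, u32>,
) -> Result<(), String> {
    for (name, action) in nodes {
        let times = repeat.get(*name).copied().unwrap_or(1);
        for _ in 0..times {
            action(world)?;
        }
    }
    Ok(())
}

/// Deliberately non-idempotent example action: increments a counter.
pub fn bump(counter: &mut u32) -> Result<(), String> {
    *counter += 1;
    Ok(())
}
```

Running the nodes [("a", bump), ("b", bump)] normally leaves the counter at 2; asking for node "b" to execute twice leaves it at 3, a change that a snapshot comparison would flag.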