-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let PartitionProcessors failures propagate to the PartitionProcessorManager #2214
Conversation
3d033bd
to
cf35633
Compare
This change prevents a failure of a PP from bringing down the entire runtime, instead the error is propagated to the PPM and the status of the PP is basically popped out from the inner map. This will then show up on the CC next status pull and can then decide to take an action. |
a4cd855
to
e09cb67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @muhamadazmy. The changes look good to me. I had one comment regarding the clean shut down case where a Stopped
event could also be helpful. +1 for merging after resolving the comments.
@@ -648,6 +652,14 @@ impl<T: TransportConnect> PartitionProcessorManager<T> { | |||
error!(%partition_id, error=%err, "Starting partition processor failed"); | |||
self.running_partition_processors.remove(&partition_id); | |||
} | |||
ProcessorEvent::Stopped(err) => { | |||
error!(%partition_id, error=%err, "Partition processor exited"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe warn
because the system is not yet broken (hopefully). warn!(%partition_id, "Partition processor exited unexpectedly: {err}");
let _ = events | ||
.send(ManagerEvent { | ||
partition_id, | ||
event: ProcessorEvent::Stopped(err), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would probably also return a Stopped
event if the partition processor exited w/o an error. That way the receiver can also clean up its data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. Good catch!
c63d90b
to
67aaa83
Compare
…anager Summary: When PP crashes (returns an error) we should update it's status instead of panicing Fixes restatedev#2147
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. +1 for merging.
Let PartitionProcessors failures propagate to the PartitionProcessorManager
Summary:
When PP crashes (returns an error) we should update it's status instead
of panicing
Fixes #2147
Stack created with Sapling. Best reviewed with ReviewStack.