archival: Add replica validator #24626
base: dev
94ba70b to 8a839b2
The validator checks that the replica state has not diverged from the previous leader.
Signed-off-by: Evgeny Lazin <[email protected]>
8a839b2 to 4126158
Retry command for Build#60061: please wait until all jobs are finished before running the slash command
The idea looks all right 👍
I'm a bit concerned that we would allow uploads to proceed if cloud_storage_disable_upload_consistency_checks is set. First of all, in which case would it be fine to let uploads proceed rather than going and fixing the manifest first?
Also, hah, what would "fixing" mean?
The other concern is that we overload this setting with too much; it is also global, so the blast radius is large.
We should probably have a per-partition mechanism to allow inconsistencies only at specific offsets. (Not for this PR, of course.)
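One way to picture the per-partition mechanism suggested above is an allow-list keyed by partition, so that a known inconsistency at one offset does not relax the checks anywhere else. This is only a rough sketch under assumptions: inconsistency_allow_list, may_upload, and the string partition key are hypothetical names made up for illustration. They are not part of this PR or of any existing configuration.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using offset_t = std::int64_t;

// Hypothetical per-partition allow-list; none of these names exist in the PR.
class inconsistency_allow_list {
public:
    // Permit a known inconsistency at one offset of one partition.
    void allow(const std::string& partition, offset_t offset) {
        _allowed[partition].insert(offset);
    }

    bool is_allowed(const std::string& partition, offset_t offset) const {
        auto it = _allowed.find(partition);
        return it != _allowed.end() && it->second.count(offset) > 0;
    }

private:
    std::map<std::string, std::set<offset_t>> _allowed;
};

// Uploads proceed only when every detected anomaly offset was explicitly
// allowed for this partition. No anomalies means nothing blocks the upload.
bool may_upload(
  const inconsistency_allow_list& list,
  const std::string& partition,
  const std::vector<offset_t>& anomaly_offsets) {
    for (offset_t offset : anomaly_offsets) {
        if (!list.is_allowed(partition, offset)) {
            return false;
        }
    }
    return true;
}
```

Compared with a single global switch, an override like this would only affect the partitions and offsets an operator names explicitly.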
// Get the last uploaded segment and try to translate one of its offsets
// and compare the results.
auto last_segment = manifest.last_segment();
if (last_segment.has_value() && local_so < last_segment->base_offset) {
<= ?
// Last segment exists in the manifest and can be translated using
// local offset translation state.

auto expected_delta = last_segment->delta_offset;
Would it make sense to also validate delta_offset_end?
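To make the two review questions above concrete, here is a minimal sketch that checks both ends of the last uploaded segment. It is not the PR's code. segment_meta_sketch, delta_lookup, and check_last_segment are hypothetical stand-ins. The sketch assumes that delta_offset applies at base_offset, that delta_offset_end applies at committed_offset, and that the local start offset itself can be translated (which is why it uses <= where the snippet under review uses <).

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using offset_t = std::int64_t;

// Hypothetical, simplified stand-in for the manifest's segment metadata.
struct segment_meta_sketch {
    offset_t base_offset;
    offset_t committed_offset;
    offset_t delta_offset;     // assumed: delta at base_offset
    offset_t delta_offset_end; // assumed: delta at committed_offset
};

// Hypothetical lookup into the local offset translation state.
using delta_lookup = std::function<offset_t(offset_t)>;

std::vector<std::string> check_last_segment(
  const std::optional<segment_meta_sketch>& last_segment,
  offset_t local_start_offset,
  const delta_lookup& local_delta) {
    std::vector<std::string> anomalies;
    if (!last_segment.has_value()) {
        return anomalies; // nothing uploaded yet, nothing to compare
    }
    // Each boundary is compared only if it is still available locally.
    if (local_start_offset <= last_segment->base_offset
        && local_delta(last_segment->base_offset)
             != last_segment->delta_offset) {
        anomalies.push_back("delta_offset mismatch");
    }
    if (local_start_offset <= last_segment->committed_offset
        && local_delta(last_segment->committed_offset)
             != last_segment->delta_offset_end) {
        anomalies.push_back("delta_offset_end mismatch");
    }
    return anomalies;
}
```

Checking the end delta as well would catch a divergence that happens inside the last segment, which a comparison at base_offset alone cannot see.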
/// over cluster replicas. If the archiver starts on a replica
/// with inconsistent offset translator state it should be able
/// to detect this and report the problem.
class replica_state_validator {
Does this need to be a class? I think it would be better as a free function with a struct result.
It's a bit unusual for a validator class to do work in the constructor and then have no other use.
The method could accept only a log object and a manifest object; there is no need for a full-blown partition.
That would make testing easier and require fewer includes.
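The shape proposed in these two comments could look roughly like the sketch below: a free function that takes narrow log and manifest inputs and returns a plain result struct. All types here (log_sketch, manifest_sketch, anomaly_sketch, validation_result) and the function validate_replica_state are hypothetical simplifications, not the PR's actual interfaces. The comparison mirrors the snippet under review, including its strict <.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>

using offset_t = std::int64_t;

// Hypothetical narrow inputs: only what the check reads.
struct log_sketch {
    offset_t start_offset;
    std::function<offset_t(offset_t)> delta_at; // local offset translation
};

struct manifest_sketch {
    struct segment {
        offset_t base_offset;
        offset_t delta_offset;
    };
    std::optional<segment> last_segment;
};

struct anomaly_sketch {
    offset_t offset;
    offset_t expected_delta; // from the manifest
    offset_t actual_delta;   // from the local log
};

// Plain result struct: callers inspect it, no validator object is kept.
struct validation_result {
    std::deque<anomaly_sketch> anomalies;
    bool has_anomalies() const noexcept { return !anomalies.empty(); }
};

validation_result validate_replica_state(
  const log_sketch& log, const manifest_sketch& manifest) {
    validation_result result;
    const auto& seg = manifest.last_segment;
    // Compare only when the segment start is still translatable locally.
    if (seg.has_value() && log.start_offset < seg->base_offset) {
        const offset_t actual = log.delta_at(seg->base_offset);
        if (actual != seg->delta_offset) {
            result.anomalies.push_back(
              {seg->base_offset, seg->delta_offset, actual});
        }
    }
    return result;
}
```

Because the inputs are small value types, a unit test can build them inline without constructing a partition.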
bool has_anomalies() const noexcept;

const std::deque<replica_state_anomaly> get_anomalies() const noexcept;
dead code?
#include "cluster/fwd.h"
#include "cluster/partition.h"

#include <seastar/core/shared_ptr.hh>
unused include?
#pragma once

#include "cluster/fwd.h"
unused?
CI test results
test results on build#60061
The validator checks that the replica state has not diverged from the previous leader.
Backports Required
Release Notes