Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hashKV checking to robustness test #18386

Open
serathius opened this issue Jul 31, 2024 · 9 comments
Open

Add hashKV checking to robustness test #18386

serathius opened this issue Jul 31, 2024 · 9 comments
Assignees
Labels
area/robustness-testing help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. type/feature

Comments

@serathius
Copy link
Member

What would you like to be added?

#18274 proposes to change very delicate part of compaction to fix a correctness bug. There is high concern in this change potentially changing result of hashKV which can lead to cluster incorrectly marking itself as corrupted. We need to build a very solid confidence in hashKV calculation which regards to traffic and compaction.

Robustness tests seem like a good candidate to build that confidence even further, the main challenge in adding tests like in #18369 is fact that we cannot easily cover all ways requests (PUT/DELETE/TXN/Compaction) can interleave. With Robustness generating random requests it seems that we just need to start checking hashKV.

Robustness tests could be periodically requesting hashKV from all members and validating them for equality.

Why is this needed?

Would like to generalize testing for #18369.

@serathius
Copy link
Member Author

cc @ahrtr @fuweid @henrybear327 @jmhbnz @siyuanfoundation @ah8ad3

@ahrtr
Copy link
Member

ahrtr commented Jul 31, 2024

Three cases I can think of,

  • Always check hash at the end of each test, to verify all members have exactly the same hash value on the key space. It's exactly what the previous functional test was doing.
  • Compute the hash against the compact revision after each compaction.
  • Randomly compute hash on a random historic revision during the test.

But we need to exclude false alarm. The compaction is executed in the applying workflow/path, but hashKV is executed in the API layer. The CompactRevision must be the same in the HashKVResponse of all members (the same for HashRevision), otherwise it makes no sense to compare the Hash.

Hash uint32 `protobuf:"varint,2,opt,name=hash,proto3" json:"hash,omitempty"`
// compact_revision is the compacted revision of key-value store when hash begins.
CompactRevision int64 `protobuf:"varint,3,opt,name=compact_revision,json=compactRevision,proto3" json:"compact_revision,omitempty"`
// hash_revision is the revision up to which the hash is calculated.
HashRevision int64 `protobuf:"varint,4,opt,name=hash_revision,json=hashRevision,proto3" json:"hash_revision,omitempty"`

@ahrtr
Copy link
Member

ahrtr commented Jul 31, 2024

@ArkaSaha30

@henrybear327
Copy link
Contributor

/assign @ArkaSaha30
/assign @henrybear327

We would like to work on this together as this also seems to be a good material for the talk that we might be preparing for!

@k8s-ci-robot
Copy link

@henrybear327: GitHub didn't allow me to assign the following users: ArkaSaha30.

Note that only etcd-io members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @ArkaSaha30
/assign @henrybear327

We would like to work on this together as this also seems to be a good material for the talk that we might be preparing for!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ArkaSaha30
Copy link
Contributor

/assign

@fuweid
Copy link
Member

fuweid commented Aug 1, 2024

Sound good to me! Especially, it can help us to do sanity and compatible check on mix-versions cluster.

@ah8ad3
Copy link
Contributor

ah8ad3 commented Aug 5, 2024

I don't know if anyone needs an extra pair of hands i'm here just ping me @ArkaSaha30 @henrybear327

@serathius serathius added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Aug 5, 2024
@serathius
Copy link
Member Author

ping @henrybear327 @ArkaSaha30 any progress?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/robustness-testing help wanted priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. type/feature
Development

No branches or pull requests

7 participants