Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(duplication): fix plog do not gc after meta server restart #2066

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ninsmiracle
Copy link
Contributor

What problem does this PR solve?

#2015

What is changed and how does it work?

When meta load duplication status from ZK, refresh duplicating string in memory ,base on the value in ZK.

Tests
  • Unit test
  • Manual test (add detailed scripts or steps below)

@github-actions github-actions bot added the cpp label Jul 9, 2024
@@ -701,6 +702,10 @@ void meta_duplication_service::recover_from_meta_state()
_meta_svc->get_meta_storage()->get_children(
get_duplication_path(*app),
[this, app](bool node_exists, const std::vector<std::string> &dup_id_list) {
dsn::defer([this, app]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory state will be updated [1] after get data from ZK successfully in line 721, why update it again?

  1. https://github.com/apache/incubator-pegasus/blob/master/src/meta/duplication/meta_duplication_service.cpp#L787

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory state will be updated [1] after get data from ZK successfully in line 721, why update it again?

  1. https://github.com/apache/incubator-pegasus/blob/master/src/meta/duplication/meta_duplication_service.cpp#L787

Because in function recover_from_meta_state, the vector dup_id_list may be empty.
Here is the detail of our online condition:

  1. Delete the existing hot standby first
    A. Delete the corresponding dup from zk
    B. Set duplicating in meta_server memory to 0 (not saved in zk), duplicating in zk is still 1
  2. meta_server restarts
    Read dup and app_info data from zk to memory. At this time, dup is empty, but duplicating in zk is still 1 (because dup is empty, it is impossible to delete the hot standby to make duplicating 0).
  3. Meta passes the hot standby status to the replica through on_config_sync
  4. Replica GC checks that the dup map is empty (min_confirmed_decree = invalid_decree), but _is_duplication_master = true, so GC is not performed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information. So this patch is aim to complete step 2? Why not complete the 1.B step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the information. So this patch is aim to complete step 2? Why not complete the 1.B step?

Both two ways can fix this problem. In my opinion, the ways like step 2 is easier to implement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness and consistent is important. Update the duplicating status to ZK is not very difficult if all duplications removed.
Or, if the duplicating memory status can be recovered from <app>/duplication ZK path, just competely remove the duplicating status from ZK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants