
Conversation

@silentred (Contributor) commented Aug 24, 2025

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: silentred
Once this PR has been reviewed and has the lgtm label, please assign siyuanfoundation for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @silentred. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ahrtr (Member) commented Aug 26, 2025

This isn't correct. It can't prevent defragmentation from renaming the db file concurrently.

@ahrtr (Member) commented Aug 26, 2025

The defragmentation also needs to acquire this lock.

func (ms *maintenanceServer) Defragment(ctx context.Context, sr *pb.DefragmentRequest) (*pb.DefragmentResponse, error) {
	ms.lg.Info("starting defragment")
	ms.healthNotifier.defragStarted()
	defer ms.healthNotifier.defragFinished()
	err := ms.bg.Backend().Defrag()
	if err != nil {
		ms.lg.Warn("failed to defragment", zap.Error(err))
		return nil, togRPCError(err)
	}
	ms.lg.Info("finished defragment")
	return &pb.DefragmentResponse{}, nil
}

The (s *EtcdServer) Backend() is problematic: it just protects getting the backend instance; it can't protect the operation that follows (i.e. the defragmentation).

func (s *EtcdServer) Backend() backend.Backend {
	s.bemu.RLock()
	defer s.bemu.RUnlock()
	return s.be
}

@silentred (Contributor, Author) commented Aug 27, 2025

it just protects getting the backend instance

You are right.

I was thinking bemu should protect not only changes to the variable be but also the backend's underlying resources, e.g. the db file, readTx, and batchTx. So if we use a lock to protect

  • from OpenSnapshotBackend to the end of applySnapshot
  • and the whole defrag process

would that work?
Please correct me if I am wrong, thank you.
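
A rough sketch of that idea with toy types (hypothetical names, not the actual etcd code; openSnapshotBackend stands in for the real snapshot-recovery path):

package sketch

import "sync"

// toy stand-ins for the real etcd types
type backend struct{}

func (b *backend) Defrag() error { return nil }

// hypothetical helper: rename xxx.snap.db to db and open a backend on it
func openSnapshotBackend() *backend { return &backend{} }

type server struct {
	bemu sync.RWMutex
	be   *backend
}

// applySnapshot holds the write lock from opening the snapshot backend until
// the swap is complete...
func (s *server) applySnapshot() {
	s.bemu.Lock()
	defer s.bemu.Unlock()
	newbe := openSnapshotBackend()
	// ... restore the KV store from newbe, close the old backend ...
	s.be = newbe
}

// ...while defrag holds the same lock for its whole duration, so the two can
// never interleave.
func (s *server) defrag() error {
	s.bemu.RLock()
	defer s.bemu.RUnlock()
	return s.be.Defrag()
}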

@ahrtr (Member) commented Aug 27, 2025

  • and the whole defrag process

I think we can add a Defragment method to EtcdServer and acquire the bemu lock; the rough implementation is something like below:

func (s *EtcdServer) Defragment(...) {
	s.bemu.RLock()
	defer s.bemu.RUnlock()
	return s.Backend().Defrag()
}
The (ms *maintenanceServer) Defragment should call (s *EtcdServer) Defragment instead of Backend().Defrag().
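
For illustration, the maintenance handler would then look roughly like this (a sketch only, not the final implementation; logging is omitted, and the field ms.s holding a reference back to the EtcdServer is an assumption, since the wiring isn't shown here):

func (ms *maintenanceServer) Defragment(ctx context.Context, sr *pb.DefragmentRequest) (*pb.DefragmentResponse, error) {
	ms.healthNotifier.defragStarted()
	defer ms.healthNotifier.defragFinished()
	// delegate to EtcdServer so bemu is held for the whole defragmentation
	if err := ms.s.Defragment(); err != nil {
		return nil, togRPCError(err)
	}
	return &pb.DefragmentResponse{}, nil
}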

also cc @fuweid

@@ -1029,6 +1029,9 @@ func (s *EtcdServer) applySnapshot(ep *etcdProgress, toApply *toApply) {
 	// wait for raftNode to persist snapshot onto the disk
 	<-toApply.notifyc

+	// protect the backend from concurrent defrag
+	s.bemu.Lock()
A member commented on the diff:
Just a concern about increasing the scope of the critical section. Yes, it's a simple fix, but in case we run into a deadlock, we probably need an e2e test to cover that.

@silentred (Contributor, Author) commented Sep 2, 2025

I am not sure what this E2E test should do. My thought is this test might include:

Any advice? Big thanks.

BTW, can I ask why we should not use SIGSTOP to pause the etcd process?

@silentred (Contributor, Author) commented Sep 1, 2025

Hi @ahrtr, thanks for your detailed explanation in #20585 (comment).
According to your comment, I think you might prefer this PR's solution. I have already modified the code in the direction of this suggestion: #20553 (comment).

I think we still need to align our understanding.

The problem is that during the above period, there are two different backend instances.

Yes, there are two backend instances: newbe and s.be.

but both of which point to the same db file.

I think they have different inodes. s.be points to the old db file; newbe points to the new db file, renamed from xxx.snap.db.

Note that normally the old backend instance still reads from the old db file even after the renaming, until you restart etcdserver or perform a defragment right after the renaming or setting the new backend (s.be = newbe).

I think after s.kv.Restore(newbe), etcdserver starts reading from newbe to serve clients' range requests.

We should block any write to both backends during the above process

Except for defrag, I am not seeing any writes during applySnapshot, because it runs in the apply loop, in which there are no concurrent user writes.

The other PR, #20585, attempts to rename the snap.db file inside the backend, so there is only one backend instance. It locks the whole process of renaming xxx.snap.db to db, preventing concurrent defrag.
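
For illustration, that approach looks roughly like the following toy sketch (hypothetical names such as RecoverFromSnapshot, not the actual #20585 code): the backend itself replaces the db file under the same mutex that Defrag uses, so the rename and a defragmentation can never interleave.

package sketch

import (
	"os"
	"sync"
)

type backend struct {
	mu     sync.Mutex
	dbPath string
	// *bolt.DB, read/batch transactions, etc. omitted
}

func (b *backend) RecoverFromSnapshot(snapPath string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	// close the current bolt DB, then atomically replace the db file ...
	if err := os.Rename(snapPath, b.dbPath); err != nil {
		return err
	}
	// ... and reopen the bolt DB from b.dbPath, rebuilding the transactions
	return nil
}

func (b *backend) Defrag() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	// defragment into a temp file and rename it over b.dbPath
	return nil
}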

@ahrtr (Member) commented Sep 1, 2025

Hi @ahrtr, thanks for your detailed explanation in #20585 (comment).
According to your comment, I think you might prefer this PR's solution. I have already modified the code in the direction of this suggestion: #20553 (comment).

I didn't realize that you raised two separate PRs; please close #20585 to avoid confusion.

I think they have different inodes. s.be points to the old db file; newbe points to the new db file, renamed from xxx.snap.db.

That's why I said: "Note that normally the old backend instance still reads from the old db file even after the renaming, until you restart etcdserver or perform a defragment right after the renaming or setting the new backend (s.be = newbe)."

Except for defrag, I am not seeing any writes during applySnapshot, because it runs in the apply loop, in which there are no concurrent user writes.

It isn't true. There is also an API-layer write, i.e. the storage version update:

func (m *Monitor) UpdateStorageVersionIfNeeded() {
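
For illustration only, such a background write path looks roughly like this (a toy sketch with a hypothetical versionBackend interface, not etcd's actual monitor code): a goroutine that periodically writes the storage version through the backend, independently of the apply loop, which is exactly the kind of write that can race with applySnapshot unless both sides synchronize on the same lock.

package sketch

import (
	"sync"
	"time"
)

// hypothetical write path into the backend
type versionBackend interface {
	WriteStorageVersion(v string)
}

func runStorageVersionMonitor(bemu *sync.RWMutex, be versionBackend, stop <-chan struct{}) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			bemu.RLock() // must share the lock used by applySnapshot/defrag
			be.WriteStorageVersion("3.6.0") // placeholder value
			bemu.RUnlock()
		}
	}
}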

@silentred (Contributor, Author) commented:

It isn't true. There is also an API-layer write, i.e. the storage version update.

Thanks. I didn't notice.

I will squash this PR's commits when it is ready to merge.

@silentred changed the title from "[NotMerge] simple fix for 20271" to "[fix #20271] protect backend usage from concurrent applySnapshot and defrag" on Sep 2, 2025