Enhancement: persist commit index in LogStore to accelerate recovery #613

Status: Open. Wants to merge 37 commits into base: main.

Changes from 25 commits (37 commits total)

Commits:
2e5a8a0
feat: add CommitTrackingLogStore interface for commit index management
peterxcli Sep 1, 2024
ffc6b3b
chore: remove non-idiomatic type assert func
peterxcli Sep 3, 2024
7383d96
feat(raft): add fast recovery mode for quicker log application
peterxcli Sep 4, 2024
f6295e0
feat(raft): add recovery from committed logs during startup
peterxcli Sep 4, 2024
f2ae7a9
refactor(store): rename ReadCommitIndex to GetCommitIndex for consist…
peterxcli Sep 6, 2024
ce1895c
fix: also set inmem commit index when revocer log commit progress fro…
peterxcli Sep 10, 2024
ab50a58
perf: optimize startup recovery by skipping duplicated log replay
peterxcli Sep 10, 2024
4e7e04b
refactor(inmem-commit-tracking-store): store commit index in memory u…
peterxcli Sep 13, 2024
41df55e
chore: fix typo in recoverFromCommittedLogs function name
peterxcli Sep 13, 2024
400a27d
refactor(raft): update parameter name in persistCommitIndex function
peterxcli Sep 13, 2024
e2617e8
refactor(raft): set commit index in memory before `StoreLogs`
peterxcli Sep 13, 2024
6daca47
refactor(raft): fix condition for skipping recovery in `recoverFromCo…
peterxcli Sep 18, 2024
cc09317
feat(raft): add commit tracking logs and fast recovery tests
peterxcli Sep 18, 2024
fe57b32
docs(config): update comments for FastRecovery mechanism
peterxcli Sep 19, 2024
20e8701
refactor(inmem-commit-tracking-store): simplify in-mem log tracking s…
peterxcli Sep 19, 2024
6f146e1
fix: rename persistCommitIndex to tryPersistCommitIndex
peterxcli Sep 19, 2024
a8438b0
chore(raft): rename tryPersistCommitIndex to tryStageCommitIndex for …
peterxcli Sep 20, 2024
5e6d8a4
refactor(log): introduce StagCommitIndex for optimized atomic persist…
peterxcli Sep 20, 2024
e248f00
fix(raft): correct CommitTrackingLogStore implementation
peterxcli Sep 24, 2024
2a913ab
feat(raft): improve fast recovery error handling and commit index val…
peterxcli Sep 24, 2024
7cd6732
feat: add `CommitTrackingLogStore` interface check and adjust return …
peterxcli Oct 9, 2024
92c04a0
refactor: improve type assertion for log store in TestRaft_FastRecovery
peterxcli Oct 9, 2024
8e8ba07
feat: add warning log for unsupported fast recovery
peterxcli Oct 10, 2024
2020cab
refactor: move commitIndex retrieve into `tryStageCommitIndex`
peterxcli Oct 10, 2024
2a7d584
refactor: remove error from return field of recoverFromCommittedLogs
peterxcli Oct 10, 2024
bdac45b
refactor: rename FastRecovery and revert the stageCommittedIdx change
peterxcli Oct 11, 2024
ed47a25
docs: documented GetCommitIndex in CommitTrackingLogStore interface
peterxcli Oct 11, 2024
ad87d86
docs: change fastRecovery flag to recoverCommittedLog in all document…
peterxcli Oct 11, 2024
30fc43e
refactor: add a new ErrIncompatibleLogStore for recoverFromCommittedLogs
peterxcli Oct 11, 2024
e797962
docs: clarify RestoreCommittedLogs configuration requirement
peterxcli Oct 11, 2024
500567f
refactor: rename recoverFromCommittedLogs to restoreFromCommittedLogs
peterxcli Oct 11, 2024
cfffcb5
refactor!: update MakeCluster functions to return error
peterxcli Oct 11, 2024
560c0b9
test: add test for RestoreCommittedLogs with incompatible log store
peterxcli Oct 11, 2024
8c722fa
Revert "refactor!: update MakeCluster functions to return error"
peterxcli Oct 12, 2024
300a6e7
refactor: update makeCluster to return errors
peterxcli Oct 12, 2024
8d11a28
Use wrapped err
peterxcli Oct 12, 2024
1bdf161
docs: clarify GetCommitIndex behavior in CommitTrackingLogStore inter…
peterxcli Oct 15, 2024
43 changes: 42 additions & 1 deletion api.go
@@ -217,6 +217,10 @@ type Raft struct {
// preVoteDisabled control if the pre-vote feature is activated,
// prevote feature is disabled if set to true.
preVoteDisabled bool

// fastRecovery is used to enable fast recovery mode
// fast recovery mode is disabled if set to false.
fastRecovery bool
}

// BootstrapCluster initializes a server's storage with the given cluster
@@ -566,6 +570,7 @@ func NewRaft(conf *Config, fsm FSM, logs LogStore, stable StableStore, snaps Sna
followerNotifyCh: make(chan struct{}, 1),
mainThreadSaturation: newSaturationMetric([]string{"raft", "thread", "main", "saturation"}, 1*time.Second),
preVoteDisabled: conf.PreVoteDisabled || !transportSupportPreVote,
fastRecovery: conf.FastRecovery,
}
if !transportSupportPreVote && !conf.PreVoteDisabled {
r.logger.Warn("pre-vote is disabled because it is not supported by the Transport")
@@ -585,9 +590,12 @@
return nil, err
}

r.recoverFromCommittedLogs()

// Scan through the log for any configuration change entries.
snapshotIndex, _ := r.getLastSnapshot()
for index := snapshotIndex + 1; index <= lastLog.Index; index++ {
lastappliedIndex := r.getLastApplied()
for index := max(snapshotIndex, lastappliedIndex) + 1; index <= lastLog.Index; index++ {
var entry Log
if err := r.logs.GetLog(index, &entry); err != nil {
r.logger.Error("failed to get log", "index", index, "error", err)
@@ -697,6 +705,39 @@ func (r *Raft) tryRestoreSingleSnapshot(snapshot *SnapshotMeta) bool {
return true
}

// recoverFromCommittedLogs recovers the Raft node from committed logs.
func (r *Raft) recoverFromCommittedLogs() {
if !r.fastRecovery {
return
}

// If the store implements CommitTrackingLogStore, we can read the commit index from the store.
// This is useful when the store is able to track the commit index and we can avoid replaying logs.
store, ok := r.logs.(CommitTrackingLogStore)
if !ok {
r.logger.Warn("fast recovery enabled but log store does not support it", "log_store", fmt.Sprintf("%T", r.logs))
return
}
Member:

As long as we're considering returning an error below instead of panicking, I think we should consider doing so here as well. This is going to be a "programmer error" rather than a runtime error -- the consumer of the library should be ensuring they're passing a compatible combination of log store and FastRecovery configuration.


commitIndex, err := store.GetCommitIndex()
if err != nil {
r.logger.Error("failed to get commit index from store", "error", err)
panic(err)
}

lastIndex, err := r.logs.LastIndex()
if err != nil {
r.logger.Error("failed to get last log index from store", "error", err)
panic(err)
}
Contributor:

Shouldn't we fall back to the non-fast-recovery path instead of panicking?

Member:

🤔 Great question.

For transient errors it's probably better to hard stop rather than silently weaken the expected guarantees. Someone who has enabled this feature (and provided a supported logstore) should be able to assume that by the time raft has started (without error), their FSM is at least as up-to-date as it was before a restart. Silently falling back seems like it makes it impossible to really trust that guarantee and may mean similar bugs that we are trying to prevent are still possible with no way to mitigate them (even if they are rarer).

In most use cases where each server uses this library as the core of its functionality (i.e. all our products) and only uses a single raft group initialised during startup, a panic is probably reasonable too: we crash, the supervisor restarts us, and if it was a transient error then great; if not, it's no different to what will happen later when we are unable to read logs.

That said, in writing this I realised it's probably too strong of a decision to make in a library here: it would be possible for a server process to manage multiple raft instances for example and a fatal error in one of them shouldn't terminate the whole process. So I think I'd vote for making these cases return a hard error from NewRaft like we do if we fail to read a log or snapshot during startup rather than panic.

Contributor:

I guess returning an error from NewRaft would allow the library user to decide whether to fall back or fail starting the application; that seems reasonable to me.

I think we should implement this as a specific error type that we publish on the API, so callers can use errors.Is on it without relying on string comparison.
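A rough sketch of what that could look like (ErrIncompatibleLogStore is the name a later commit on this branch introduces; the exact placement inside NewRaft shown here is illustrative, not the final diff):

// In the raft package: a published sentinel error that callers can match with errors.Is.
var ErrIncompatibleLogStore = errors.New("log store does not implement CommitTrackingLogStore")

// Inside NewRaft, instead of panicking (hypothetical placement):
if conf.FastRecovery {
	if _, ok := logs.(CommitTrackingLogStore); !ok {
		return nil, fmt.Errorf("fast recovery enabled: %w", ErrIncompatibleLogStore)
	}
}

On the application side, errors.Is then lets the caller choose between falling back and failing hard:

r, err := raft.NewRaft(conf, fsm, logs, stable, snaps, trans)
if errors.Is(err, raft.ErrIncompatibleLogStore) {
	// e.g. retry with conf.FastRecovery = false, or abort with a clear configuration error
}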

if commitIndex > lastIndex {
commitIndex = lastIndex
}

r.setCommitIndex(commitIndex)
r.processLogs(commitIndex, nil)
}

func (r *Raft) config() Config {
return r.conf.Load().(Config)
}
10 changes: 10 additions & 0 deletions config.go
@@ -235,6 +235,16 @@ type Config struct {
// PreVoteDisabled deactivate the pre-vote feature when set to true
PreVoteDisabled bool

// FastRecovery controls whether the Raft server should use the fast recovery
// mechanism. Fast recovery requires a LogStore implementation that
// supports commit tracking. When such a store is used and this config is
// enabled, raft nodes will replay all known-committed logs on disk
// before completing `NewRaft` on startup. This is mainly useful where
// the application allows relaxed-consistency reads from followers, as it
// reduces how far behind the follower's FSM is when it starts. If all reads
// are forwarded to the leader then there won't be an observable benefit from this feature.
FastRecovery bool
Member:

Not sure if this is a naming nitpick or just a question:

From the perspective of a caller to NewRaft, FastRecovery might be considerably slower right? Because it blocks while replaying all locally-persisted committed logs?

IIUC the "fast" aspect is due to more logs being replayed locally instead of streamed from a peer. Logs committed while this member was down will need to be streamed, but presumably that's often a fraction of the total log size.

If my understanding is accurate, an alternative name might be CommittedLogRecovery (or even LocalCommittedLogRecovery which is even worse!), but .... Fast Recovery definitely sounds better! So I'm more curious if my reasoning is correct than suggesting we change the name.

Member:

This is a great point actually. I'm not sure "Fast" captures the semantics in any case really: mostly, startup will take marginally to a lot longer, but on the plus side the FSM will actually start up in the same state it was in before the node restarted, which is probably what most users of the library assumed was the case already!

If my understanding is accurate

Yeah I think you perfectly described the tradeoff. I think CommittedLogRecovery is a reasonable name for the thing and we can explain the semantics in the comments.

Author:

@schmichael @banks How about RecoveryCommitted?

Member (@schmichael, Oct 10, 2024):

Naming is the worst. 😅 Looking around a bit we're not totally consistent, but I think generally:

  1. Recovery refers to remediating a failure of some kind (RecoverCluster and many comments)
  2. Restore is the term used for this process: replaying logs on startup.

So I think switching Recovery -> Restore is appropriate.

After that I think I prefer RestoreCommittedLogs a tiny bit over RestoreCommitted since Committed alone is a bit ambiguous. I would accept either though!


// skipStartup allows NewRaft() to bypass all background work goroutines
skipStartup bool
}
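For orientation, enabling the feature from application code might look roughly like the sketch below (not part of the diff; it uses the test-only in-memory stores from this PR and a placeholder no-op FSM):

package main

import (
	"io"
	"log"

	"github.com/hashicorp/raft"
)

// noopFSM is a placeholder; a real application supplies its own FSM.
type noopFSM struct{}

func (noopFSM) Apply(*raft.Log) interface{}         { return nil }
func (noopFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, nil }
func (noopFSM) Restore(rc io.ReadCloser) error      { return rc.Close() }

func main() {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID("node-1")
	conf.FastRecovery = true // only has an effect with a commit-tracking log store

	logs := raft.NewInmemCommitTrackingStore() // test-only store added in this PR
	stable := raft.NewInmemStore()
	snaps := raft.NewInmemSnapshotStore()
	_, trans := raft.NewInmemTransport("")

	r, err := raft.NewRaft(conf, noopFSM{}, logs, stable, snaps, trans)
	if err != nil {
		log.Fatalf("raft startup failed: %v", err)
	}
	defer r.Shutdown()
}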
30 changes: 30 additions & 0 deletions inmem_store.go
@@ -6,8 +6,11 @@ package raft
import (
"errors"
"sync"
"sync/atomic"
)

var _ CommitTrackingLogStore = &InmemCommitTrackingStore{}

// InmemStore implements the LogStore and StableStore interface.
// It should NOT EVER be used for production. It is used only for
// unit tests. Use the MDBStore implementation instead.
@@ -131,3 +134,30 @@ func (i *InmemStore) GetUint64(key []byte) (uint64, error) {
defer i.l.RUnlock()
return i.kvInt[string(key)], nil
}

type commitIndexTrackingLog struct {
log *Log
CommitIndex uint64
}
type InmemCommitTrackingStore struct {
InmemStore
commitIndex atomic.Uint64
}

// NewInmemCommitTrackingStore returns a new in-memory backend that tracks the commit index. Do not ever
// use for production. Only for testing.
func NewInmemCommitTrackingStore() *InmemCommitTrackingStore {
i := &InmemCommitTrackingStore{
InmemStore: *NewInmemStore(),
}
return i
}

func (i *InmemCommitTrackingStore) StageCommitIndex(index uint64) error {
i.commitIndex.Store(index)
return nil
}

func (i *InmemCommitTrackingStore) GetCommitIndex() (uint64, error) {
return i.commitIndex.Load(), nil
}
21 changes: 21 additions & 0 deletions log.go
@@ -190,3 +190,24 @@ func emitLogStoreMetrics(s LogStore, prefix []string, interval time.Duration, st
}
}
}

type CommitTrackingLogStore interface {
LogStore

// StageCommitIndex stages a new commit index to be persisted.
// The staged commit index MUST only be persisted in a manner that is atomic
// with the following StoreLogs call in the face of a crash.
// This allows the Raft implementation to optimize commit index updates
// without risking inconsistency between the commit index and the log entries.
//
// The implementation MUST NOT persist this value separately from the log entries.
// Instead, it should stage the value to be written atomically with the next
// StoreLogs call.
//
// GetCommitIndex MUST never return a value higher than the last index in the log,
// even if a higher value has been staged with this method.
//
// idx is the new commit index to stage.
StageCommitIndex(idx uint64) error
GetCommitIndex() (uint64, error)
Contributor:

How could GetCommitIndex() be implemented in a real store? Would it read the latest stored log and return the commit index associated with it? What if it doesn't find any because those logs were stored using a store that doesn't support fast recovery?

Member:

For BoltDB, I imagine commit index would be a single KV in a separate bucket from logs so it would just read that and return it.

For WAL I anticipated extending the format slightly so that each commit entry in the log stores the most recently staged commit index, and then re-populating that into memory when we open the log and scan it, like we do with indexes.

If there is no commit index stored, we should just return 0, nil which is always safe and has the same behavior as current code I think right?

Contributor (@dhiaayachi, Oct 10, 2024):

If there is no commit index stored, we should just return 0, nil which is always safe and has the same behaviour as current code I think right?

I agree! I think that should be documented though. Because the API allows erroring on GetCommitIndex(), it could easily be mistaken for a possible error case.

}
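To make the BoltDB idea above concrete, here is a rough sketch of what a commit-tracking store could look like (this is not raft-boltdb code; the bucket names, the encodeLog/uint64ToBytes/bytesToUint64 helpers, and the struct itself are invented for illustration, and the other LogStore methods plus locking are omitted). It follows the interface contract: the staged index is only written inside the same transaction as the next StoreLogs call, and GetCommitIndex never reports more than the last stored log:

type BoltCommitTrackingStore struct {
	db     *bolt.DB
	staged uint64 // value from StageCommitIndex, not yet durable
}

func (s *BoltCommitTrackingStore) StageCommitIndex(idx uint64) error {
	// Just remember it; durability happens atomically with the next StoreLogs.
	s.staged = idx
	return nil
}

func (s *BoltCommitTrackingStore) StoreLogs(logs []*raft.Log) error {
	return s.db.Update(func(tx *bolt.Tx) error {
		logsBkt := tx.Bucket([]byte("logs"))
		metaBkt := tx.Bucket([]byte("meta"))
		for _, l := range logs {
			buf, err := encodeLog(l) // hypothetical encoding helper
			if err != nil {
				return err
			}
			if err := logsBkt.Put(uint64ToBytes(l.Index), buf); err != nil {
				return err
			}
		}
		// Persist the staged commit index in the same transaction, clamped to the
		// last index being written so GetCommitIndex can never exceed the log.
		commit := min(s.staged, logs[len(logs)-1].Index)
		return metaBkt.Put([]byte("commit_index"), uint64ToBytes(commit))
	})
}

func (s *BoltCommitTrackingStore) GetCommitIndex() (uint64, error) {
	var idx uint64
	err := s.db.View(func(tx *bolt.Tx) error {
		if v := tx.Bucket([]byte("meta")).Get([]byte("commit_index")); v != nil {
			idx = bytesToUint64(v)
		}
		// Nothing stored yet: return 0, nil, as discussed above.
		return nil
	})
	return idx, err
}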
22 changes: 22 additions & 0 deletions raft.go
@@ -1262,6 +1262,8 @@
r.leaderState.inflight.PushBack(applyLog)
}

r.tryStageCommitIndex()

// Write the log entry locally
if err := r.logs.StoreLogs(logs); err != nil {
r.logger.Error("failed to commit logs", "error", err)
@@ -1385,6 +1387,21 @@
return nil
}

// tryStageCommitIndex stages the commit index in the persistent store if fast recovery is enabled and the log store implements CommitTrackingLogStore.
func (r *Raft) tryStageCommitIndex() {
commitIndex := r.getCommitIndex()
if !r.fastRecovery {
return
}
store, ok := r.logs.(CommitTrackingLogStore)
if !ok {
return
}
if err := store.StageCommitIndex(commitIndex); err != nil {
r.logger.Error("failed to stage commit index in commit tracking log store", "index", commitIndex, "error", err)
}
}

// processRPC is called to handle an incoming RPC request. This must only be
// called from the main thread.
func (r *Raft) processRPC(rpc RPC) {
@@ -1535,6 +1552,11 @@
}

if n := len(newEntries); n > 0 {
// Stage the future commit index if possible
lastNewIndex := newEntries[len(newEntries)-1].Index
commitIndex := min(a.LeaderCommitIndex, lastNewIndex)
r.tryStageCommitIndex(commitIndex)

Check failure on line 1558 in raft.go (GitHub Actions / go-fmt-and-vet): too many arguments in call to r.tryStageCommitIndex

// Append the new entries
if err := r.logs.StoreLogs(newEntries); err != nil {
r.logger.Error("failed to append to logs", "error", err)
178 changes: 178 additions & 0 deletions raft_test.go
@@ -1095,6 +1095,184 @@ func TestRaft_RestoreSnapshotOnStartup_Monotonic(t *testing.T) {
assert.Equal(t, lastIdx, last)
}

func TestRaft_RestoreSnapshotOnStartup_CommitTrackingLogs(t *testing.T) {
// Make the cluster
conf := inmemConfig(t)
conf.TrailingLogs = 10
opts := &MakeClusterOpts{
Peers: 1,
Bootstrap: true,
Conf: conf,
CommitTrackingLogs: true,
}
c := MakeClusterCustom(t, opts)
defer c.Close()

leader := c.Leader()

// Commit a lot of things
var future Future
for i := 0; i < 100; i++ {
future = leader.Apply([]byte(fmt.Sprintf("test%d", i)), 0)
}

// Wait for the last future to apply
if err := future.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Take a snapshot
snapFuture := leader.Snapshot()
if err := snapFuture.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Check for snapshot
snaps, _ := leader.snapshots.List()
if len(snaps) != 1 {
t.Fatalf("should have a snapshot")
}
snap := snaps[0]

// Logs should be trimmed
firstIdx, err := leader.logs.FirstIndex()
if err != nil {
t.Fatalf("err: %v", err)
}
lastIdx, err := leader.logs.LastIndex()
if err != nil {
t.Fatalf("err: %v", err)
}

if firstIdx != snap.Index-conf.TrailingLogs+1 {
t.Fatalf("should trim logs to %d: but is %d", snap.Index-conf.TrailingLogs+1, firstIdx)
}

// Shutdown
shutdown := leader.Shutdown()
if err := shutdown.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Restart the Raft
r := leader
// Can't just reuse the old transport as it will be closed
_, trans2 := NewInmemTransport(r.trans.LocalAddr())
cfg := r.config()
r, err = NewRaft(&cfg, r.fsm, r.logs, r.stable, r.snapshots, trans2)
if err != nil {
t.Fatalf("err: %v", err)
}
c.rafts[0] = r

// We should have restored from the snapshot!
if last := r.getLastApplied(); last != snap.Index {
t.Fatalf("bad last index: %d, expecting %d", last, snap.Index)
}

// Verify that logs have not been reset
first, _ := r.logs.FirstIndex()
last, _ := r.logs.LastIndex()
assert.Equal(t, firstIdx, first)
assert.Equal(t, lastIdx, last)
}

func TestRaft_FastRecovery(t *testing.T) {
// Make the cluster
conf := inmemConfig(t)
conf.TrailingLogs = 10
conf.FastRecovery = true
opts := &MakeClusterOpts{
Peers: 1,
Bootstrap: true,
Conf: conf,
CommitTrackingLogs: true,
}
c := MakeClusterCustom(t, opts)
defer c.Close()

leader := c.Leader()

// Commit a lot of things
var future Future
for i := 0; i < 100; i++ {
future = leader.Apply([]byte(fmt.Sprintf("test%d", i)), 0)
}

// Wait for the last future to apply
if err := future.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Take a snapshot
snapFuture := leader.Snapshot()
if err := snapFuture.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Check for snapshot
snaps, _ := leader.snapshots.List()
if len(snaps) != 1 {
t.Fatalf("should have a snapshot")
}
snap := snaps[0]

// Logs should be trimmed
firstIdx, err := leader.logs.FirstIndex()
if err != nil {
t.Fatalf("err: %v", err)
}

if firstIdx != snap.Index-conf.TrailingLogs+1 {
t.Fatalf("should trim logs to %d: but is %d", snap.Index-conf.TrailingLogs+1, firstIdx)
}

// Commit a lot of things (for fast recovery test)
for i := 0; i < 100; i++ {
future = leader.Apply([]byte(fmt.Sprintf("test%d", i)), 0)
}

// Wait for the last future to apply
if err := future.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Shutdown
shutdown := leader.Shutdown()
if err := shutdown.Error(); err != nil {
t.Fatalf("err: %v", err)
}

// Restart the Raft
r := leader
// Can't just reuse the old transport as it will be closed
_, trans2 := NewInmemTransport(r.trans.LocalAddr())
cfg := r.config()
r, err = NewRaft(&cfg, r.fsm, r.logs, r.stable, r.snapshots, trans2)
if err != nil {
t.Fatalf("err: %v", err)
}
c.rafts[0] = r

store, ok := r.logs.(CommitTrackingLogStore)
if !ok {
t.Fatal("err: raft log store does not implement CommitTrackingLogStore interface")
}
Comment on lines +1260 to +1263
Member:

🤔 This is fine, but I'd probably have left it like it was before. The behaviour of an unchecked type assert like we had before (i.e. no ok assignment) would be to panic with a message that's basically the same as the error here. That should have been sufficient to fail the test anyway. The most puzzling thing to me is that it didn't. Perhaps you didn't actually run this locally before the interface implementation was fixed and then it was masked by a different error in CI (which is sadly often flaky).

We can leave it like this for now, though; at least it doesn't stop the entire test run, which is arguably nicer, although something that was a programming mistake and would never work is fine to panic on in tests IMO.

commitIdx, err := store.GetCommitIndex()
// We should have applied all committed logs
if last := r.getLastApplied(); last != commitIdx {
t.Fatalf("bad last index: %d, expecting %d", last, commitIdx)
}

// Expect: snap.Index --- commitIdx --- lastIdx
lastIdx, err := r.logs.LastIndex()
if err != nil {
t.Fatalf("err: %v", err)
}
assert.LessOrEqual(t, snap.Index, commitIdx)
assert.LessOrEqual(t, commitIdx, lastIdx)
}
Member:

I'm assuming these tests are failing right now because the InmemStore wasn't updated to match the new interface right? If they are passing for you then it might be worth a look!

Author:

It passes😱

Author:

Hi @banks
First, I want to thank you for reviewing so many changes and giving lots of advice and opinions. 😁


Back to this topic, I think this test case is OK 🤔.

Basically, the commit index stored in the store will always be lower than the last log index in the store (it lags by one StoreLogs call).

As the comment "Expect: snap.Index --- commitIdx --- lastIdx" indicates, we can't be sure of the exact position of the commit index in every test, so I only test the interval.
But now we find this leads to another problem - we can't even detect when the commit store interface and its implementation don't match 😱.
We could solve that detection issue by changing assert.LessOrEqual to assert.Less, but that would introduce another flakiness problem because of the uncertainty of the commit index.

What do you think?

Member:

🤔 I think the assertion is fine as it is; the real issue seems to be the line above:

commitIdx, err := r.logs.(CommitTrackingLogStore).GetCommitIndex()

I'm not sure how that didn't just panic if the log store actually didn't implement the interface 🤷 .

I don't think we need to change the assertions.


func TestRaft_SnapshotRestore_Progress(t *testing.T) {
// Make the cluster
conf := inmemConfig(t)