fix: Resolve some data races in chunk accesses #15080

Open · wants to merge 1 commit into main
Conversation

@benclive (Contributor) commented Nov 22, 2024:

What this PR does / why we need it:

  • Fixes 3 data races highlighted by running Loki with -race.
  • None of them are major:
    • One locks incorrectly and can read stale data (GetStats).
    • The other two relate to encoding a chunk, which writes to the offset field. We don't explicitly read offset anywhere else, but we do iterate over c.blocks in a number of places, which may be incorrect if the slice was updated concurrently; see the sketch after this list.
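
A minimal sketch of the locking discipline involved, using hypothetical simplified types (block, stream, size, and encode are illustrative stand-ins, not the Loki code): readers that only iterate over the blocks slice take the read lock, while the encode path, which writes block offsets, has to take the write lock. If encode held only the read lock, `go run -race` would flag the same class of race.

```go
package main

import "sync"

// Hypothetical, simplified stand-ins for the chunk/stream types; only the
// locking shape is the point, not the real Loki structures.
type block struct {
	offset int
	data   []byte
}

type stream struct {
	chunkMtx sync.RWMutex
	blocks   []block
}

// Readers that only iterate over blocks (Bounds-style accessors) take the read lock.
func (s *stream) size() int {
	s.chunkMtx.RLock()
	defer s.chunkMtx.RUnlock()
	n := 0
	for _, b := range s.blocks {
		n += len(b.data)
	}
	return n
}

// Encoding writes block offsets, so it must take the write lock; if it held
// only the read lock, the concurrent calls in main would race.
func (s *stream) encode() {
	s.chunkMtx.Lock()
	defer s.chunkMtx.Unlock()
	off := 0
	for i := range s.blocks {
		s.blocks[i].offset = off
		off += len(s.blocks[i].data)
	}
}

func main() {
	s := &stream{blocks: []block{{data: []byte("abc")}, {data: []byte("def")}}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _ = s.size() }()
	go func() { defer wg.Done(); s.encode() }()
	wg.Wait()
}
```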

@@ -239,8 +239,8 @@ func (s *streamIterator) Next() bool {
 // remove the first stream
 s.instances[0].streams = s.instances[0].streams[1:]
 
-stream.chunkMtx.RLock()
-defer stream.chunkMtx.RUnlock()
+stream.chunkMtx.Lock()
benclive (Contributor Author) commented:

toWireChunk writes to c.blocks[*].offset at pkg/chunkenc/memchunk.go:674.
This races with chunk.Bounds(), which doesn't strictly read that value, but does iterate over c.blocks.

benclive (Contributor Author) commented:

I briefly looked at memchunk.go to see if we could avoid the write to offset, but it's not easy to do. The offset is expected to be set after encoding a chunk, so all the tests fail if you track offsets separately.

@@ -441,9 +441,11 @@ func (i *Ingester) flushChunks(ctx context.Context, fp model.Fingerprint, labelP
 )
 
 // encodeChunk mutates the chunk so we must pass by reference
+chunkMtx.Lock()
benclive (Contributor Author) commented:

Same as toWireChunks: encodeChunk ultimately writes to c.blocks[*].offset in pkg/chunkenc/memchunk.go:674.

@@ -737,8 +737,9 @@ func (i *instance) getStats(ctx context.Context, req *logproto.IndexStatsRequest
 
 if err = i.forMatchingStreams(ctx, from, matchers, nil, func(s *stream) error {
 // Consider streams which overlap our time range
+s.chunkMtx.RLock()
benclive (Contributor Author) commented:

We read s.chunk.Bounds() in shouldConsiderStreams.

A reviewer (Contributor) commented:

But Bounds() already acquires the RLock here. Do we still need this?

benclive (Contributor Author) replied:

Good spot - we probably don't need this change then. Grabbing the write lock when encoding should be enough since that is what causes the conflict.
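
A small sketch of the conclusion reached in this thread, again with hypothetical simplified types (chunk, encode, and overlaps are placeholders): because the Bounds-style accessor takes the read lock internally, a getStats-style caller doesn't need an extra RLock around it; the write lock on the encode path is what prevents the race.

```go
package main

import (
	"sync"
	"time"
)

// Hypothetical simplified chunk; illustrative only, not the Loki types.
type chunk struct {
	mtx      sync.RWMutex
	from, to time.Time
}

// Bounds takes the read lock itself, so callers don't need to wrap it in another RLock.
func (c *chunk) Bounds() (time.Time, time.Time) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	return c.from, c.to
}

// The encode path takes the write lock, which is what serializes it against Bounds().
func (c *chunk) encode(to time.Time) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.to = to
}

// A getStats-style caller: no extra lock is taken around the Bounds() call.
func overlaps(c *chunk, from, through time.Time) bool {
	chunkFrom, chunkTo := c.Bounds()
	return chunkFrom.Before(through) && from.Before(chunkTo)
}

func main() {
	c := &chunk{from: time.Now(), to: time.Now().Add(time.Minute)}
	c.encode(time.Now())
	_ = overlaps(c, time.Now().Add(-time.Hour), time.Now().Add(time.Hour))
}
```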

benclive changed the title from "Fix some data races in chunks" to "fix: Resolve some data races in chunks" on Nov 25, 2024
benclive changed the title from "fix: Resolve some data races in chunks" to "fix: Resolve some data races in chunk accesses" on Nov 28, 2024
benclive marked this pull request as ready for review on November 28, 2024 14:37
benclive requested a review from a team as a code owner on November 28, 2024 14:37
@@ -441,9 +441,11 @@ func (i *Ingester) flushChunks(ctx context.Context, fp model.Fingerprint, labelP
 )
 
 // encodeChunk mutates the chunk so we must pass by reference
+chunkMtx.Lock()
 if err := i.encodeChunk(ctx, &ch, c); err != nil {
 return err
A reviewer (Contributor) commented:

Should you unlock here too?

benclive (Contributor Author) replied:

Yep, on line 448 below. I can wrap this in a func if you prefer to use defer.
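
A sketch of the wrap-in-a-func variant offered here, with hypothetical stand-in names (chunkDesc, encodeChunk, and flushOne are simplified placeholders; only the closure-plus-defer shape is the point):

```go
package main

import (
	"context"
	"sync"
)

// Hypothetical stand-ins for the ingester's types; only the locking shape matters here.
type chunkDesc struct{ encoded bool }

func encodeChunk(ctx context.Context, c *chunkDesc) error {
	c.encoded = true // stands in for the real encode, which writes block offsets
	return nil
}

func flushOne(ctx context.Context, chunkMtx *sync.Mutex, c *chunkDesc) error {
	// Wrapping the locked region in a small func lets defer handle the unlock,
	// instead of calling chunkMtx.Unlock() manually after encodeChunk returns.
	encodeLocked := func() error {
		chunkMtx.Lock()
		defer chunkMtx.Unlock()
		return encodeChunk(ctx, c)
	}
	if err := encodeLocked(); err != nil {
		return err
	}
	return nil
}

func main() {
	var mtx sync.Mutex
	_ = flushOne(context.Background(), &mtx, &chunkDesc{})
}
```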
