fix: Resolve some data races in chunk accesses #15080

Open · wants to merge 1 commit into main
Conversation

@benclive (Contributor) commented Nov 22, 2024:

What this PR does / why we need it:

  • Fixes 3 data races highlighted by running Loki with -race.
  • None of them are major:
    • One locks incorrectly and can read stale data (GetStats).
    • The other two relate to encoding a chunk, which writes to the offset field. We don't explicitly read offset anywhere else, but we do iterate over c.blocks in a number of places, which may be incorrect if the slice was updated concurrently; see the sketch after this list.
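
A minimal sketch of the locking discipline involved, using hypothetical simplified types (block, stream, size, and encode are illustrative stand-ins, not the Loki code): readers that only iterate over the blocks slice take the read lock, while the encode path, which writes block offsets, has to take the write lock. If encode held only the read lock, `go run -race` would flag the same class of race.

```go
package main

import "sync"

// Hypothetical, simplified stand-ins for the chunk/stream types; only the
// locking shape is the point, not the real Loki structures.
type block struct {
	offset int
	data   []byte
}

type stream struct {
	chunkMtx sync.RWMutex
	blocks   []block
}

// Readers that only iterate over blocks (Bounds-style accessors) take the read lock.
func (s *stream) size() int {
	s.chunkMtx.RLock()
	defer s.chunkMtx.RUnlock()
	n := 0
	for _, b := range s.blocks {
		n += len(b.data)
	}
	return n
}

// Encoding writes block offsets, so it must take the write lock; if it held
// only the read lock, the concurrent calls in main would race.
func (s *stream) encode() {
	s.chunkMtx.Lock()
	defer s.chunkMtx.Unlock()
	off := 0
	for i := range s.blocks {
		s.blocks[i].offset = off
		off += len(s.blocks[i].data)
	}
}

func main() {
	s := &stream{blocks: []block{{data: []byte("abc")}, {data: []byte("def")}}}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); _ = s.size() }()
	go func() { defer wg.Done(); s.encode() }()
	wg.Wait()
}
```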

@@ -239,8 +239,8 @@ func (s *streamIterator) Next() bool {
 // remove the first stream
 s.instances[0].streams = s.instances[0].streams[1:]
 
-stream.chunkMtx.RLock()
-defer stream.chunkMtx.RUnlock()
+stream.chunkMtx.Lock()
benclive (Contributor Author) commented:

toWireChunk writes to c.blocks[*].offset at pkg/chunkenc/memchunk.go:674.
This races with chunk.Bounds(), which doesn't strictly read that value, but does iterate over c.blocks.

benclive (Contributor Author) commented:

I briefly looked at memchunk.go to see if we could avoid the write to offset, but it's not easy to do. The offset is expected to be set after encoding a chunk, so all the tests fail if you track offsets separately.

@@ -441,9 +441,11 @@ func (i *Ingester) flushChunks(ctx context.Context, fp model.Fingerprint, labelP
 )
 
 // encodeChunk mutates the chunk so we must pass by reference
+chunkMtx.Lock()
benclive (Contributor Author) commented:

Same as toWireChunks: encodeChunk ultimately writes to c.blocks[*].offset in pkg/chunkenc/memchunk.go:674.

@@ -737,8 +737,9 @@ func (i *instance) getStats(ctx context.Context, req *logproto.IndexStatsRequest
 
 if err = i.forMatchingStreams(ctx, from, matchers, nil, func(s *stream) error {
 // Consider streams which overlap our time range
+s.chunkMtx.RLock()
benclive (Contributor Author) commented:

We read s.chunk.Bounds() in shouldConsiderStreams.

A reviewer (Contributor) commented:

But Bounds() already acquires the RLock here. Do we still need this?

benclive (Contributor Author) replied:

Good spot - we probably don't need this change then. Grabbing the write lock when encoding should be enough since that is what causes the conflict.
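
A small sketch of the conclusion reached in this thread, again with hypothetical simplified types (chunk, encode, and overlaps are placeholders): because the Bounds-style accessor takes the read lock internally, a getStats-style caller doesn't need an extra RLock around it; the write lock on the encode path is what prevents the race.

```go
package main

import (
	"sync"
	"time"
)

// Hypothetical simplified chunk; illustrative only, not the Loki types.
type chunk struct {
	mtx      sync.RWMutex
	from, to time.Time
}

// Bounds takes the read lock itself, so callers don't need to wrap it in another RLock.
func (c *chunk) Bounds() (time.Time, time.Time) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	return c.from, c.to
}

// The encode path takes the write lock, which is what serializes it against Bounds().
func (c *chunk) encode(to time.Time) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.to = to
}

// A getStats-style caller: no extra lock is taken around the Bounds() call.
func overlaps(c *chunk, from, through time.Time) bool {
	chunkFrom, chunkTo := c.Bounds()
	return chunkFrom.Before(through) && from.Before(chunkTo)
}

func main() {
	c := &chunk{from: time.Now(), to: time.Now().Add(time.Minute)}
	c.encode(time.Now())
	_ = overlaps(c, time.Now().Add(-time.Hour), time.Now().Add(time.Hour))
}
```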

benclive changed the title from "Fix some data races in chunks" to "fix: Resolve some data races in chunks" on Nov 25, 2024
benclive changed the title from "fix: Resolve some data races in chunks" to "fix: Resolve some data races in chunk accesses" on Nov 28, 2024
benclive marked this pull request as ready for review on November 28, 2024 14:37
benclive requested a review from a team as a code owner on November 28, 2024 14:37
@@ -441,9 +441,11 @@ func (i *Ingester) flushChunks(ctx context.Context, fp model.Fingerprint, labelP
 )
 
 // encodeChunk mutates the chunk so we must pass by reference
+chunkMtx.Lock()
 if err := i.encodeChunk(ctx, &ch, c); err != nil {
 return err
A reviewer (Contributor) commented:

Should you unlock here too?

benclive (Contributor Author) replied:

Yep, on line 448 below. I can wrap this in a func if you prefer to use defer.
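
A sketch of the wrap-in-a-func variant offered here, with hypothetical stand-in names (chunkDesc, encodeChunk, and flushOne are simplified placeholders; only the closure-plus-defer shape is the point):

```go
package main

import (
	"context"
	"sync"
)

// Hypothetical stand-ins for the ingester's types; only the locking shape matters here.
type chunkDesc struct{ encoded bool }

func encodeChunk(ctx context.Context, c *chunkDesc) error {
	c.encoded = true // stands in for the real encode, which writes block offsets
	return nil
}

func flushOne(ctx context.Context, chunkMtx *sync.Mutex, c *chunkDesc) error {
	// Wrapping the locked region in a small func lets defer handle the unlock,
	// instead of calling chunkMtx.Unlock() manually after encodeChunk returns.
	encodeLocked := func() error {
		chunkMtx.Lock()
		defer chunkMtx.Unlock()
		return encodeChunk(ctx, c)
	}
	if err := encodeLocked(); err != nil {
		return err
	}
	return nil
}

func main() {
	var mtx sync.Mutex
	_ = flushOne(context.Background(), &mtx, &chunkDesc{})
}
```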
