-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Mitigate ingester race between Query & GetChunkIDs #15178
Conversation
@@ -35,6 +35,10 @@ import ( | |||
util_log "github.com/grafana/loki/v3/pkg/util/log" | |||
) | |||
|
|||
var defaultQuorumConfig = ring.DoUntilQuorumConfig{ | |||
// Nothing here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think redundant comment. If we do want a comment I would write something like this (finishing the sentence):
// Nothing here | |
// We use the usable, default value of `DoUntilQuorumConfig` as ... | |
var defaultQuorumConfig = ring.DoUntilQuorumConfig{} |
@@ -79,7 +83,8 @@ func newIngesterQuerier(querierConfig Config, clientCfg client.Config, ring ring | |||
} | |||
|
|||
// forAllIngesters runs f, in parallel, for all ingesters | |||
func (q *IngesterQuerier) forAllIngesters(ctx context.Context, f func(context.Context, logproto.QuerierClient) (interface{}, error)) ([]responseFromIngesters, error) { | |||
// waitForAllResponses param can be used to require results from all ingesters in the replication set. If this is set to false, the call will return as soon as we have a quorum by zone. Only valid for partition-ingesters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can improve this comment, let's discuss offline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we will follow up with some better comments in another PR.
} | ||
|
||
// forGivenIngesterSets runs f, in parallel, for given ingester sets | ||
func (q *IngesterQuerier) forGivenIngesterSets(ctx context.Context, replicationSet []ring.ReplicationSet, f func(context.Context, logproto.QuerierClient) (interface{}, error)) ([]responseFromIngesters, error) { | ||
// Enable minimize requests so we initially query a single ingester per replication set, as each replication-set is one partition. | ||
// waitForAllResponses param can be used to require results from all ingesters in all replication sets. If this is set to false, the call will return as soon as we have a quorum by zone. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Let's see if we can improve this comment. If I was unfamiliar with how this works, I would be surprised because results from all ingesters in all replication sets
contradicts quorum by zone
which by definition is a majority, not all.
What this PR does / why we need it:
Fixes a race condition between the Query and GetChunkIDs calls when using partition-ingesters.
The bug occurs when first Querying logs from an ingester that just flushed a chunk (returning 0 logs), followed by calling GetChunkIDs on an ingester that has not flushed a chunk (returning 0 chunk refs). The bug manifests by returning 0 results when some log lines do exist in other ingesters.
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR