-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove querier wait time metric. #11233
Conversation
@@ -64,7 +65,9 @@ func (pm *processorManager) concurrency(n int) { | |||
n = 0 | |||
} | |||
|
|||
workerId := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am guessing the workerId does not matter much here. but there is a chance here of re-using workerIds if the concurrency value changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I wonder what happens then. Are we restarting the querier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when more schedulers get added? which means each worker gets a smaller concurrency value.
not something that is often done in practice, so it should be alright I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Co-authored-by: Danny Kopping <[email protected]>
…karsten/move-queuing-time
Trivy scan found the following vulnerabilities:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would the new native histograms be helpful here as well? these are much less costly and have less relative error between buckets
there's essentially a single series per native histogram metric and buckets are populated sparsely in storage
@@ -58,7 +58,7 @@ func (fp *frontendProcessor) notifyShutdown(ctx context.Context, conn *grpc.Clie | |||
} | |||
|
|||
// runOne loops, trying to establish a stream to the frontend to begin request processing. | |||
func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address string) { | |||
func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address, _ string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the _
param intentional here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, processQueriesOnSingleStream
implements the processor
interface and we don't use the worker ID in the case of the frontend processor.
@@ -39,7 +39,7 @@ func TestRecvFailDoesntCancelProcess(t *testing.T) { | |||
running.Store(true) | |||
defer running.Store(false) | |||
|
|||
mgr.processQueriesOnSingleStream(ctx, cc, "test:12345") | |||
mgr.processQueriesOnSingleStream(ctx, cc, "test:12345", "1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "1"
necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Since it's ignored we don't need it here.
**What this PR does / why we need it**: We would like to know how long a querier worker is idle to understand if workstealing would have an impact. The original metric was too noisy and its cardinality was too high. Instead, we are going to log the wait time. **Checklist** - [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](grafana@d10549e) - [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](grafana@0d4416a) --------- Co-authored-by: Danny Kopping <[email protected]>
What this PR does / why we need it:
We would like to know how long a querier worker is idle to understand if workstealing would have an impact. The original metric was too noisy and its cardinality was too high. Instead, we are going to log the wait time.
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updatedadd-to-release-notes
labeldocs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PRdeprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR