Remove querier wait time metric. #11233

jeschkies · 2023-11-15T12:48:15Z

What this PR does / why we need it:
We would like to know how long a querier worker is idle to understand whether work stealing would have an impact. The original metric was too noisy and its cardinality was too high, so we are going to log the wait time instead.
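
To illustrate the idea of logging the wait time rather than exporting it as a metric, here is a minimal sketch. It is not the code in this PR: `waitForRequest`, the channel-based hand-off, and the log field names are assumptions made for the example, and it uses the standard `log` package rather than Loki's logger.

```go
package main

import (
	"log"
	"time"
)

// waitForRequest is a hypothetical stand-in for a querier worker blocking
// until it is handed a request. It returns the request together with the
// time the worker spent idle.
func waitForRequest(requests <-chan string) (string, time.Duration) {
	start := time.Now()
	req := <-requests
	return req, time.Since(start)
}

func main() {
	requests := make(chan string)

	// Simulate a scheduler that hands out a request after a short delay.
	go func() {
		time.Sleep(20 * time.Millisecond)
		requests <- "query-1"
	}()

	req, waited := waitForRequest(requests)

	// One log line per request keeps the worker ID out of metric labels,
	// so it adds no series cardinality.
	log.Printf("msg=%q worker_id=%s request=%s wait_time=%s",
		"querier worker received request", "0", req, waited)
}
```

The log line can then be aggregated at query time, which avoids paying for one series per worker.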

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
CHANGELOG.md updated
If the change is worth mentioning in the release notes, add add-to-release-notes label
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

ashwanthgoli · 2023-11-15T13:09:15Z

pkg/querier/worker/processor_manager.go

@@ -64,7 +65,9 @@ func (pm *processorManager) concurrency(n int) {
 		n = 0
 	}

+	workerId := 0


I am guessing the workerId does not matter much here, but there is a chance of reusing workerIds if the concurrency value changes.

Good question. I wonder what happens then. Are we restarting the querier?

It changes when more schedulers get added, which means each worker gets a smaller concurrency value.
That is not something that is often done in practice, so it should be alright, I think.

ashwanthgoli

lgtm

github-actions · 2023-11-21T21:09:20Z

Trivy scan found the following vulnerabilities:

HIGH openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
HIGH openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0

cstyan

Would the new native histograms be helpful here as well? They are much less costly and have less relative error between buckets.

There is essentially a single series per native histogram metric, and buckets are populated sparsely in storage.

cstyan · 2023-12-04T19:31:30Z

pkg/querier/worker/frontend_processor.go

@@ -58,7 +58,7 @@ func (fp *frontendProcessor) notifyShutdown(ctx context.Context, conn *grpc.Clie
 }

 // runOne loops, trying to establish a stream to the frontend to begin request processing.
-func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address string) {
+func (fp *frontendProcessor) processQueriesOnSingleStream(ctx context.Context, conn *grpc.ClientConn, address, _ string) {


is the _ param intentional here?

Yes, processQueriesOnSingleStream implements the processor interface and we don't use the worker ID in the case of the frontend processor.
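
The pattern described in the reply can be sketched as follows. The `processor` interface and both types here are trimmed-down, hypothetical stand-ins rather than the real Loki definitions. The point is only that the blank identifier lets one implementation satisfy the shared method signature while making it explicit that the worker ID is unused.

```go
package main

import "fmt"

// processor is a hypothetical, trimmed-down interface shared by both
// implementations. Every implementation must accept a worker ID.
type processor interface {
	process(address, workerID string) string
}

// schedulerProc uses the worker ID, for example to tag its log lines.
type schedulerProc struct{}

func (schedulerProc) process(address, workerID string) string {
	return fmt.Sprintf("scheduler %s worker %s", address, workerID)
}

// frontendProc has no use for the worker ID. Naming the parameter "_"
// keeps the signature compatible with the interface.
type frontendProc struct{}

func (frontendProc) process(address, _ string) string {
	return "frontend " + address
}

func main() {
	procs := []processor{schedulerProc{}, frontendProc{}}
	for _, p := range procs {
		fmt.Println(p.process("test:12345", "1"))
	}
}
```

Callers that know the argument is ignored can pass an empty string.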

cstyan · 2023-12-04T19:31:49Z

pkg/querier/worker/frontend_processor_test.go

@@ -39,7 +39,7 @@ func TestRecvFailDoesntCancelProcess(t *testing.T) {
 		running.Store(true)
 		defer running.Store(false)

-		mgr.processQueriesOnSingleStream(ctx, cc, "test:12345")
+		mgr.processQueriesOnSingleStream(ctx, cc, "test:12345", "1")


is "1" necessary?

You are right. Since it's ignored we don't need it here.

**What this PR does / why we need it**:
We would like to know how long a querier worker is idle to understand if workstealing would have an impact. The original metric was too noisy and its cardinality was too high. Instead, we are going to log the wait time.

**Checklist**
- [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option, update the `deprecated-config.yaml` and `deleted-config.yaml` files respectively in the `tools/deprecated-config-checker` directory. [Example PR](grafana@0d4416a)

---------

Co-authored-by: Danny Kopping <[email protected]>

Remove querier wait time metric.

df98efe

jeschkies requested a review from dannykopping November 15, 2023 12:48

jeschkies requested a review from a team as a code owner November 15, 2023 12:48

pull-request-size bot added the size/M label Nov 15, 2023

ashwanthgoli reviewed Nov 15, 2023

ashwanthgoli approved these changes Nov 15, 2023

dannykopping reviewed Nov 15, 2023

pkg/querier/worker/scheduler_processor.go (outdated, resolved)

jeschkies and others added 3 commits November 15, 2023 21:30

Update pkg/querier/worker/scheduler_processor.go

0c750b8

Co-authored-by: Danny Kopping <[email protected]>

Merge remote-tracking branch 'grafana/main' into karsten/move-queuing-time

9ceecbb

Merge remote-tracking branch 'origin/karsten/move-queuing-time' into karsten/move-queuing-time

2aa2df2

jeschkies enabled auto-merge (squash) November 22, 2023 07:26

jeschkies added 2 commits November 22, 2023 08:35

Rename workID

ab87075

Merge remote-tracking branch 'grafana/main' into karsten/move-queuing-time

fb55650

cstyan reviewed Dec 4, 2023

jeschkies added 3 commits December 6, 2023 21:13

Merge remote-tracking branch 'grafana/main' into karsten/move-queuing-time

bf96304

Pass empty string.

a232309

Rename worker ID

1b13901

jeschkies merged commit 5b8d0e6 into grafana:main Dec 6, 2023
7 checks passed

jeschkies deleted the karsten/move-queuing-time branch December 7, 2023 05:42
