Bug: Increased latencies on store-gateway after updating to 2.14.2 #10291
Comments
Could you expand on your environment: which object storage do you run on (S3, GCS, something on-prem)? Could you identify any particular object storage operations where the latency increased (the "Mimir / Reads" dashboard should have the "Block object storage (store-gateway)" set of panels)? Does the caching still work as expected? Does the "Mimir / Reads resources" dashboard show anything in particular? Also, do you happen to have traces that could reveal the source of the latency?
Hi @narqo. Happy New Year, and thanks for coming back to this issue. The environment we're running is S3. This is what the object storage related metrics of the Mimir Reads dashboard look like for the day on which we updated, around 11:30. Caching worked fine as far as I can tell; we've turned off the […]. Nothing fancy on the Reads resources dashboard. The ruler consumes more CPU since the update and the store-gateway seems to hit […]. We haven't set up anything to receive traces in Jaeger format within this cluster. Let me see if I can bring this up and provide you with some traces.
Turning off the chunks-cache should have had a massive impact, to the point of making querying nearly unusable, which makes me think that it has somehow not been working since the upgrade. Can you share the portion of the "Mimir / Reads" dashboard that includes chunks and index caching info? It should include RPS, latency, and hit ratios.
This was a bug in the object storage client. Annoying, but ultimately not causing a problem. More concerning to me is that, from your screenshots, it doesn't seem like the index-cache is being used at all. That cache also has a large impact on query latency. Can you confirm that you've set up the index-cache and that it's being used?
We did not yet set up the […]
We've turned on the […]. I somehow failed to get any Mimir traces into our Tempo instance using the following at the store-gateway
and using Alloy's receiver for Jaeger
Is there something obvious I'm missing here? Live debugging for Alloy looked like nothing was ever received. While debug logs were turned on for the store-gateway, I noticed some logs when using a filter of duration > 10s:
and sometimes also
Does this help to narrow it down?
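For anyone who wants to apply the same "duration > 10s" filter to exported debug logs offline, here is a minimal Python sketch. It assumes logfmt-style lines carrying a `duration=` field with a Go-style value such as `12.5s` or `1m2.5s`; that field name and the sample lines are illustrative assumptions, not actual store-gateway output, and the helper names `parse_go_duration` and `slow_lines` are made up for this example.

```python
import re

# Seconds per unit for Go-style duration strings (e.g. "350ms", "1m2.5s").
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
# Multi-letter units are listed first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
# ASSUMPTION: the log line carries a logfmt field literally named "duration".
_FIELD = re.compile(r"\bduration=(\S+)")


def parse_go_duration(text):
    """Convert a Go-style duration string to seconds; return None if malformed."""
    pos, total = 0, 0.0
    for m in _PART.finditer(text):
        if m.start() != pos:  # unparsable characters between components
            return None
        total += float(m.group(1)) * _UNIT_SECONDS[m.group(2)]
        pos = m.end()
    return total if pos == len(text) and pos > 0 else None


def slow_lines(lines, threshold_s=10.0):
    """Yield (seconds, line) for lines whose duration exceeds threshold_s."""
    for line in lines:
        m = _FIELD.search(line)
        if not m:
            continue
        secs = parse_go_duration(m.group(1))
        if secs is not None and secs > threshold_s:
            yield secs, line


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the filter in action.
    sample = [
        'level=debug msg="request done" duration=12.5s',
        'level=debug msg="request done" duration=350ms',
        'level=debug msg="request done" duration=1m2.5s',
    ]
    for secs, line in slow_lines(sample):
        print(f"{secs:.1f}s  {line}")
```

Sorting the yielded pairs by duration makes it easier to see which operations dominate the tail, which is the part that matters for a p99 regression.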
What is the bug?
Hi guys, we've recently upgraded Mimir from 2.13.0 to 2.14.2 and noticed a sharp increase in latencies for the store-gateway on the read path, which we cannot explain.
Store-gateway latencies:
average went up from 50ms -> 1s
p99 went up from 1s -> 25s
We have not set any special configuration for the store-gateway. Has anyone experienced similar issues?
How to reproduce it?
What did you think would happen?
What was your environment?
Any additional context to share?
No response