This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Schedule zmon check execution with some jitter / offset #663

Open
otrosien opened this issue Dec 6, 2018 · 1 comment
Labels
chore: technical debts, operational excellence, compliance and minor security topics, re-factoring needs

Comments


otrosien commented Dec 6, 2018

If a check has many entities, executing it against all of them in parallel can DoS the target service. zmon should offer a way to distribute the check executions (e.g. evenly) across the check interval.

Currently zmon assumes that two entities are independent, and thus can be queried in parallel. But this assumption often does not hold.

Example 1: We have Elasticsearch data nodes as entities in zmon, and have checks that pull local stats from the entities. If all data nodes are queried at the same time, it will cause a lot of stress inside the Elasticsearch cluster, which can lead to user-facing latency / GC pauses.

Example 2: Our neighbour team has a check that queries all main Zalando categories (as zmon entities) for the items currently returned on page 1. This check cannot be properly rate-limited in zmon and causes request spikes in our search cluster.
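
To make the request concrete, here is a minimal sketch of how per-entity offsets within a check interval could be computed. It is not zmon code: `execution_offsets`, `hashed_offset` and the entity ids are hypothetical names used only for illustration, and it assumes the scheduler could delay each entity's execution by the returned number of seconds.

```python
import hashlib


def execution_offsets(entity_ids, interval_seconds):
    """Spread entities evenly across one check interval.

    Hypothetical helper, not part of zmon. Returns a dict mapping each
    entity id to its offset in seconds from the start of the interval.
    """
    ordered = sorted(entity_ids)
    if not ordered:
        return {}
    step = interval_seconds / len(ordered)
    return {entity_id: index * step for index, entity_id in enumerate(ordered)}


def hashed_offset(entity_id, interval_seconds):
    """Stable offset for one entity, independent of the other entities.

    Hypothetical alternative: hashing the id gives each entity the same
    offset in every interval, even when entities are added or removed.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    # Use 48 bits so the fraction is exactly representable and always < 1.
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return fraction * interval_seconds


if __name__ == "__main__":
    nodes = ["es-data-1", "es-data-2", "es-data-3", "es-data-4"]
    for node, offset in execution_offsets(nodes, 60).items():
        print(node, offset)
```

The even spacing gives the smoothest load but shifts every offset when the entity set changes; the hashed variant keeps offsets stable at the cost of a less uniform spread for small entity counts.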


csenol commented Dec 6, 2018

We have a workaround for Example 2 in the application layer.
Maybe zmon can introduce a sleep in its utility functions.
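
A rough sketch of what such a sleep could look like, assuming a generic wrapper around a callable. `with_jitter` is a hypothetical name, not an existing zmon utility; the `sleep` and `rng` parameters exist only so the behaviour can be tested without waiting.

```python
import random
import time


def with_jitter(func, max_delay_seconds, sleep=time.sleep, rng=random.random):
    """Wrap func so each call first sleeps a random delay.

    Hypothetical helper, not part of zmon. The delay is drawn uniformly
    from [0, max_delay_seconds).
    """
    def wrapper(*args, **kwargs):
        sleep(rng() * max_delay_seconds)
        return func(*args, **kwargs)
    return wrapper


if __name__ == "__main__":
    def fetch_stats(node):
        # Stand-in for a real request to the target service.
        return {"node": node}

    jittered_fetch = with_jitter(fetch_stats, max_delay_seconds=2)
    print(jittered_fetch("es-data-1"))
```

One caveat with this approach: the sleep happens inside the check itself, so whatever is executing the check stays occupied for the duration of the delay.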

pitr added the chore label (technical debts, operational excellence, compliance and minor security topics, re-factoring needs) on May 13, 2019
3 participants