Description
Hi, thanks for the project!
While working on sgl-model-gateway in https://github.com/sgl-project/sglang, I would like to expose per-worker metrics, e.g. the circuit breaker state or the health of each worker. However, workers are added and removed dynamically, so the metrics for a removed worker need to be removed as well. Otherwise the values are ambiguous: if we set health to zero, we cannot tell whether it is an unhealthy existing worker or an already deleted one. The same applies to the circuit breaker state, which uses zero for the closed state.
Looking at the code, I guess the simplest way would be to expose the Registry.delete_counter/delete_gauge/... methods, or alternatively to expose the registry object itself. I am happy to open a PR if this looks good to you.
Recency may not be a good fit for this case. First, it uses a coarse mask, whereas we may want more fine-grained control. Second, if the gateway receives no incoming requests for a while, we definitely do not want the metrics to be deleted, since all the workers are still there.
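To make the intended behavior concrete, here is a rough, std-only Rust sketch of the deletion semantics I have in mind. `ToyRegistry`, `set_gauge`, `render`, `scrape_before_and_after_removal`, and the `worker_health` metric name are all hypothetical stand-ins for illustration, not this library's real types or API. Only the `delete_gauge` name mirrors the method mentioned above, and its signature here is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a metrics registry (not the real library type).
#[derive(Default)]
struct ToyRegistry {
    /// Maps a worker label to its health gauge: 1 = healthy, 0 = unhealthy.
    gauges: HashMap<String, u8>,
}

impl ToyRegistry {
    /// Creates the gauge for `worker` if needed and sets its value.
    fn set_gauge(&mut self, worker: &str, value: u8) {
        self.gauges.insert(worker.to_string(), value);
    }

    /// Removes the gauge for `worker`; returns true if one existed.
    fn delete_gauge(&mut self, worker: &str) -> bool {
        self.gauges.remove(worker).is_some()
    }

    /// Renders the gauges in a Prometheus-like text form, sorted by label
    /// so the output is stable.
    fn render(&self) -> String {
        let mut entries: Vec<(&String, &u8)> = self.gauges.iter().collect();
        entries.sort();
        entries
            .iter()
            .map(|(w, v)| format!("worker_health{{worker=\"{}\"}} {}\n", w, v))
            .collect()
    }
}

/// Returns the rendered output before and after removing worker "w2".
fn scrape_before_and_after_removal() -> (String, String) {
    let mut registry = ToyRegistry::default();
    registry.set_gauge("w1", 1); // healthy worker
    registry.set_gauge("w2", 0); // unhealthy, but still registered
    let before = registry.render();

    // The worker is removed from the gateway, so its series is deleted
    // instead of being left at 0.
    registry.delete_gauge("w2");
    let after = registry.render();
    (before, after)
}
```

With explicit deletion, a series that reports 0 always means an unhealthy worker that still exists, while a removed worker simply has no series. Nothing here depends on request traffic, which is the difference from an idle-timeout approach.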