When measuring how long it takes for a model to complete a task, it's important to factor out time spent waiting on rate limits and recovering from other transient errors. We want to be able to both report the total actual time taken and enforce time limits based on that actual time.
One significant confounder here is that it's not enough to measure when we ourselves are backing off: most model APIs do their own internal backoff, which is not visible to callers. For example, here is Anthropic's retry handling:
https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654
This in turn uses a set of custom HTTP headers to determine retry intervals: https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654
If we want to know about these retries, we need to somehow hook into the HTTP client at a much lower level. This will need to be done per model provider (as e.g. Google, Bedrock, and Azure all use bespoke HTTP clients and strategies).
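As a rough illustration of the accounting such a hook could feed, here is a minimal sketch. `RetryClock`, `is_retryable`, `on_request`, and `on_response` are hypothetical names, not part of any SDK, and the set of retryable status codes is an assumption that would need to be checked against each provider. It also assumes requests are issued one at a time, and that a retryable response is always followed by a retry (if the client gives up instead, the next unrelated request would be miscounted).

```python
import time
from typing import Callable, Optional


def is_retryable(status: int) -> bool:
    # Assumption: statuses a client would typically retry on.
    # Each provider's actual policy has to be verified separately.
    return status in (408, 409, 429) or status >= 500


class RetryClock:
    """Hypothetical helper that accumulates time lost to hidden retries.

    Intended to be driven from low-level HTTP client hooks: call
    on_request() just before each HTTP attempt is sent and on_response()
    when its status code is known. Sequential requests only.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._attempt_started: Optional[float] = None
        self._retrying_since: Optional[float] = None
        self.retry_seconds = 0.0  # failed attempts plus backoff sleeps
        self.retries = 0

    def on_request(self) -> None:
        now = self._clock()
        if self._retrying_since is not None:
            # The previous attempt failed with a retryable status, so
            # everything since it started was a failed call plus backoff.
            self.retry_seconds += now - self._retrying_since
            self.retries += 1
            self._retrying_since = None
        self._attempt_started = now

    def on_response(self, status: int) -> None:
        if is_retryable(status):
            self._retrying_since = self._attempt_started

    def working_time(self, total_elapsed: float) -> float:
        # Elapsed time with the retry overhead factored out.
        return total_elapsed - self.retry_seconds
```

For clients built on httpx, one possible way to drive this would be the client's `event_hooks` for `"request"` and `"response"`, passed in through whatever custom HTTP client option the provider SDK exposes. Whether that is possible, and whether the hooks fire for every internal retry, would have to be confirmed provider by provider.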