Recording pauses for rate-limiting and transient errors #1139

jjallaire · 2025-01-19T13:29:06Z

When measuring how long it takes for a model to complete a task, it's important to factor out rate limits and recovery from other transient errors. We want to be able both to report the total actual time taken and to enforce time limits based on that actual time.
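As a rough illustration of the accounting described above, here is a minimal sketch of a timer that reports both wall-clock time and time net of waits. `WorkingTimer` and its methods are hypothetical names made up for this example, not an existing API, and the sketch assumes that every wait can be wrapped explicitly.

```python
import time
from contextlib import contextmanager


class WorkingTimer:
    """Hypothetical helper: wall-clock time minus time spent waiting on retries."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be exercised without sleeping.
        self._clock = clock
        self._start = clock()
        self._waiting = 0.0

    @contextmanager
    def waiting(self):
        """Wrap a sleep or backoff so it is excluded from working time."""
        began = self._clock()
        try:
            yield
        finally:
            self._waiting += self._clock() - began

    def total_time(self):
        """Seconds elapsed since the timer was created, waits included."""
        return self._clock() - self._start

    def working_time(self):
        """Elapsed seconds with all completed waits subtracted."""
        return self.total_time() - self._waiting

    def over_limit(self, limit_seconds):
        """True once working time (not wall-clock time) exceeds the limit."""
        return self.working_time() > limit_seconds
```

A caller would wrap each known backoff in `with timer.waiting(): ...` and check `timer.over_limit(...)` between steps. This only covers waits the caller can see.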

One significant confounder here is that it's not enough to measure when we are backing off: most model APIs do their own internal backoff, which is not visible to callers. For example, here is Anthropic's retry handling:

https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654

This in turn uses a set of custom HTTP headers to determine retry intervals: https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654

If we want to know about these retries, we need to somehow hook into the HTTP client at a much lower level. This will need to be done per model provider (as, e.g., Google, Bedrock, and Azure all use bespoke HTTP clients and strategies).
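One way this might look, as a sketch only: an observer that is told about each outgoing request and each response, and treats the gap between a retryable response and the next request as hidden backoff. `BackoffObserver`, `on_request`, `on_response`, and the set of retryable status codes are all assumptions made for illustration. How these callbacks get wired into a given provider's HTTP client is the provider-specific part and is not shown.

```python
import time

# Assumption: statuses treated as retryable here; real SDKs may differ.
RETRYABLE_STATUSES = {408, 409, 429}


class BackoffObserver:
    """Hypothetical observer that infers SDK-internal backoff from HTTP traffic."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._retry_began = None  # time the last retryable response arrived
        self.waiting_time = 0.0   # accumulated seconds attributed to backoff
        self.retries = 0

    def on_response(self, status_code):
        """Call from the client's response hook."""
        if status_code in RETRYABLE_STATUSES or status_code >= 500:
            self._retry_began = self._clock()
        else:
            self._retry_began = None

    def on_request(self):
        """Call from the client's request hook, just before sending."""
        if self._retry_began is not None:
            # The gap since the failed response is time the SDK spent backing off.
            self.waiting_time += self._clock() - self._retry_began
            self.retries += 1
            self._retry_began = None
```

The sketch assumes one observer per logical model call, since concurrent calls sharing an observer would mix up their gaps. It also does not see retries triggered by connection errors, which produce no response at all.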

jjallaire self-assigned this Jan 19, 2025
