Recording pauses for rate-limiting and transient errors #1139

Open
jjallaire opened this issue Jan 19, 2025 · 0 comments

When measuring how long it takes for a model to complete a task, it's important to factor out rate limits and recovery from other transient errors. We want to be able to report the total actual time taken and to have time limits enforced based on this actual time.
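To make the bookkeeping concrete, here is a minimal sketch. It reads "actual time" as wall-clock time minus any recorded pauses, which is an interpretation rather than something specified above. `WorkingClock` and its methods are hypothetical names, not an existing API in this project. The clock is injectable so the demonstration is deterministic.

```python
import time
from typing import Callable


class WorkingClock:
    """Hypothetical helper: wall-clock time minus recorded pauses."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self._paused = 0.0

    def record_pause(self, seconds: float) -> None:
        # Called whenever we learn that `seconds` were spent waiting on a
        # rate limit or recovering from a transient error.
        self._paused += max(0.0, seconds)

    def total_time(self) -> float:
        # Plain wall-clock time since the clock was created.
        return self._clock() - self._start

    def working_time(self) -> float:
        # Wall-clock time with all recorded pauses factored out.
        return self.total_time() - self._paused

    def check_limit(self, limit_seconds: float) -> None:
        # Enforce the limit against working time, not wall-clock time.
        if self.working_time() > limit_seconds:
            raise TimeoutError(
                f"working time {self.working_time():.1f}s exceeded "
                f"limit of {limit_seconds:.1f}s"
            )


# Deterministic demonstration with a fake clock.
now = [0.0]
wc = WorkingClock(clock=lambda: now[0])
now[0] = 50.0          # 50 seconds of wall-clock time pass
wc.record_pause(30.0)  # 30 of those seconds were spent backing off
```

In this demonstration a 25 second limit would not trip, because only 20 seconds count as working time, even though 50 seconds have elapsed on the wall clock.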

One significant confounder here is that it's not enough to measure when we ourselves are backing off: most model APIs do their own internal backoff, which is not visible to callers. For example, here is Anthropic's retry handling:

https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654

This in turn uses a set of custom HTTP headers to determine retry intervals: https://github.com/anthropics/anthropic-sdk-python/blob/af3912c5dd97fa821bc6cc8539ee835a28a96edc/src/anthropic/_base_client.py#L1654

If we want to know about these retries, we need to somehow hook into the HTTP client at a much lower level. This will need to be done per model provider (e.g., Google, Bedrock, and Azure all use bespoke HTTP clients and strategies).
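As a rough illustration of what a lower-level hook could record, the sketch below assumes the provider's client invokes a callback before each HTTP attempt and after each response. `RetryPauseRecorder` is a hypothetical name, and the rule for which statuses count as retryable (429 or any 5xx) is an assumption, not any provider's documented policy. The gap between a retryable response and the next attempt is treated as hidden backoff.

```python
import time
from typing import Callable, Optional


class RetryPauseRecorder:
    """Hypothetical recorder fed by per-attempt HTTP hooks."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._retry_seen_at: Optional[float] = None
        self.paused_seconds = 0.0
        self.retries = 0

    def on_request(self) -> None:
        # An attempt is about to be sent. If the previous attempt ended
        # with a retryable status, the time since then was backoff.
        if self._retry_seen_at is not None:
            self.paused_seconds += self._clock() - self._retry_seen_at
            self.retries += 1
            self._retry_seen_at = None

    def on_response(self, status_code: int) -> None:
        # Assumption: 429 and 5xx are the statuses that trigger a retry.
        if status_code == 429 or status_code >= 500:
            self._retry_seen_at = self._clock()
        else:
            self._retry_seen_at = None


# Deterministic demonstration with a fake clock.
now = [0.0]
rec = RetryPauseRecorder(clock=lambda: now[0])

rec.on_request()         # first attempt sent at t=0
now[0] = 1.0
rec.on_response(429)     # rate limited at t=1
now[0] = 4.0
rec.on_request()         # client retries at t=4, after 3s of backoff
now[0] = 5.0
rec.on_response(200)     # success at t=5
now[0] = 9.0
rec.on_request()         # unrelated later request, not counted as a pause
```

This sketch has known gaps: it assumes one in-flight call per recorder, it cannot see connection errors that never produce a response, and a final failed attempt followed by an unrelated request would be miscounted as backoff. For clients built on httpx, the two methods could plausibly be attached through the `request` and `response` entries of `event_hooks` on a client passed to the SDK. That wiring is not shown here, and other providers would need their own approach.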

@jjallaire jjallaire self-assigned this Jan 19, 2025