
Broader notion of retryable exception #1174

Open

tadamcz opened this issue Jan 22, 2025 · 10 comments


@tadamcz
Contributor

tadamcz commented Jan 22, 2025

Currently, we retry a provider error if it's a rate limit:

    retry=retry_if_exception(self.api.is_rate_limit),

But there are other errors that should be retried, such as undifferentiated 500s, arguably 502s, timeouts, and things like:

httpx.RemoteProtocolError: Server disconnected without sending a response.

Currently these notions are sometimes conflated in the code, e.g. here a ReadTimeout is treated as a rate limit:

    @override
    def is_rate_limit(self, ex: BaseException) -> bool:
        return (
            isinstance(ex, SDKError)
            and ex.status_code == 429
            or isinstance(ex, ReadTimeout | AsyncReadTimeout)
        )



My proposal is to introduce a new method should_retry to ModelAPI to capture the broader notion of an exception that should be retried. If you agree with the idea, I can take a crack at implementing this.
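
A minimal sketch of what that might look like (hypothetical shape, not a final API; the httpx exception choices here are illustrative assumptions):

    import httpx

    class ModelAPI:
        def should_retry(self, ex: BaseException) -> bool:
            # Default: retry rate limits plus transient transport errors.
            # Providers would override this to add exceptions they have
            # observed empirically to be retryable.
            return self.is_rate_limit(ex) or isinstance(
                ex, httpx.TimeoutException | httpx.RemoteProtocolError
            )

        def is_rate_limit(self, ex: BaseException) -> bool:
            return False

The retry call site above would then become retry=retry_if_exception(self.api.should_retry).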

In the in-progress display, we should probably also replace HTTP Rate Limits with HTTP Retries or something.

@jjallaire
Collaborator

Yes, that method should be renamed should_retry as it's only used for determining whether we should retry. So in fact the "read timeout" isn't treated as a rate limit (that rate limit counter actually looks at 429 responses at a lower level). Renaming the method will clean up those semantics.

Sadly it's kind of an empirical question for each provider which errors are in fact retryable. The AsyncReadTimeout is something we noticed that Mistral just randomly had in its stack and it was always recoverable. Here is a method that faithfully implements the original recommendations from Google on retries when interacting w/ GCP:

    def httpx_should_retry(ex: BaseException) -> bool:

Ideally each provider could apply this heuristic + whatever custom heuristics are required. The current approach, as you no doubt noted, is kind of scattershot. Ideally we can intercept HTTP status code bearing exceptions for all providers and apply the heuristics in the function linked to above. Then, in addition, we can add other exception types over time that we've noticed are retryable.
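
For reference, a hedged sketch of what such a heuristic could look like (not the actual body of the linked function; the status-code list reflects the usual guidance of retrying 429 and transient 5xx responses):

    import httpx

    def httpx_should_retry(ex: BaseException) -> bool:
        # Status-code-bearing exceptions: retry rate limits and transient 5xx.
        if isinstance(ex, httpx.HTTPStatusError):
            return ex.response.status_code in (429, 500, 502, 503, 504)
        # Timeouts and dropped connections are generally safe to retry.
        return isinstance(
            ex, httpx.TimeoutException | httpx.ConnectError | httpx.RemoteProtocolError
        )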

@tadamcz
Contributor Author

tadamcz commented Jan 22, 2025

Yes, that method should be renamed should_retry as it's only used for determining whether we should retry.

That works as well, if you're OK with a breaking change to the API.

that rate limit counter actually looks at 429 responses at a lower level

Didn't realise this! Thanks

Ideally each provider could apply this heuristic + whatever custom heuristics are required.

Yep, this makes sense.

@tadamcz
Contributor Author

tadamcz commented Jan 22, 2025

that rate limit counter actually looks at 429 responses at a lower level

It would be useful to display the number of retried responses, so that there is some indication of what's going on if lots of requests are being retried without being 429s. OTOH, we don't want to overload the in-progress display with an ever-increasing amount of information. If we're going to have just one, IMO the total number retried is more important than the number rate limited. What do you think?

@jjallaire
Collaborator

You might want to hold off on this only because we've got another set of related changes we want to make soon: being able to detect exactly the time taken for inference (vs. retries) on a call to generate(). This is going to be either complicated (because model packages do their own retrying which is not detectable without either monkey patching or scraping debug logs) or will involve our turning off all retries and handling them ourselves (but this isn't a great path either b/c some of the model packages do very fancy retrying with special HTTP headers that it would be a huge amount of work to emulate).

Anyway, all the code related to retrying/rate limits is due for an overhaul (and this is the top priority of one of our biggest users). This will probably go down in the next 3-4 weeks so I would wait for this to land (or even be in progress and then we can work on it together).

@jjallaire
Collaborator

That works as well, if you're OK with a breaking change to the API.

I'd just rename the stuff inside Inspect but still call the old API for backwards compatibility.

@tadamcz
Contributor Author

tadamcz commented Jan 23, 2025

This will probably go down in the next 3-4 weeks so I would wait for this to land (or even be in progress and then we can work on it together).

Sure, happy to wait. Keep us posted on this issue.

@tadamcz
Contributor Author

tadamcz commented Jan 23, 2025

I'd just rename the stuff inside Inspect but still call the old API for backwards compatibility.

Sorry, how would this work? At first glance I don't see how this would work with Python inheritance (unless doing some really gnarly introspection stuff that I don't think we should do). Could you post a code snippet demonstrating what you mean?

@jjallaire
Collaborator

Yes, some introspection (that's very frequently what you need to do to provide graceful backward compatibility). We want both of these things to be true: we almost never break people and we evolve our APIs over time to make them more elegant. Upholding those principles is IMO more important than the principle of "never do anything gnarly".

@tadamcz
Contributor Author

tadamcz commented Jan 23, 2025

I see, and fair enough. What would this look like? A code snippet would be perfect for me to learn about this.

@jjallaire
Collaborator

Something like this:

    from typing import Type

    def is_overridden(method_name: str, subclass: Type, base_class: Type) -> bool:
        # In Python 3, methods looked up on a class are plain functions with
        # no __func__, so fall back to the attribute itself before comparing.
        sub = getattr(subclass, method_name)
        base = getattr(base_class, method_name)
        return getattr(sub, "__func__", sub) is not getattr(base, "__func__", base)

    def is_rate_limit_overridden(model: ModelAPI) -> bool:
        return is_overridden("is_rate_limit", type(model), ModelAPI)
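
One hypothetical way to then dispatch (glue code not from the thread, assuming a new should_retry hook as proposed above):

    def retryable(model: ModelAPI, ex: BaseException) -> bool:
        # Honor a legacy is_rate_limit override for backwards compatibility;
        # otherwise defer to the new should_retry method.
        if is_rate_limit_overridden(model):
            return model.is_rate_limit(ex)
        return model.should_retry(ex)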
