Client resilience to provider server errors #88
Exponential backoff, or some other retry mechanism? I've typically "handled" (major air quotes) these errors by making large requests multiple times over a given time period, but this isn't ideal for anyone. The current bail-out path is ripe for improvement.
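For reference, exponential backoff usually means waiting roughly `base * 2**attempt` seconds before each retry, capped at some maximum. Adding random jitter keeps many clients from retrying in lockstep against an already struggling server. Below is a minimal sketch of the "full jitter" variant. `backoff_delay`, `base`, and `cap` are illustrative names and values, not anything the client defines today.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential backoff with "full jitter": the delay is drawn uniformly
    between 0 and base * 2**attempt, and never exceeds `cap`.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    # Delays are random, so only their upper bounds are predictable.
    print([round(backoff_delay(n), 2) for n in range(6)])
```

The upper bound doubles with each attempt (1, 2, 4, ... seconds with these defaults) until it reaches `cap`. Drawing below that bound spreads out retries from concurrent clients.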
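As an aside, with the current code, this was happening:

```python
x = 200
x is 200
# True
y = 500
y is 500
# False
```

Turns out, `is` compares object identity, not value. CPython caches small integers (-5 through 256), so `x is 200` happens to be `True`, but `y is 500` isn't guaranteed to be. Status codes need to be compared with `==`.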
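Comparing by value, either with `==` or with `in` against a set, avoids the identity pitfall for every status code. The sketch below shows a `should_retry` helper of the kind a retry loop needs. The helper names and the particular set of retryable status codes are assumptions for illustration, not something the client currently defines.

```python
# Hypothetical set of status codes worth retrying; adjust to taste.
RETRYABLE_STATUS_CODES = frozenset({500, 502, 503, 504})


def should_retry(status_code: int) -> bool:
    """Return True if a response with this status code should be retried.

    `in` on a set matches by hash and `==`, never by identity, so it
    behaves the same for 200 as for 500.
    """
    return status_code in RETRYABLE_STATUS_CODES


def is_ok(status_code: int) -> bool:
    # `==` compares values; `is` would compare object identity.
    return status_code == 200
```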
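On topic: I wrote a very (very, very) quick-and-dirty attempt at making the code more resilient by retrying, including on connection errors. Any feedback on what would be more idiomatic Python is extremely welcome. This is in …

```python
import requests


class Client:
    # Excerpt: only the retry-related methods are shown. `should_retry` and
    # `pretty_sleep` are assumed to be helpers defined elsewhere in the module.

    @staticmethod
    def retryable_get(session, url, params):
        """GET with exponential backoff on connection errors and retryable statuses."""
        r = Client._get(session, url, params)
        wait_time = 1
        retries = 1
        # Keep retrying while the connection failed (r is None) or the
        # status code is retryable, for at most 12 retries.
        while (r is None or should_retry(r.status_code)) and retries <= 12:
            if r is None:
                print("Connection error, retrying")
            else:
                print(f"{r.status_code} received, sleeping {wait_time} second(s)")
            pretty_sleep(wait_time)
            r = Client._get(session, url, params)
            wait_time *= 2  # double the wait after every failed attempt
            retries += 1
        if r is None:
            # Every attempt failed to connect.
            raise requests.exceptions.ConnectionError(f"could not connect to {url}")
        return r

    @staticmethod
    def _get(session, url, params):
        try:
            return session.get(url, params=params)
        except requests.exceptions.ConnectionError:
            # requests raises its own ConnectionError, which is not a subclass of
            # Python's built-in ConnectionError. Return None so the caller retries.
            return None
```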
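On the request for more idiomatic Python, one option is to express the retry loop as a `for` loop over a bounded number of attempts, with the request and the sleep passed in as functions so the backoff can be tested without actually waiting. This is a sketch, not the client's API: `call_with_retries` and all of its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    send: Callable[[], T],
    should_retry: Callable[[T], bool],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_retries: int = 12,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send()` until it succeeds, doubling the delay after each failure.

    A failure is either an exception listed in `retry_on` or a result for
    which `should_retry(result)` is true. After `max_retries` retries, the
    last exception is re-raised or the last result is returned.
    """
    for attempt in range(max_retries + 1):
        last_attempt = attempt == max_retries
        try:
            result = send()
        except retry_on:
            if last_attempt:
                raise  # out of retries: surface the last connection error
        else:
            if last_attempt or not should_retry(result):
                return result
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the defaults
    raise AssertionError("unreachable")  # the final attempt always returns or raises
```

With requests, a call might look like `call_with_retries(lambda: session.get(url, params=params), lambda r: should_retry(r.status_code), retry_on=(requests.exceptions.ConnectionError,))`. You need to pass `requests.exceptions.ConnectionError` explicitly because it is not a subclass of the built-in `ConnectionError` used as the default here.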
What happens now?
Some providers have implementation issues with their MDS endpoints. Standard requests commonly fail with unexplained 500 errors that disappear on retry, or the remote end disconnects mid-request.
What should happen?
I'd like us to investigate adding logic to retry requests under certain conditions, such as the remote end disconnecting mid-request or the server returning a 500 error.
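Instead of a hand-rolled loop, requests can delegate retries to urllib3, which it uses under the hood. Here is a minimal sketch. It assumes the client makes its calls through a `requests.Session` and that urllib3 is at least 1.26, the first version with the `allowed_methods` parameter. `make_retrying_session` is a hypothetical helper name, and the retry count and status codes are illustrative.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total: int = 5, backoff_factor: float = 1.0) -> requests.Session:
    """Return a Session that retries GETs on connection errors and 5xx responses."""
    retry = Retry(
        total=total,  # overall cap on retries of any kind
        backoff_factor=backoff_factor,  # sleeps between attempts grow exponentially
        status_forcelist=(500, 502, 503, 504),  # retry responses with these codes
        allowed_methods=frozenset({"GET"}),  # only retry idempotent GETs
        raise_on_status=False,  # after the last retry, return the response instead of raising
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

One caveat to check: requests reads the response body only after urllib3 returns the headers. A disconnect while the body is still being transferred may therefore surface as an exception (for example `requests.exceptions.ChunkedEncodingError`) instead of being retried by the adapter. An application-level retry around the call can cover that case.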
How do we do that?
I'm opening this issue to discuss what the recommended course of action would be to improve the resilience of the client library.