Client resilience to provider server errors #88

zaksoup · 2020-01-29T01:00:48Z

What happens now?

Some providers have implementation issues with their MDS endpoints. Ordinary requests often fail with unexplained 500 errors that disappear on retry, or the remote end disconnects mid-request.

What should happen?

I'd like to request that we investigate adding logic to retry requests under certain conditions, such as the remote end disconnecting mid-request or the server returning a 500 error.
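One way to pin down "certain conditions" is a small predicate over the response status. The sketch below is hypothetical: `should_retry` is an illustrative name, and treating every 5xx response as transient is an assumption (whether to also retry codes such as 429 is left open). A mid-request disconnect never yields a status code at all; with an HTTP library such as requests it surfaces as an exception, so it has to be handled separately from this check.

```python
def should_retry(status_code: int) -> bool:
    """Hypothetical predicate: treat any 5xx (server error) response as transient.

    Uses value comparisons, so it behaves the same for every status code.
    """
    return 500 <= status_code <= 599


for code in (200, 404, 500, 503):
    print(code, should_retry(code))
```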

How do we do that?

I'm opening this issue to discuss what the recommended course of action would be to improve the resilience of the client library.

thekaveman · 2020-01-29T02:08:17Z

Exponential backoff, or some other retry mechanism? I've typically "handled" (major air quotes) these errors by making large requests multiple times over a given time period, but this isn't ideal for anyone. The current error handling is ripe for improvement.

Some related conversation can be found in #13 and #82.
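A minimal, standard-library-only sketch of the exponential backoff idea. `retry_with_backoff` and all of its parameters are hypothetical names, not part of the client library. It adds random "full jitter" to each delay (a common variant, chosen here as an assumption) so that many clients don't retry in lockstep, and it takes `sleep` as a parameter so it can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[T], bool] = lambda result: False,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds or `max_attempts` calls have been made.

    A call is retried if it raises one of `retry_on` or returns a result for
    which `is_retryable(result)` is true. Before retry n (counting from 0) it
    sleeps a random time in [0, min(max_delay, base_delay * 2**n)], the
    "full jitter" variant of exponential backoff.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            result = call()
        except retry_on:
            if last_attempt:
                raise  # out of attempts: surface the original exception
        else:
            if last_attempt or not is_retryable(result):
                return result  # success, or the final (still failing) result
        sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With requests, `retry_on` would need to be `(requests.exceptions.ConnectionError,)`, because requests' `ConnectionError` is not a subclass of the built-in one used as the default here; `is_retryable` could be a status check such as `lambda r: r.status_code >= 500`.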

zaksoup · 2020-01-30T00:29:38Z

As an aside, the current code's use of `is` to compare status codes only works for status codes of 256 or less. I'm not a Pythonista by any account, so I spent a bit too long trying to figure out why

    x = 200
    x is 200
    # True
    y = 500
    y is 500
    # False

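was happening. It turns out `is` checks object identity, not equality. CPython reuses a single cached object for each small integer (-5 through 256), so `x is 200` happens to be True, but a larger value like 500 can exist as separate objects with the same value, so `y is 500` is False. Comparing with `==` works for every status code.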
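A minimal illustration of the fix: compare status codes by value with `==` (or `in` against a collection of codes). `is_ok` is a hypothetical helper name, and the `int("500")` calls stand in for a status code produced at runtime, such as one parsed from a response. On Python 3.8 and later, writing `x is 200` with a literal also triggers a `SyntaxWarning` suggesting `==`.

```python
def is_ok(status_code: int) -> bool:
    """Hypothetical helper: a value comparison works for every status code."""
    return status_code == 200


a = int("500")  # two status codes created at runtime
b = int("500")
print(a == b)  # True: == compares values
print(a is b)  # identity check; typically False on CPython outside the small-int cache
print(is_ok(int("200")), is_ok(500))
```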

zaksoup · 2020-01-30T00:31:41Z

On topic... I wrote a very (very very) quick-and-dirty attempt at making the code a bit more retryable, including on connection errors. Any feedback on what would be more idiomatic Python is extremely welcome. This is in client.py:

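    import requests

    # Excerpt from the Client class in client.py.
    # should_retry() and pretty_sleep() are existing helpers (not shown here).

    @staticmethod
    def retryable_get(session, url, params):
        r = Client._get(session, url, params)
        wait_time = 1
        retries = 1
        # Retry up to 12 times, doubling the wait after each attempt.
        while (r is None or should_retry(r.status_code)) and retries <= 12:
            if r is None:
                print("Connection error, retrying")
            else:
                print(f"{r.status_code} received, sleeping {wait_time} seconds")

            pretty_sleep(wait_time)
            r = Client._get(session, url, params)
            wait_time *= 2
            retries += 1

        if r is None:
            raise requests.exceptions.ConnectionError(f"Could not connect to {url}")
        return r

    @staticmethod
    def _get(session, url, params):
        # Return None on a connection failure so the caller can retry.
        # requests raises its own ConnectionError, which does not subclass
        # the built-in ConnectionError, so catch the requests one explicitly.
        try:
            return session.get(url, params=params)
        except requests.exceptions.ConnectionError:
            return None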
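On the idiomatic-Python question: requests can hand retries off to urllib3's `Retry` class through an `HTTPAdapter`, which covers both cases raised in this issue (5xx responses and dropped connections) without a hand-written loop. This is a minimal sketch, not a drop-in change: `make_retrying_session` is a hypothetical helper name, the retry counts and status list are illustrative, and it assumes urllib3 1.26 or newer (older releases call `allowed_methods` `method_whitelist`).

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total_retries: int = 5, backoff_factor: float = 1.0) -> requests.Session:
    """Hypothetical helper: a Session whose adapters retry transient failures."""
    retry = Retry(
        total=total_retries,  # overall cap on retries of any kind
        connect=total_retries,  # failures to establish a connection
        read=total_retries,  # errors after the request was sent, e.g. a mid-request disconnect
        status_forcelist=[500, 502, 503, 504],  # response codes that trigger a retry
        allowed_methods=["GET"],  # only retry idempotent GET requests
        backoff_factor=backoff_factor,  # exponential backoff between attempts
        raise_on_status=False,  # after the last retry, return the 5xx response instead of raising
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The resulting session would be passed wherever a session is used today (for example, `session.get(url, params=params)`), so the retry policy lives in one place rather than in each call site.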
