Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Made Compute Engine Metadata service exceptions more informative #1637

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Ark-kun
Copy link

@Ark-kun Ark-kun commented Nov 26, 2024

Previously, the information about the original exceptions was omitted.

Previously, the information about the original exceptions was omitted.
@Ark-kun Ark-kun requested review from a team as code owners November 26, 2024 01:59
Copy link
Member

@viacheslav-rostovtsev viacheslav-rostovtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

attempt,
retry_count,
response.status,
raise exceptions.TransportError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the retryable status code is expected in this case so I'm not sure a conversion to exception is the right thing and it makes the code a bit harder to understand.

(1) Can the customer set log level to an appropriate level to see these attempts? This is the best solution because it preserves the full history of retry attempts.

(2) If the code gets here, response is set and you can just use that to raise the error below. This would also catch the non-retryable error case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take another look at the code execution flow. It's likely that the code change is not changing the behavior the way you think it does. There are no new exceptions thrown to the user. And the logging still happens the same way as before. It's just that now in one of the cases instead of having duplicated logging code there the logging branch is triggered by an exception.

The initial solution was simple and did not change this area. However, @viacheslav-rostovtsev noticed an issue when the retriable error limit is exhausted. In that case the information about the service responses (retriable errors) is lost. A simplest way to capture that information in the final exception is to wrap it in TransportError and let it get assigned to last_exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do need this change, then I agree with Wes's proposal in (2).

We could build the message "Response status: {}; Response:\n{}".format( response.status, response.data ), and store in a string, use it for the warning message and also use it in the final exception. The string will have the last response status.

I do not understand why the execution flow needs to be changed to just log the last response.

else:
break

except exceptions.TransportError as e:
last_exception = e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Clear response here if you're following my suggestions.

@@ -229,7 +231,7 @@ def get(
raise exceptions.TransportError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should consult last_exception and response to construct the right error:

(1) Non-retryable error encountered: response.status
(2) Ran out of retries: last_exception
(3) Ran out of retries: response.status

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was the first iteration of the idea. It seems to me that converting response to an exception works just as well and simplifies the logic since only the last exception needs to be tracked. The logging etc does not change. Do you want us to change this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code flow is more confusing with exceptions, consistent with the Python style guide, so I'd like it changed. Happy to get a 2nd opinion from someone with more Python experience like @sai-sunder-s or @parthea

/cc @Ark-kun

@@ -202,22 +202,24 @@ def get(

backoff = ExponentialBackoff(total_attempts=retry_count)

last_exception = None
for attempt in backoff:
try:
response = request(url=url, method="GET", headers=headers_to_use)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear last_exception here if you're following my suggestions.

)
continue
else:
break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear last_exception here if you follow my suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants