Implement delayed session creation #51
Conversation
It appears that the original
Yes. Regarding the failing tests, I'll take a look at what needs to be done to make those pass. Side note: don't worry about maintaining compatibility with Python 3.6, as support for that version is going to be dropped.
It seems like this library was a copy-paste of original PRAW in terms of how requests are handled, which unfortunately doesn't apply to

```python
NUM_ATTEMPTS = 5  # try 5 times
for attempt in range(NUM_ATTEMPTS):
    try:
        async with session.request(...) as response:
            # handle response, read status, read json or text, etc.
            data = await response.json()
            return data
    except aiohttp.ClientConnectionError:
        # this should really be the only one you need to catch, which can happen if there are internet problems
        # unless timeout is None or 0, this will also catch requests timing out
        continue  # retry
```

Because this touches multiple methods and a rate limiter wrapping the call at the very end (plus tests), I am withdrawing my offer of including a change for this in the PR, as it'd be too complicated and time-consuming for me. Considering the
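As a self-contained illustration of that retry pattern, here is a stdlib-only sketch; `flaky_request`, `request_with_retries`, and `ConnectionFailure` are hypothetical stand-ins for the real `session.request(...)` call and `aiohttp.ClientConnectionError`:

```python
import asyncio

class ConnectionFailure(Exception):
    """Stand-in for aiohttp.ClientConnectionError in this sketch."""

NUM_ATTEMPTS = 5  # try 5 times

async def flaky_request(state: dict) -> dict:
    # Stub: fails twice, then succeeds, to exercise the retry loop.
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionFailure("simulated network hiccup")
    return {"ok": True}

async def request_with_retries(state: dict) -> dict:
    for attempt in range(NUM_ATTEMPTS):
        try:
            return await flaky_request(state)
        except ConnectionFailure:
            continue  # retry on connection problems only
    raise ConnectionFailure("all attempts failed")

state = {"calls": 0}
result = asyncio.run(request_with_retries(state))
```

The key point is that any exception other than the connection error is not caught, so it bubbles up to the caller untouched.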
Yes, it was decided to do it that way (where logical and feasible) to make syncing changes and features between the two easier to manage.
I agree. Out of curiosity, is there a significant downside to creating and destroying the session each time or when an exception occurs? Is there a benefit from reusing the same session during the entire lifecycle of the application?
Could you elaborate? This exists in because it will retry depending on the exception raised; some exceptions it will retry on, while others it won't, returning them to asyncpraw for handling.
Which change are you referring to? Converting to context managers?
A session is a pool of connections that also allows grouping requests with regard to things like common/default headers. If you happen to make two requests at the same time, the session will reuse the connection opened for the first request for the second and any further requests. Without a session, you are guaranteed to open a new connection for each and every request, with no possibility of reusing one. In general, that's less efficient. As the documentation says, creating and storing the

As far as "creating and destroying the session" goes, doing so for every request is inefficient - you may as well not use sessions at all and just use
What I had in mind, which you've probably missed, is that there's no need to have this as a separate function. You can just as well handle exceptions from the same function that does the request, like I showed you in the provided example. Here it is again:

```python
NUM_ATTEMPTS = 5  # try 5 times
for attempt in range(NUM_ATTEMPTS):
    try:
        async with session.request(...) as response:
            # handle response, read status, read json or text, etc.
            data = await response.json()
            return data
    except aiohttp.ClientConnectionError:
        # this should really be the only one you need to catch, which can happen if there are internet problems
        # unless timeout is None or 0, this will also catch requests timing out
        continue  # retry
```

Note that the

```python
def asyncprawcore_request():
    try:
        raise ValueError
    except TypeError:
        # TypeError is handled here; ValueError is not, so it continues to bubble up
        pass

def asyncpraw_request():
    try:
        asyncprawcore_request()
    except ValueError:
        # ValueError is handled here because core didn't handle it
        pass
```

If the exception raised in the code above happens to be a

With that in mind, here's the example again, with slightly expanded code:
```python
# core
async def asyncprawcore_request(...):
    NUM_ATTEMPTS = 5  # try 5 times
    for attempt in range(NUM_ATTEMPTS):
        try:
            async with session.request(...) as response:
                # handle response, read status, read json or text, etc.
                data = await response.json()
                return data
        except aiohttp.ClientConnectionError:
            # retry on simple internet issues
            continue
        except Exception as exc:
            # something else went wrong
            raise YourException(exc)

# main lib
async def asyncpraw_request(...):
    try:
        data = await asyncprawcore_request(...)
        return data
    except YourException:
        # your special exception happened in the core, catch it here
        ...
    except OtherException:
        # handle any other exception that wasn't handled by the core here
        ...
```

And again, you don't need error handling to be a separate function - exception handling can be done inside those exception handlers (hence their name) defined within the
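To make the core/main split above concrete, here is a runnable, stdlib-only version; `CoreRequestError`, `core_request`, and `main_request` are made-up names for this sketch, playing the roles of `YourException`, `asyncprawcore_request`, and `asyncpraw_request`:

```python
import asyncio

class CoreRequestError(Exception):
    """Plays the role of 'YourException': wraps low-level failures in the core."""

async def core_request() -> dict:
    try:
        raise ValueError("low-level failure")  # simulate something going wrong
    except ConnectionError:
        # the real loop would retry here
        raise
    except Exception as exc:
        # translate anything unexpected into the library's own exception type
        raise CoreRequestError(exc) from exc

async def main_request() -> str:
    try:
        data = await core_request()
        return str(data)
    except CoreRequestError as exc:
        # the core's wrapped exception is handled here, in the main library
        return f"handled in main lib: {exc.args[0]!r}"

message = asyncio.run(main_request())
print(message)
```

This demonstrates the bubbling behavior described earlier: the `ValueError` isn't handled where it's raised, gets wrapped by the core, and is finally caught one level up.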
Yes, well, I don't really feel like I can handle rewriting this much without breaking anything, while also making sure the tests are up to date and pass. I don't even know how much of this works, because I didn't write it. This change is simple but vast (it needs to modify multiple methods) and requires knowledge about internals that I don't have. On top of that, there's a rate limiter wrapping the call along the way, which probably can't handle async context managers - that part would most likely have to be rewritten entirely. Again, no idea how all of that works, so I can't say I'm sure of this, but it's what my hunch and experience are telling me.

Otherwise, this PR is good as is to me, assuming the tests will account for what I described in the previous comment, with headers being passed. I'll push out one more commit that adjusts that one test's assert value, so it should hopefully be back to one non-passing test.
Force-pushed from 1149be8 to d625e3b
I'm going to pick this up and get it merged. Thanks for the help with this!
Force-pushed from 5b5a650 to 2411274
@DevilXD Would you mind taking a look at this?
Everything else looks just about right, assuming it all works together.
asyncprawcore/requestor.py (outdated diff excerpt):

```python
@@ -56,29 +58,58 @@ def __init__(
            msg = "user_agent is not descriptive"
            raise InvalidInvocation(msg)

        self.headers = {"User-Agent": f"{user_agent} asyncprawcore/{__version__}"}
        self.loop = loop or asyncio.get_event_loop()
```
To properly implement the delayed session creation, you need to start by moving away from passing around the loop object. Modern asyncio handles all of that internally via `asyncio.get_running_loop()`, which will raise a `RuntimeError` when called outside of a coroutine. There were plans to deprecate `get_event_loop` in Python 3.10, but its deprecation was delayed, so it's only deprecated since Python 3.12 - it will be removed eventually, though. It's better to move on now than later, as it'll also let you avoid other potential issues that arise from mismatching the loop object - if not directly in this library, then in the user's own code.

There should be no `self.loop` here, or anywhere really. If you need the loop object, use `loop = asyncio.get_running_loop()`. If it raises the `RuntimeError`, that means you need to restructure the code so that it gets called from inside a coroutine, directly or indirectly. Note that this still counts:
```python
async def main():
    # no await, only the class is created, but __init__ runs while we're
    # executing a coroutine, so all is good
    r = Requestor()
```
The main entry point to an asynchronous application uses `asyncio.run(...)` to execute a top-level coroutine, at which point the loop will exist and everything else in asyncio will "just work". If you really, really, really need and want to support passing in a `loop` parameter for the `aiohttp.ClientSession` object creation, then you'll need some extra code to actually store that ref anyway (probably under `self.loop` - sigh...), specifically for those users who like to break all the asyncio rules when designing their application. Otherwise, going by today's standard asyncio rules, no loop object needs to be passed around anywhere.

Final note: `aiohttp.ClientSession` should figure out the proper loop object to use on its own, as long as you don't try to pass anything in.
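A minimal, runnable sketch of that structure (the `Requestor` shown here is a hypothetical stub, not the library's actual class):

```python
import asyncio

class Requestor:
    def __init__(self) -> None:
        # No loop stored; just confirm we're inside a running event loop.
        # Raises RuntimeError if constructed outside a coroutine.
        asyncio.get_running_loop()
        self._session = None  # session would be created lazily, on first request

async def main() -> bool:
    r = Requestor()  # fine: __init__ runs while a loop is running
    return True

ok = asyncio.run(main())

# Outside any running loop, construction fails with RuntimeError:
try:
    Requestor()
    outside_ok = True
except RuntimeError:
    outside_ok = False
```

The same check works anywhere in the call chain, which is what makes passing a `loop` parameter around unnecessary.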
I think I got that changed.
Co-authored-by: Joel Payne <15524072+lilspazjoekp@users.noreply.github.com>
Force-pushed from 2411274 to d9f3677
Reference discussion: praw-dev/asyncpraw#167
The changes have been implemented as requested. There's a really important note I've noticed and would like to add here, though - due to how the requests are handled (without using the context manager like the documentation says), the code calling the `request` method needs to handle releasing the response object. Normally, this would be handled by the context manager's `__aexit__` method of the `ClientResponse` that's returned. Reference code: https://github.com/aio-libs/aiohttp/blob/ca51ebd5ffb77f66218933e017a0c924f5810b4d/aiohttp/client_reqrep.py#L1134

Since I haven't seen anything doing so in `asyncpraw` itself, it means that the entire library is "leaking" unclosed client response objects, much like you'd open a file and forget to close it. I'm not sure how `aiohttp` handles cases like this, but it should not be a thing. To fix this, either `asyncpraw` (or any other library using `asyncprawcore`) has to include a `finally` clause followed by `response.release()`, or the `request` method here needs to be converted to use the async context manager (probably by becoming one itself), just like the documentation explains. Something like:

Usage of this would then change in `asyncpraw` like so:

Should this also become a part of this PR?