Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement delayed session creation #51

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DevilXD
Copy link

@DevilXD DevilXD commented Feb 22, 2022

Reference discussion: praw-dev/asyncpraw#167

The changes have been implemented as requested. There's a really important note I've noticed and would like to add here though - due to how the requests are handled (without using the context manager like documentation says), the code calling the request method needs to handle releasing the response object. Normally, this would be handled by the context manager's __aexit__ method of the ClientResponse that's returned. Reference code: https://github.com/aio-libs/aiohttp/blob/ca51ebd5ffb77f66218933e017a0c924f5810b4d/aiohttp/client_reqrep.py#L1134

Since I haven't seen anything doing so in asyncpraw itself, it means that the entire library is "leaking" unclosed client response objects, much like you'd open a file and forget to close it. I'm not sure how aiohttp handles cases like this, but it should not be a thing. To fix this, either asyncpraw (or any other library using asyncprawcore) has to include a finally clause followed by response.release(), or the request method here needs to be converted to use the async context manager (probably by becoming one in itself), just like the documentation explains so. Something like:

from contextlib import asynccontextmanager

class Requestor:
    # bad
    def request(self, *args, **kwargs):
        return await self._session.request(*args, **kwargs)

    # good
    @asynccontextmanager
    def request(self, *args, **kwargs):
        async with self._session.request(*args, **kwargs) as response:
            yield response

Usage of this would then change in asyncpraw like so:

class Reddit:
    # current
    async def request(self, ...):
        response = await self._core.request(...)
        # response.json or response.text are read
        # response object isn't released afterwards

    # changed
    async def request(self, ...):
        async with self._core.request(...) as response:
            # use response to read json or text from it
            # using the context manager will release the response properly

Should this also become a part of this PR?

@DevilXD
Copy link
Author

DevilXD commented Feb 22, 2022

It appears that the original __init__ of the Requestor was accessing a private attribute _default_headers of aiohttp.ClientSession (to set a default user agent header), which is not allowed per PEP8. To fix this, the code has been modified so that the user agent header is passed with each and every request made, no matter which session is used (self-created or custom). This makes test_initialize and test_request__use_custom_session tests not possible to adjust from my side, since I'm not familiar with mock or anything used there really. Detecting if User-Agent has been included can only be done from the mocked request call itself, by checking which headers have been passed to the request. Note that this would also need to take into account the headers-merging behavior of the aiohttp.ClientSession.request method, since the custom header from test_request__use_custom_session isn't passed as a kwarg but comes from the session defaults.

@LilSpazJoekp
Copy link
Member

Should this also become a part of this PR?

Yes.

Regarding the failing tests I'll take a look at what needs to be done to make those pass.

Side note, don't worry about maintaining compatibility with Python 3.6 as support is going to be dropped for that version.

@DevilXD
Copy link
Author

DevilXD commented Feb 23, 2022

It seems like this library was a copy-paste of original PRAW in terms of how requests are handles, which unfortunately doesn't apply to aiohttp because it has a different way of handling requests and responses. The request path that I've managed to figure out so far is: asyncpraw.Reddit.request -> asyncprawcore.Session.request (uses JSON) -> asyncprawcore.Session._request_with_retries (uses response) -> asyncprawcore.Session._do_retry and asyncprawcore.Session._make_request ->asyncprawcore.Session._rate_limiter.call wrapping asyncprawcore.Requestor.request. All of these would have to be rewritten into using an async context manager, up to asyncprawcore.Session._request_with_retries, which is where the processing of the response object ends and it's converted to JSON. In my opinion, asyncprawcore.Session._do_retry shouldn't even exist, as you can retry a request with a simple for loop:

NUM_ATTEMPTS = 5  # try 5 times
for attempt in range(NUM_ATTEMPTS):
    try:
        async with session.request(...) as response:
            # handle response, read status, read json or text, etc.
            data = await response.json()
            return data
    except aiohttp.ClientConnectionError:
        # this should really be the only one you need to catch, which can happen if there are internet problems
        # unless timeout is None or 0, this will also catch requests timing out
        continue  # retry

Because this touches on multiple methods and a rate limiter wrapping the call at the very end (+ tests), I am withdrawing my offer of including a change for this in the PR, as it'd be too complicated and time consuming for me. Considering the release method documentation, calling .json() or .text() on the response most likely already releases the connection. Still, a leak can occur if none of those two methods are called (for example during an exception) and you'll end up with an unreleased connection that'll be kept open until application restart, unless a careful finally followed by response.release() is added at key points, or everything is converted to using the context manager like the documentation explained to do so.

@LilSpazJoekp
Copy link
Member

It seems like this library was a copy-paste of original PRAW in terms of how requests are handles

Yes, it was decided to do it that was (where logical and feasible) to make syncing changes and features between the to easier to manage.

All of these would have to be rewritten into using an async context manager

I agree. Out of curiosity, is there a significant downside to creating and destroying the session each time or when an exception occurs? Is there a benefit from reusing the same session during the entire lifecycle of the application?

In my opinion, asyncprawcore.Session._do_retry shouldn't even exist

Could you elaborate? This exists in because it will retry depending on the exception returned; some exceptions it will retry on and some will not and return it to asyncpraw for handling.

I am withdrawing my offer of including a change for this in the PR, as it'd be too complicated and time consuming for me.

Which change are you referring to? Converting to context managers?

@DevilXD
Copy link
Author

DevilXD commented Feb 23, 2022

Out of curiosity, is there a significant downside to creating and destroying the session each time or when an exception occurs? Is there a benefit from reusing the same session during the entire lifecycle of the application?

A session is a pool of connections that also happens to allow for grouping requests in regards to things like common/default headers etc. If you happen to be doing two requests at the same time, the session will reuse the connection opened on the first request, for the second and any further request. Without a session, you are guaranteed to open a new connection for each and every request, with no possibility of reusing the same one. In general, it's less efficient. As the documentation says, creating and storing the aiohttp.ClientSession object somewhere in your application, and then using it for each and every connection is the way to go.

As far as "creating and destroying the session" goes, doing so for every request is inefficient - you may as well not use sessions at all and just use aiohttp.request directly at that point. Still, if you happen to need to destroy it to recreate it on a new object, because your application needs to do so, then it's perfectly fine. All the session does is give you a way to more efficiently reuse existing connections, as well as group together common things for requests - that's it.

This exists in because it will retry depending on the exception returned; some exceptions it will retry on and some will not and return it to asyncpraw for handling.

What I had in mind, that you've probably missed on, is that there's no need to have this as a separate function. You can just as well handle exceptions from the same function that does the request, like I showed you in the provided example. Here it is again:

NUM_ATTEMPTS = 5  # try 5 times
for attempt in range(NUM_ATTEMPTS):
    try:
        async with session.request(...) as response:
            # handle response, read status, read json or text, etc.
            data = await response.json()
            return data
    except aiohttp.ClientConnectionError:
        # this should really be the only one you need to catch, which can happen if there are internet problems
        # unless timeout is None or 0, this will also catch requests timing out
        continue  # retry

Note that the except aiohttp.ClientConnectionError: part is just an example here - if you need to handle a particular exception there, you include it in the exception handler (aiohttp.ClientConnectionError in this case). If you happen to get an exception that doesn't have a matching handler in the current scope, the exception just continues "bubbling up" until it finds a handler that does so. Consider this:

def asyncprawcore_request():
    try:
        raise ValueError
    except TypeError:
        # TypeError is handled here, ValueError is not, so it continues to bubble up

def asyncpraw_request():
    try:
        asyncprawcore_request()
    except ValueError :
        # ValueError is handled here because core didn't handle it

If the exception raised in the code above happens to be a ValueError, it'll be handled in asyncpraw_request, which is the closest scope that has a ValueError handler defined. If the exception would be a TypeError instead, the inner exception handler in asyncprawcore_request would handle it, and it'd never "bubble up" to asyncpraw_request. If you need a catch-all exception handler, you can just use Exception which matches pretty much all exceptions which you would normally care about.

With that in mind, the example again, with slightly expanded code:

# core
async def asyncprawcore_request(...):
    NUM_ATTEMPTS = 5  # try 5 times
    for attempt in range(NUM_ATTEMPTS):
        try:
            async with session.request(...) as response:
                # handle response, read status, read json or text, etc.
                data = await response.json()
                return data
        except aiohttp.ClientConnectionError:
            # retry on simple internet issues
            continue
        except Exception as exc:
            # something else went wrong
            raise YourException(exc)

# main lib
async def asyncpraw_request(...):
    try:
        data = await asyncprawcore_request(...)
        return data
    except YourException:
        # your special exception happened in the core, catch it here
    except OtherException:
        # handle any other exception that wasn't handled by the core here

And again, you don't need error handling to be a function, exception handling can be done inside those exception handlers (their name, duh) defined within the request method itself. Using a separate function for it is just more code for the same effect, really. If you'd need a "live example" of this, you can check out my aRez wrapper, here: https://github.com/DevilXD/aRez/blob/e6aee8607af060edbd07fe85d1b4969b7c8b9da3/arez/endpoint.py#L423-L457 Note that, without that except Exception as exc: at the end there, any unhandled exception would just bubble out and need to be handled by the code that called Endpoint.request itself.

Which change are you referring to? Converting to context managers?

Yes, well, I don't really feel like I can handle rewriting this much without breaking anything and making sure tests are up to date and pass as well. I don't even know how much of this works because I didn't write it. This change is simple, but vast (needs to modify multiple methods) and requires knowledge about internals that I don't have, plus there's a rate limiter wrapping the call along the way, which probably can't really handle async context managers - that part would most likely have to be rewritten entirely. Again, no idea how all of that works, so can't say I'm sure of this, but it's what my hunch and experience are telling me.

Otherwise, this PR is good as is to me, assuming the tests will account for what I described in the previous comment, with headers being passed. I'll push out one more commit that adjust that one test's assert value, so it should hopefully be back to one non-passing test.

@DevilXD DevilXD force-pushed the async_session_create branch from 1149be8 to d625e3b Compare February 23, 2022 20:19
@LilSpazJoekp
Copy link
Member

cc @vikramaditya91

@LilSpazJoekp
Copy link
Member

I'm going to pick this up and get it merged. Thanks for the help with this!

@LilSpazJoekp LilSpazJoekp force-pushed the async_session_create branch 2 times, most recently from 5b5a650 to 2411274 Compare August 22, 2024 19:30
@LilSpazJoekp
Copy link
Member

@DevilXD Would you mind taking a look at this?

Copy link
Author

@DevilXD DevilXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything else looks just about right, assuming it all works together.

@@ -56,29 +58,58 @@ def __init__(
msg = "user_agent is not descriptive"
raise InvalidInvocation(msg)

self.headers = {"User-Agent": f"{user_agent} asyncprawcore/{__version__}"}
self.loop = loop or asyncio.get_event_loop()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To properly implement the delayed session creation, you need to start with going away from passing around the loop object. New asyncio handles all of that internally via asyncio.get_running_loop(), which will raise a RuntimeError when it's called outside of a coroutine. There were plans to deprecate get_event_loop in Python 3.10, but its deprecation has been delayed, and thus it is deprecated only since Python 3.12,, but it will be removed eventually. It's better to move on now than later, as it'll also let you avoid other potential issues that arise from potentially mismatching the loop object - if not directly in this library, the user will do so in their user code.

There should be no self.loop here, or anywhere really. If you need the loop object, use loop = asyncio.get_running_loop(). If it raises the RuntimeError, that means you need to restructure the code so that it gets called from inside a coroutine, directly or indirectly. Note that this still counts:

async def main():
    # no await, only the class is created, but the __init__ runs while we're executing a coroutine, so all is good
    r = Requestor()

The main entry point to an asynchronous application uses asyncio.run(...) to execute a top level coroutine, at which point the loop will exist, and everything else in asyncio will "just work". If you really, really, really need and want to support passing in a loop parameter for the aiohttp.ClientSession object creation, then you'll need some extra code to actually store that ref anyway (probably under self.loop - sigh...), specifically for those users who like to break all the asyncio rules when designing their application, but otherwise, going by the standard today's asyncio rules, no loop object is needed to be passed around anywhere.

Final note: aiohttp.ClientSession should figure out the proper loop object to use on it's own, as long as you won't try to pass anything in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got that changed.

Co-authored-by: Joel Payne <15524072+lilspazjoekp@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants