Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added session ID as a contextVar #327

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

JohnPeng47
Copy link

This works well with server applications with async support, as a common use case is aggregating the token cost per API request. I believe this is also in-line with ell design philosophy of relying on Python language features (ie. ell.config as a global variable) to promote ell lifecycle operations

for invocations by session ID. Adding it as a
context var allows querying by session ID in an
async server setup (ie. FastAPI), where a common
usecase would be to collect the cost of invocations
in a given user session
@JohnPeng47
Copy link
Author

Think this can be useful for filtering invocations on the front end as well, since I think logically grouping invocations in sessions rather than individual LMP histrories make more sense for me

@MadcowD
Copy link
Owner

MadcowD commented Oct 24, 2024

I like this but is the implementation complete. I don't see it actually used anywhere?

@JohnPeng47
Copy link
Author

JohnPeng47 commented Oct 24, 2024

Ah. So for the implementation, I actually created a new api.py file in the root folder, and then exposed all of sql.py's functions via the global config.store like so:

from ell.configurator import config
from ell.ctxt import get_session_id
from typing import Dict, List, Optional, Set, Any
from ell.types import SerializedLMP, Invocation, InvocationContents

...
def write_invocation(invocation: Invocation, consumes: Set[str]) -> Optional[Any]:
    """
    Write an invocation to the store.

    :param invocation: The Invocation object to write.
    :param consumes: A set of invocation IDs that this invocation consumes.
    :return: None
    """
    return config.store.write_invocation(invocation, consumes)

def get_invocations_by_session_id(session_id: str = "") -> List[Invocation]:
    """
    Retrieve invocations by session ID.

    :param session_id: The session ID to filter by.
    :return: A list of Invocation objects.
    """
    session_id = session_id or get_session_id()
    return config.store.get_invocations_by_sessionid(session_id)

And then on the client I would just call get_invocations_by_session_id. I figured the way you would probably want to access the API functions would be via a file export and not through an server API interface, so I just made it easier and more explicit by exporting them in api.py. Not sure how you felt that, was actually gonna ask if you had any better ideas, forgot to put that msg in the PR

@MadcowD
Copy link
Owner

MadcowD commented Oct 24, 2024

@alex-dixon might have some input here actually ill check it out when i land from this flight

@alex-dixon
Copy link
Contributor

Might consider a metadata dict that can pass through to storage layers. I think this would support other use cases and allow the user to be in control over the lifecycle of concepts like session / request id across async/threaded contexts.

I'd need more time to understand the session concept and its lifecycle -- is it per ell initialization?

@JohnPeng47
Copy link
Author

Yeah wrt to the lifecycle, the answer is yes for single threaded, and no for multi-threaded/async (coroutines). Although I haven't tested it extensively, afaik both of these cases, either single threaded execution as a simple script or multithreaded/async in a server loop, in which case a new session_id is created per thread, works.
For example, right now I am using it to log the costs for a single API call that triggers multiple LMPs using uvicorn/FastAPI. The same cost logging function works when called just by itself, as a standalone script too as well, which I do in my eval suites.

I do agree there should probbaly be some kind of session variable specified through the user to "tag" LMP calls, but I think this works pretty well as a default, out-of-the box session implementation

@JohnPeng47
Copy link
Author

Oh I think there is one case that might be surprising. Which is multi-threading it will create a new session_id in each thread. This actually might be kinda of an anti-usecase for this, because the user would probably expect all ell calls to have the same session_id. Hmm, yeah this would actually be kind of annoying, don't really have a good solution for this case (and actually, I just realized this was the cause of an error I had earlier in not being able to find some invocations ids via filtering with session_id lol)

@alex-dixon
Copy link
Contributor

@MadcowD could he use origin_trace for this?

@JohnPeng47
Copy link
Author

Hey so got a suggestion to solve the threads problem, its a two parter:

  1. Add a pass-through 'session_id' param on ell.complex that let's you overwrite the default contextVar
  2. Add a "propagate: bool = True" parameter, when True, will pass set its child invocations session_ids to its own session_id. This makes it so if you have a couple of nested LMP calls, you only have to set the parent

What do you guys think?

@JohnPeng47
Copy link
Author

Personally I think the biggest thing session_id can be used for is filtering LMPs in the front end. At least for my personal use case, all of my logic is implemented using several LMPs, so seeing them grouped together rather than the single history for a single LMP makes much more sense to me

@MadcowD
Copy link
Owner

MadcowD commented Oct 29, 2024

Okay I agree with the idea of run groups and we do actually need this feature for the new evaluation PR so I'll give this another look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants