-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLM abstraction #420
Comments
Any solution here should also support #421. |
WIP: #432 This can all be modified to take the list[Message] easily, I'm just not 100% where that render needs to exist - I have a render_prompt() in the exl2 assistant as a very basic example.. but it may belong elsewhere - happy to discuss. |
Here are my thoughts:
|
This seems fine to me. I can't think of any technical argument against this assumption. I'm generally in favour of extending the We would of course need to update the web API to handle this new method (if we were to include it inside of the |
Comments make sense to me; Re streaming: I'm not against the idea, but what's the use-case for streaming specifically on the generate() part of the API? Most use-cases will be for python->LLM->python type patterns and not really displayed to users - I could maybe see the case if you had a ton of pre-processing that you needed the user to be aware of, but I'd likely just show this as a updating piece of text in UI that's displaying current process (pre-processing/fetching-sources/etc) if anything at all. Most of what is meant to be displayed (and therefor streamed) is in answer and generate is more for the pieces out of view. Kwargs are for sure needed, that's a good point. re "Why do we want it on the base class?" that's really design philosophy, so I bow to your vision there; if we have it in base class we at least try to enforce patterns - if a given assistant doesn't need a generate() maybe it just gets implemented as pass or some basic type-correct return. Or - maybe generate is a core piece within answer and answer simply returns a different dtype (Message) with the user choosing if that is also fetching sources or just running generate(). An assistant without need of Answer() could basically just have answer call generate, and then cast it into a Message.(?) "Generate" is a solid.. 6/10 in my book.. so I'm on board with Nick on that one - placeholder at best, but I don't really love a lot of alternatives either. I just asked gpt4o for ideas and got: inference, evaluate, query, invoke, compute, process, and run. Among a lot of others I like much less. Inference is certainly the most literal, so there's zero question as to what it does, but I might lean toward evaluate, query, process over that one. |
Feature description
We currently abstract LLMs through
ragna.core.Assistant
. While this allows users to implement arbitrary assistants, it makes it unnecessarily hard to use LLMs for other tasks in the RAG pipeline. For example, preprocessing a prompt would require a custom implementation of potentially the same LLM used for answering the prompt.Thus, I propose we implement a
Llm
base class and add builtin components for all assistants we currently have. Probably under theragna.llms
namespace.I'm not sure yet whether
ragna.core.Assistant
should subclass the newLlm
base class or if we rather use composition. Open to both.I'm also not 100% sure what the interface should look like. I would like to see a small survey of other tools in the ecosystem to compare.
Value and/or benefit
Less duplication for users that want to use a more complex pipleline.
Anything else?
No response
The text was updated successfully, but these errors were encountered: