-
It seems worth taking this all the way to the full Builder pattern. Imagine this interface:

```
from outlines import models, OutputType

model = models.provider("name", *init_args, **init_kwargs)

result = model \
    .inference_settings(*args, **kwargs) \                # optional
    .txt_prompt() | .visual_prompt() | .audio_prompt() \  # any or all
    .output_type(OutputType) \
    .stream() or .load()  # final call: a hidden build() plus the output step
```
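A minimal, runnable sketch of how such a builder could work (all class and method names here are illustrative, not part of Outlines):

```python
class GenerationBuilder:
    """Illustrative builder; a real one would wrap an inference backend."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.settings = {}
        self.prompts = []
        self.output = str

    def inference_settings(self, **kwargs):   # optional step
        self.settings.update(kwargs)
        return self

    def txt_prompt(self, text):               # one of several prompt modalities
        self.prompts.append(("text", text))
        return self

    def output_type(self, tp):
        self.output = tp
        return self

    def load(self):                           # final call: hidden build() + run
        # A real implementation would call the underlying model here.
        return f"{self.model_name} -> {self.output.__name__}"


result = (
    GenerationBuilder("gpt2")
    .inference_settings(temperature=0.7)
    .txt_prompt("Hello")
    .output_type(int)
    .load()
)
# result == "gpt2 -> int"
```

Each intermediate method returns `self`, which is what makes the chained call style work.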
-
For Pydantic models, I've seen BAML handle streaming by generating the entire key structure with empty fields, then filling in the values as generation proceeds. This is presumably quite resource-intensive, but it gives a cohesive streaming experience that respects the structure. Example:

Step 1, skeleton:

```json
{
  "field1": "",
  "field2": null
}
```

Step 2, first sample:

```json
{
  "field1": "Hello ",
  "field2": null
}
```

Step 3, completion of the first value:

```json
{
  "field1": "Hello World",
  "field2": null
}
```

Step 4, complete:

```json
{
  "field1": "Hello World",
  "field2": 15
}
```

Streaming in structured generation is very strange though, so I'm not sure what the appropriate interface is. The bonus of implementing a "fill the skeleton" approach as above is that, in principle, you can run early validation, inline tool evaluations, etc.
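The skeleton-filling steps above can be sketched as a generator yielding successive snapshots (a hypothetical helper, not a BAML or Outlines API):

```python
import copy

def skeleton_stream(schema, updates):
    """Yield snapshots of a dict skeleton as field values arrive.

    `schema` maps field names to empty placeholders; `updates` is a
    sequence of (field, value) pairs in generation order.
    """
    state = dict(schema)
    yield copy.deepcopy(state)   # Step 1: the empty skeleton
    for field, value in updates:
        state[field] = value     # overwrite as each value grows/completes
        yield copy.deepcopy(state)

skeleton = {"field1": "", "field2": None}
steps = list(skeleton_stream(skeleton, [
    ("field1", "Hello "),
    ("field1", "Hello World"),
    ("field2", 15),
]))
# steps[0]  == {"field1": "", "field2": None}
# steps[-1] == {"field1": "Hello World", "field2": 15}
```

Early validation or tool calls could run on each yielded snapshot, since every snapshot is already a well-formed instance of the structure.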
-
The current design of the library is not flexible enough, starting with the `outlines.generate` module.

New user interface
We need to make the interface of the library simpler and more flexible. I propose the following design, in pseudo-code:
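Something along these lines (a sketch; the `provider` name comes from this post, while the `Model` class and call signature are assumptions, not the final API):

```python
from dataclasses import dataclass

@dataclass
class Model:
    """Hypothetical stand-in for a wrapped inference backend."""
    name: str

    def __call__(self, prompt, output_type=str, **kwargs):
        # A real model would generate text constrained to `output_type`;
        # here we coerce a canned answer to illustrate the shape.
        return output_type("42")

def provider(name, **init_kwargs):
    # In the proposed design, init_kwargs would be forwarded to the
    # underlying library; this sketch ignores them.
    return Model(name)

model = provider("transformers")
answer = model("How many roads?", output_type=int)
# answer == 42
```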
Users thus need only be concerned with the output type, be it a Python type, a Pydantic model, etc., without having to learn new functions. This implicitly re-centers Outlines around the definition of output types.
Extra parameters

Any other value passed to `models.provider` is passed directly to the initialization function in the corresponding library. The same goes for any other value passed to the `__call__` method of the model. This will give users more flexibility: for instance, it would solve #1199, and it would allow users to use a wider variety of sampling algorithms than those described in `samplers.py`. It would also simplify the code, since we would no longer try to normalize the parameters (see here, here or here for example). Outlines will become a thin wrapper around these libraries, augmenting them with a friendly interface for structured generation.
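The pass-through idea can be illustrated like this (the backend class is a fake stand-in for a third-party library):

```python
class FakeBackend:
    """Stand-in for a third-party library's model class."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.init_kwargs = kwargs        # received verbatim, no normalization

    def generate(self, prompt, **kwargs):
        self.call_kwargs = kwargs        # also forwarded verbatim
        return prompt.upper()

def provider(name, **init_kwargs):
    # Forward everything to the wrapped library untouched.
    return FakeBackend(name, **init_kwargs)

model = provider("backend", revision="main", trust_remote_code=True)
out = model.generate("hi", top_k=40, min_p=0.1)
# model.init_kwargs == {"revision": "main", "trust_remote_code": True}
# model.call_kwargs == {"top_k": 40, "min_p": 0.1}
# out == "HI"
```

Because nothing is intercepted, any sampling parameter the backend understands works immediately, with no per-backend normalization layer to maintain.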
Async execution
Asynchronous execution is necessary for agentic workflows, among other things. We should thus support async calls whenever possible, for instance by building on vLLM's AsyncLLMEngine.
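Async support could mirror the synchronous call; a toy sketch (the model class is an illustrative stand-in, not vLLM's actual engine):

```python
import asyncio

class AsyncModel:
    """Stand-in for a model backed by an async engine (illustrative)."""

    async def __call__(self, prompt, output_type=str):
        await asyncio.sleep(0)   # yield control, as a real engine would
        return output_type("7")

async def main():
    model = AsyncModel()
    # Several generations can run concurrently in an agentic loop.
    return await asyncio.gather(
        model("a", output_type=int),
        model("b", output_type=int),
    )

results = asyncio.run(main())
# results == [7, 7]
```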
Streaming
We should also offer the possibility to stream tokens, although I am not quite sure how that would work with types such as Pydantic models. A common way to do this is to pass `streaming=True` to the generation function. I am not a big fan of this, however, and would prefer a dedicated method.
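A dedicated streaming method might look like this (`stream` here is hypothetical, not an existing Outlines method):

```python
class Model:
    """Illustrative model exposing a dedicated streaming method."""

    def stream(self, prompt):
        # A real implementation would yield tokens as the engine
        # produces them; here we fake it with a canned answer.
        for token in ["Hello", " ", "World"]:
            yield token

chunks = list(Model().stream("greet me"))
text = "".join(chunks)
# text == "Hello World"
```

A generator-based method keeps the non-streaming `__call__` signature clean instead of overloading it with a `streaming=True` flag.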
Multi-modal models
Multi-modal models differ from text-to-text models in that they accept multiple modalities as input. I thus believe they can simply be handled by defining specific input types.
In this case, however, if `image` is of type `PIL.Image`, we may be able to simply pass a tuple as the input. In any case, this should be handled by looking at the types of the inputs.
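Dispatching on input types could be sketched like this (illustrative only; `Image` stands in for `PIL.Image`):

```python
from dataclasses import dataclass

@dataclass
class Image:
    """Wrapper marking an input as an image (stand-in for PIL.Image)."""
    data: bytes

def build_inputs(*inputs):
    # Route each positional input by its type, so that
    # model("prompt", image) needs no modality-specific arguments.
    parts = []
    for item in inputs:
        if isinstance(item, str):
            parts.append(("text", item))
        elif isinstance(item, Image):
            parts.append(("image", item.data))
        else:
            raise TypeError(f"unsupported input type: {type(item).__name__}")
    return parts

parts = build_inputs("Describe this:", Image(b"\x89PNG"))
# parts == [("text", "Describe this:"), ("image", b"\x89PNG")]
```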
Reviewers
@torymur, @lapp0