-
It seems worth taking this all the way to the full Builder pattern. Imagine this interface:

```
from outlines import models, OutputType

model = models.provider("name", *init_args, **init_kwargs)

result = model \
    .inference_settings(*args, **kwargs) \                # optional
    .txt_prompt() | .visual_prompt() | .audio_prompt() \  # any or all
    .output_type(OutputType) \
    .stream() or .load()  # final call: a hidden build() plus the output step
```
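A minimal, runnable sketch of how such a builder could work (all class and method names here are illustrative, not part of Outlines):

```python
class GenerationBuilder:
    """Illustrative builder; a real one would wrap an inference backend."""

    def __init__(self, model_name):
        self.model_name = model_name
        self.settings = {}
        self.prompts = []
        self.output = str

    def inference_settings(self, **kwargs):   # optional step
        self.settings.update(kwargs)
        return self

    def txt_prompt(self, text):               # one of several prompt modalities
        self.prompts.append(("text", text))
        return self

    def output_type(self, tp):
        self.output = tp
        return self

    def load(self):                           # final call: hidden build() + run
        # A real implementation would call the underlying model here.
        return f"{self.model_name} -> {self.output.__name__}"


result = (
    GenerationBuilder("gpt2")
    .inference_settings(temperature=0.7)
    .txt_prompt("Hello")
    .output_type(int)
    .load()
)
# result == "gpt2 -> int"
```

Each intermediate method returns `self`, which is what makes the chained call style work.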
-
For Pydantic models, I've seen BAML handle streaming by generating the entire key structure with empty fields, then filling in the values as generation proceeds. This is presumably quite resource-intensive, but it gives a cohesive streaming experience that respects the structure. Example:

Step 1, skeleton:

```json
{
  "field1": "",
  "field2": null
}
```

Step 2, first sample:

```json
{
  "field1": "Hello ",
  "field2": null
}
```

Step 3, completion of the first value:

```json
{
  "field1": "Hello World",
  "field2": null
}
```

Step 4, complete:

```json
{
  "field1": "Hello World",
  "field2": 15
}
```

Streaming in structured generation is very strange though, so I'm not sure what the appropriate interface is. The bonus of implementing a "fill the skeleton" approach as above is that, in principle, you can run early validation, inline tool evaluations, etc.
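The skeleton-filling steps above can be sketched as a generator yielding successive snapshots (a hypothetical helper, not a BAML or Outlines API):

```python
import copy

def skeleton_stream(schema, updates):
    """Yield snapshots of a dict skeleton as field values arrive.

    `schema` maps field names to empty placeholders; `updates` is a
    sequence of (field, value) pairs in generation order.
    """
    state = dict(schema)
    yield copy.deepcopy(state)   # Step 1: the empty skeleton
    for field, value in updates:
        state[field] = value     # overwrite as each value grows/completes
        yield copy.deepcopy(state)

skeleton = {"field1": "", "field2": None}
steps = list(skeleton_stream(skeleton, [
    ("field1", "Hello "),
    ("field1", "Hello World"),
    ("field2", 15),
]))
# steps[0]  == {"field1": "", "field2": None}
# steps[-1] == {"field1": "Hello World", "field2": 15}
```

Early validation or tool calls could run on each yielded snapshot, since every snapshot is already a well-formed instance of the structure.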
-
The current design of the library is not flexible enough, starting with the `outlines.generate` module.

New user interface
We need to make the interface of the library simpler and more flexible. I propose the following design, in pseudo-code:
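Something along these lines (a sketch; the `provider` name comes from this post, while the `Model` class and call signature are assumptions, not the final API):

```python
from dataclasses import dataclass

@dataclass
class Model:
    """Hypothetical stand-in for a wrapped inference backend."""
    name: str

    def __call__(self, prompt, output_type=str, **kwargs):
        # A real model would generate text constrained to `output_type`;
        # here we coerce a canned answer to illustrate the shape.
        return output_type("42")

def provider(name, **init_kwargs):
    # In the proposed design, init_kwargs would be forwarded to the
    # underlying library; this sketch ignores them.
    return Model(name)

model = provider("transformers")
answer = model("How many roads?", output_type=int)
# answer == 42
```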
Users thus need only be concerned with the output type, be it a Python type, a Pydantic model, etc., without having to learn new functions. This implicitly re-centers Outlines around the definition of output types.
Extra parameters

Any other value passed to `models.provider` is passed directly to the initialization function in the corresponding library. The same goes for any other value passed to the `__call__` method of the model. This will give users more flexibility: for instance, it would solve #1199, and it would allow users to use a wider variety of sampling algorithms than those described in `samplers.py`. It would also simplify the code, since we would no longer try to normalize the parameters (see here, here or here for example). Outlines will become a thin wrapper around these libraries, augmenting them with a friendly interface for structured generation.
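The pass-through idea can be illustrated like this (the backend class is a fake stand-in for a third-party library):

```python
class FakeBackend:
    """Stand-in for a third-party library's model class."""

    def __init__(self, name, **kwargs):
        self.name = name
        self.init_kwargs = kwargs        # received verbatim, no normalization

    def generate(self, prompt, **kwargs):
        self.call_kwargs = kwargs        # also forwarded verbatim
        return prompt.upper()

def provider(name, **init_kwargs):
    # Forward everything to the wrapped library untouched.
    return FakeBackend(name, **init_kwargs)

model = provider("backend", revision="main", trust_remote_code=True)
out = model.generate("hi", top_k=40, min_p=0.1)
# model.init_kwargs == {"revision": "main", "trust_remote_code": True}
# model.call_kwargs == {"top_k": 40, "min_p": 0.1}
# out == "HI"
```

Because nothing is intercepted, any sampling parameter the backend understands works immediately, with no per-backend normalization layer to maintain.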
Async execution
Asynchronous execution is necessary for agentic workflows, among other things. We should thus support async calls whenever possible, for instance by building on vLLM's AsyncLLMEngine.
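Async support could mirror the synchronous call; a toy sketch (the model class is an illustrative stand-in, not vLLM's actual engine):

```python
import asyncio

class AsyncModel:
    """Stand-in for a model backed by an async engine (illustrative)."""

    async def __call__(self, prompt, output_type=str):
        await asyncio.sleep(0)   # yield control, as a real engine would
        return output_type("7")

async def main():
    model = AsyncModel()
    # Several generations can run concurrently in an agentic loop.
    return await asyncio.gather(
        model("a", output_type=int),
        model("b", output_type=int),
    )

results = asyncio.run(main())
# results == [7, 7]
```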
Streaming
We should also offer the possibility to stream tokens, although I am not quite sure how that would work with types such as Pydantic models. A common way to do this is to pass `streaming=True` to the generation function. I am not a big fan of this, however, and would prefer a dedicated method.
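A dedicated streaming method might look like this (`stream` here is hypothetical, not an existing Outlines method):

```python
class Model:
    """Illustrative model exposing a dedicated streaming method."""

    def stream(self, prompt):
        # A real implementation would yield tokens as the engine
        # produces them; here we fake it with a canned answer.
        for token in ["Hello", " ", "World"]:
            yield token

chunks = list(Model().stream("greet me"))
text = "".join(chunks)
# text == "Hello World"
```

A generator-based method keeps the non-streaming `__call__` signature clean instead of overloading it with a `streaming=True` flag.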
Multi-modal models
Multi-modal models differ from text-to-text models in that they accept multiple modalities as input. I thus believe they can simply be handled by defining specific input types.
In this case, however, if `image` is of type `PIL.Image`, we may be able to simply pass a tuple as the input. In any case, this should be handled by looking at the types of the inputs.
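Dispatching on input types could be sketched like this (illustrative only; `Image` stands in for `PIL.Image`):

```python
from dataclasses import dataclass

@dataclass
class Image:
    """Wrapper marking an input as an image (stand-in for PIL.Image)."""
    data: bytes

def build_inputs(*inputs):
    # Route each positional input by its type, so that
    # model("prompt", image) needs no modality-specific arguments.
    parts = []
    for item in inputs:
        if isinstance(item, str):
            parts.append(("text", item))
        elif isinstance(item, Image):
            parts.append(("image", item.data))
        else:
            raise TypeError(f"unsupported input type: {type(item).__name__}")
    return parts

parts = build_inputs("Describe this:", Image(b"\x89PNG"))
# parts == [("text", "Describe this:"), ("image", b"\x89PNG")]
```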
Reviewers
@torymur, @lapp0