Cache model output on a particular dataset #2925

Open
chandlj opened this issue Nov 7, 2024 · 4 comments

chandlj commented Nov 7, 2024

It would be nice if we could pre-compute a model's output on a particular dataset and essentially "cache" it for use in an evaluation. For example, we have a large dataset of long-context documents, and running our model over it is particularly expensive. If we want to change our evaluation pipeline at any point (for example, adding, removing, or modifying a computed metric/score), it currently seems we would have to re-run the model on the entire dataset to get a new evaluation run.

It does seem like you would be able to do this:

import weave
from typing import TypeVar

from weave import Dataset, Evaluation

T = TypeVar("T")

dataset = Dataset(
    name="papers",
    rows=[
        # "output" is the pre-computed model output, stored at the database level
        {"id": "0", "docs": ..., "output": ...},
    ],
)

class IdentityModel(weave.Model):
    @weave.op()
    async def predict(self, docs: ..., output: T) -> T:
        # Pass the pre-computed output straight through so scorers can use it
        return output

model = IdentityModel()
evaluation = Evaluation(dataset=dataset, scorers=[...])  # Add our metrics here

However, this is obviously not ideal and would probably be confusing in the UI.
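
For concreteness, the scorers left as [...] above could be plain ops; as I understand it, a scorer's parameters are matched by name against the dataset columns and the model output. A rough sketch (the "expected" column and the exact-match logic are placeholders for illustration, not part of our actual dataset):

import weave

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # "expected" would come from a dataset column; "output" receives the
    # (pre-computed) model output for that row.
    return {"match": expected == output}

evaluation = weave.Evaluation(dataset=dataset, scorers=[exact_match])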

@andrewtruong
Collaborator

Hey @chandlj, we're working on "adding calls to a dataset", which I think is what you're asking for.

Basically:

# 1. Create Call objects containing the inputs, outputs, etc.
calls = []
for x in range(3):
    res, call = await model.predict.call(...)
    calls.append(call)

# 2. Generate a dataset from those calls (your pre-computed model outputs)
dataset = Dataset.from_calls(calls)

# 3. Pass to Evaluation as you would normally.
evaluation = Evaluation(dataset=dataset, ...)

Then you can reuse the dataset later via the "Use" tab in the UI.
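
Independent of the pending from_calls piece, publishing the resulting dataset and fetching it back by reference already works via the API; a rough sketch (the project and dataset names here are just placeholders):

import weave

weave.init("my-project")  # placeholder project name

# Publish the generated dataset so it can be referenced later
weave.publish(dataset, name="papers_with_outputs")

# Later: fetch it by reference and evaluate with a new set of scorers
cached_dataset = weave.ref("papers_with_outputs").get()
evaluation = weave.Evaluation(dataset=cached_dataset, scorers=[...])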

chandlj commented Nov 7, 2024

Hey @andrewtruong, thanks for the swift reply! When can we expect this feature to be completed?

@andrewtruong
Collaborator

No firm timeline atm, but my current guess would be in the next few weeks!

Would you want to primarily add calls via the API (like above), or via the UI?

chandlj commented Nov 11, 2024

We would probably like to do it via the API. The way I envision it, we would have a dataset of inputs, something like:

dataset = Dataset(name="papers", rows=[{"id": 0, "context": ...}, ...])

calls = []
for entry in dataset.rows:
    # Use .call() to get the Call object alongside the result
    res, call = await model.predict.call(entry)
    calls.append(call)

dataset_with_responses = Dataset.from_calls(calls)

weave.publish(dataset_with_responses, name="papers_with_calls")

...
# Later, using the dataset
dataset = weave.ref("papers_with_calls").get()

evaluation = Evaluation(dataset=dataset, scorers=[...])  # dynamically changing list of scorers

evaluation.evaluate()  # In theory, you would not need to pass a model here because the outputs are already computed
