Cache model output on a particular dataset #2925

Open
chandlj opened this issue Nov 7, 2024 · 4 comments

chandlj commented Nov 7, 2024

It would be nice if we could pre-compute a model's output on a particular dataset and essentially "cache" it for use in an evaluation. For example, we have a large dataset of long-context documents, and running our model over it is particularly expensive. If we want to change our evaluation pipeline at any point (for example, adding, removing, or modifying a computed metric/score), it currently seems we would have to re-run the model on the entire dataset to get a new evaluation run.

It does seem like you would be able to do this:

import weave
from typing import TypeVar

from weave import Dataset, Evaluation

T = TypeVar("T")

dataset = Dataset(
    name="papers",
    rows=[
        # "output" is the pre-computed model output, stored at the database level
        {"id": "0", "docs": ..., "output": ...},
    ],
)

class IdentityModel(weave.Model):
    @weave.op()
    async def predict(self, docs: ..., output: T) -> T:
        # Pass the pre-computed output straight through so scorers can use it
        return output

model = IdentityModel()
evaluation = Evaluation(dataset=dataset, scorers=[...])  # Add our metrics here

However, this is obviously not ideal and would probably be confusing in the UI.
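
For concreteness, the scorers left as [...] above could be plain ops; as I understand it, a scorer's parameters are matched by name against the dataset columns and the model output. A rough sketch (the "expected" column and the exact-match logic are placeholders for illustration, not part of our actual dataset):

import weave

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # "expected" would come from a dataset column; "output" receives the
    # (pre-computed) model output for that row.
    return {"match": expected == output}

evaluation = weave.Evaluation(dataset=dataset, scorers=[exact_match])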

@andrewtruong
Collaborator

Hey @chandlj, we're working on "adding calls to a dataset", which I think is what you're asking for.

Basically:

# 1. Create Call objects containing the inputs, outputs, etc.
calls = []
for x in range(3):
    res, call = await model.predict.call(...)
    calls.append(call)

# 2. Generate a dataset from those calls (your pre-computed model outputs)
dataset = Dataset.from_calls(calls)

# 3. Pass to Evaluation as you would normally.
evaluation = Evaluation(dataset=dataset, ...)

Then you can reuse the dataset later via the "Use" tab in the UI.
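
Independent of the pending from_calls piece, publishing the resulting dataset and fetching it back by reference already works via the API; a rough sketch (the project and dataset names here are just placeholders):

import weave

weave.init("my-project")  # placeholder project name

# Publish the generated dataset so it can be referenced later
weave.publish(dataset, name="papers_with_outputs")

# Later: fetch it by reference and evaluate with a new set of scorers
cached_dataset = weave.ref("papers_with_outputs").get()
evaluation = weave.Evaluation(dataset=cached_dataset, scorers=[...])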

chandlj commented Nov 7, 2024

Hey @andrewtruong, thanks for the swift reply! When can we expect this feature to be completed?

@andrewtruong
Collaborator

No firm timeline atm, but my current guess would be in the next few weeks!

Would you want to primarily add calls via the API (like above), or via the UI?

chandlj commented Nov 11, 2024

We would probably like to do it via the API. The way I envision it, we would have a dataset of inputs, something like:

dataset = Dataset(name="papers", rows=[{"id": 0, "context": ...}, ...])

calls = []
for entry in dataset.rows:
    # Use .call() to get the Call object alongside the result
    res, call = await model.predict.call(entry)
    calls.append(call)

dataset_with_responses = Dataset.from_calls(calls)

weave.publish(dataset_with_responses, name="papers_with_calls")

...
# Later, using the dataset
dataset = weave.ref("papers_with_calls").get()

evaluation = Evaluation(dataset=dataset, scorers=[...])  # dynamically changing list of scorers

evaluation.evaluate()  # In theory, you would not need to pass a model here because the outputs are already computed
