Cache model output on a particular dataset #2925
Hey @chandlj, we're working on "adding calls to a dataset", which I think is what you're asking for. Basically:

```python
# 1. Create Call objects containing the inputs, outputs, etc.
calls = []
for x in range(3):
    res, call = await model.predict.call(...)
    calls.append(call)

# 2. Generate a dataset from those calls (your pre-computed model outputs)
dataset = Dataset.from_calls(calls)

# 3. Pass to Evaluation as you would normally.
evaluation = Evaluation(dataset=dataset, ...)
```

Then you can reuse the dataset later using the "Use" tab in the UI.
Hey @andrewtruong, thanks for the swift reply! When can we expect this feature to be completed?
No firm timeline atm, but my current guess would be in the next few weeks! Would you want to primarily add calls via the API (like above), or via the UI?
We would probably like to do it via the API. I envision having a dataset of inputs, something like:

```python
dataset = Dataset(name="papers", rows=[{"id": 0, "context": ...}, ...])

calls = []
for entry in dataset:
    res, call = await model.predict.call(entry)
    calls.append(call)

dataset_with_responses = Dataset.from_calls(calls)
weave.publish(dataset_with_responses, name="papers_with_calls")

...

# Later, reusing the dataset
dataset = weave.ref("papers_with_calls").get()
evaluation = Evaluation(dataset=dataset, scorers=[...])  # dynamically changing list of scorers
await evaluation.evaluate()  # In theory, no model needs to be passed here because the outputs are already computed
```
It would be nice if we could pre-compute a model's output on a particular dataset and essentially "cache" it for use in an evaluation. For example, we have a large dataset of long-context documents, and running our model on it is particularly expensive. If we want to change our evaluation pipeline at any point (adding, removing, or modifying a computed metric/score on the dataset), it seems we would have to re-run our model on the entire dataset to get a new evaluation run.
It does seem like you would be able to do this:
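Roughly, I'm imagining wrapping the pre-computed outputs in a dummy model that just looks them up (a sketch only; `CachedModel`, `cached_outputs`, and the row keys here are made up for illustration):

```python
import weave

class CachedModel(weave.Model):
    # Illustrative only: maps an example id to a previously computed output.
    cached_outputs: dict

    @weave.op()
    def predict(self, id: int, context: str) -> str:
        # No real inference here; just return the cached output for this row.
        return self.cached_outputs[id]

model = CachedModel(cached_outputs={0: "..."})
evaluation = Evaluation(dataset=dataset, scorers=[...])
# await evaluation.evaluate(model)  # scorers re-run, the expensive model does not
```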
However, this is obviously not ideal and would probably be confusing in the UI.