Speeding up MTEB #381
Comments
Only downloading the splits we use might also give us a free speed-up. Currently there seems to be no way to specify a subset of splits to load: you can load either all splits or a single one.
Perhaps a workaround is to loop over the required splits, load one Dataset at a time, and construct a
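A minimal sketch of the loop-over-splits workaround described above, not the implementation MTEB ended up with. The helper name `load_selected_splits` and its `loader` parameter are hypothetical; by default it falls back to `datasets.load_dataset` with its `split` argument. The results are collected in a plain dict here; wrapping them in a `datasets.DatasetDict` would be one option.

```python
def load_selected_splits(path, splits, loader=None, **kwargs):
    """Load only the requested splits, one call per split.

    `loader` is a hypothetical injection point (useful for testing);
    when omitted, `datasets.load_dataset` is used.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without `datasets`.
        from datasets import load_dataset as loader

    # One load call per split; splits that are not requested are never loaded.
    return {split: loader(path, split=split, **kwargs) for split in splits}
```

Usage would look something like `load_selected_splits("some/dataset", ["test"])`, where the dataset path is a placeholder. As noted in the issue body, a `dataset_transform` that assumes every split is present could break with this approach.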
Yea that seems like a great approach
I can start on that this week when I get some downtime from the conference, if no one has started yet.
Looks like this relies on a WIP from datasets: huggingface/datasets#6832
I believe this issue is mostly resolved. While we can definitely speed up MTEB more, I think most of the initial ideas in this issue have been implemented (or will be implemented as a part of MMTEB).
This is an overview issue on how to speed up MTEB:
I see the following options for speeding up MTEB:
load_dataset function. Note this will lead to bugs if the dataset_transform assumes the full dataset (probably shouldn't happen, but it might).
Task-specific speed-ups:
Overview of slowest segments:
Based on existing results from the paraphrase-multilingual-MiniLM-L12-v2 model (which might have been run on all sorts of systems).