Integrate UniBench #1590
@isaac-chung @Samoed I’m currently reviewing the paper and going through the code to better understand UniBench, but I’m unsure how to begin this task. Could someone provide some guidance or suggestions on how to get started? Any advice would be greatly appreciated. Thank you so much!
There seems to be quite a lot of overlap already with what we've implemented in MIEB.
@isaac-chung Awesome, that makes sense. Quick clarification: when I make a new branch, should I branch from MIEB (the branch you linked above) or from the normal MTEB main? (Asking so that when I make a PR it will get merged properly.)
@YashDThapliyal from the mieb branch.
Okay, so I compared the datasets between UniBench and MTEB's Image tasks. Here's what I did:
For UniBench: used a script to extract dataset names from the dataset_url fields in benchmarks.py, generating UniBenchDatasets.txt.
For MTEB: manually copied the file names from each eng folder for each task in /Image, consolidated them in MTEB_all_datasets.txt, and processed the unique names into MTEB_unique_datasets.txt.
When I compared both files to identify differences, I found a total of 55 datasets exclusive to UniBench. I'd like to confirm whether this approach is correct before proceeding, because if so, I would have to create 55 new files as specified in https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_dataset.md. Thanks!
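For reference, a rough sketch of the UniBench extraction step described above (the exact format of the dataset_url fields in benchmarks.py is an assumption here, so the pattern may need adjusting):

```python
import re
from pathlib import Path

def extract_unibench_datasets(benchmarks_py: str = "benchmarks.py") -> list[str]:
    """Collect dataset names from the dataset_url fields in UniBench's benchmarks.py."""
    text = Path(benchmarks_py).read_text()
    # Assumes fields of the form dataset_url="org/name"; adjust if the source differs.
    urls = re.findall(r'dataset_url\s*=\s*["\']([^"\']+)["\']', text)
    return sorted({url.split("/")[-1].lower() for url in urls})

if __name__ == "__main__":
    for name in extract_unibench_datasets():
        print(name)
```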
@YashDThapliyal sure, sounds good, but do note the parsing needed. For instance, UniBench uses "haideraltahan/wds_cifar10" whereas MIEB uses the original "uoft-cs/cifar10". Here, the part of the url before "_" is to be omitted. In such cases, I'd consider 'zero-shot cifar10' as covered. (Note that MIEB covers the linear probe variant as well as zero-shot for each of the classification tasks, whereas UniBench does not.) Please share all 3 .txt files as well. I doubt that there are that many: I only counted 50 occurrences of "@register_benchmark" in benchmarks.py from UniBench, whereas the paper cites 53 tasks. I'd love to understand how 55 came about.
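A minimal illustration of that normalization rule (purely illustrative; UniBench's URL formats may vary beyond this example):

```python
def normalize_unibench_name(dataset_url: str) -> str:
    """Map e.g. 'haideraltahan/wds_cifar10' to 'cifar10' for comparison with MIEB names."""
    name = dataset_url.split("/")[-1]      # drop the org prefix -> 'wds_cifar10'
    return name.split("_", 1)[-1].lower()  # drop everything up to the first '_' -> 'cifar10'

assert normalize_unibench_name("haideraltahan/wds_cifar10") == "cifar10"
```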
@isaac-chung Yeah, my code strips the part of the name before the "_", so that shouldn't be an issue. This is the link to the folder where the parsing code as well as all the files are stored: https://github.com/YashDThapliyal/mteb/tree/integrating-uni-bench/mteb/tasks/Image/UniBenchIntegration
MTEB_unique_datasets.txt
@YashDThapliyal How did you generate MTEB_unique_datasets.txt?
@Samoed I did this in Python:
For MIEB tasks, I think your script should programmatically go through all subfolders under mteb/tasks/Image in the mieb branch.
@isaac-chung Okay, I will write a new script that does this. Basically I would just need to get the names of every file within mteb/tasks/Image except the __init__ files, correct? I was debating between doing that or going through all the __init__ files, getting the imports, and filtering those.
Whatever works for you, as long as your MIEB results match the ground truth. Maybe reading from files like https://github.com/embeddings-benchmark/mteb/blob/mieb/mteb%2Ftasks%2FImage%2FZeroshotClassification%2F__init__.py might be easier, as you mentioned.
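One possible sketch of that approach, collecting the task class names imported by the __init__.py files under mteb/tasks/Image (paths assume a local checkout of the mieb branch, and imported class names will not always match the task metadata names exactly):

```python
import ast
from pathlib import Path

def mieb_task_names(image_dir: str = "mteb/tasks/Image") -> set[str]:
    """Collect task class names imported by the __init__.py files under mteb/tasks/Image."""
    names: set[str] = set()
    for init_file in Path(image_dir).rglob("__init__.py"):
        tree = ast.parse(init_file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom):
                names.update(alias.name for alias in node.names)
    return names

if __name__ == "__main__":
    tasks = mieb_task_names()
    print(len(tasks))
    for name in sorted(tasks):
        print(name.lower())
```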
Hi @isaac-chung, I wrote a script that grabs all the import statements from the MIEB image folder, parses the dataset names, and compares them with those in UniBench. This resulted in 40 unique names in UniBench that aren’t in MIEB. I noticed you previously mentioned "ground truth," but I’m unsure what you meant by that. If you notice any discrepancies, please let me know. I had some parsing issues earlier, but I’ve refined the scripts, and everything should now be working as expected. Relevant Links:
Let me know if you'd like me to tweak anything or investigate further. Otherwise, I'll proceed to create 40 new datasets following the docs page, ensuring that each dataset implementation includes a model at the end of the file for testing the data and saving the results.
Thanks, Yash.
This is referring to the actual datasets present in MIEB. You can manually inspect the subfolders and the .py files to validate your own results. These are still not quite correct. For example,
Here is the script I used to extract all 138 MIEB task names and output in the same format:
from mteb import get_tasks
mieb_tasks = get_tasks(categories=["i2i","i2t","t2i","it2t","it2i","i2it","t2it","it2it"])
num_tasks = len(mieb_tasks)
print(num_tasks)
# print names in newline
for task in mieb_tasks:
    print(task.metadata.name.lower())
Here is the output for your convenience. Note that:
Let me know if you have further questions.
MIEB dataset names in lowercase
In light of this discussion, I feel that MIEB could use a task table similar to the one in MMTEB. CC @gowitheflow-1998 @KennethEnevoldsen
A quick scan reduces the list of datasets unique to UniBench to the following. This gives ~20 datasets, not counting the ones with notes. There are also a lot of ImageNet variants, which I'm not sure we want to include. Maybe @gowitheflow-1998 can chime in here.
@isaac-chung Thank you so much for generating this list, which I assume is the list of datasets actually unique to UniBench. I think the next step is to wait for confirmation from @gowitheflow-1998 to finalize the list of datasets that need implementing, and then I can begin implementing them by following the guide you linked above. For now, I will go ahead and delete the files/scripts I was using to generate the dataset names, and will make a new folder within /Image called UniBench where I can create the datasets and have an __init__ file for them as well.
@YashDThapliyal @isaac-chung thanks so much for the efforts! The unique list looks great.
A new folder within /Image works, although it's not really necessary (whatever works better for you!). You can also just put each new implemented task under the abstask folder it corresponds to (e.g., if
@gowitheflow-1998 That makes sense. I can try to do that, but if it gets too complicated I may end up creating a folder just for simplicity :). Quick clarification though: am I still implementing all the ImageNet variants? Also, about the actual details for implementing the datasets: should I just google them and try to find them on Hugging Face to get all of the relevant data needed to fill out the template in the adding-a-dataset guide?
@YashDThapliyal I think the ImageNet variants are all worth implementing! They either have domain differences or evaluate different properties such as robustness, and are thus useful. Yes, you should google them for the actual details. If existing datasets on Hugging Face match the actual details in the paper, we can use the ones on Hugging Face; if not, we typically make the dataset ourselves with source images from, say, the GitHub repo of the original paper, process them with their source code or our own implementation that matches the details, and upload them to Hugging Face. About things getting too complicated: feel free to submit several separate PRs for all these, even draft ones (e.g., one PR after 3-5 tasks), so that we can review and start improving them together!
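For the "make the dataset ourselves" path, here is a minimal sketch of building an image classification dataset from local files and pushing it to the Hub with the datasets library (the directory layout and repository name below are placeholders, not the actual process used for these tasks):

```python
from datasets import load_dataset

# Assumes images organized as data_dir/<split>/<label>/<image>.jpg, which the
# "imagefolder" builder turns into an image + label dataset automatically.
ds = load_dataset("imagefolder", data_dir="path/to/processed_images")

# Push to a (hypothetical) Hugging Face repo so the new MIEB task can load it.
ds.push_to_hub("your-username/unibench-example-dataset")
```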
@gowitheflow-1998 Sounds good. I will begin that process, start implementing the existing datasets first, and open a draft PR every few datasets so we can make sure we're on the right track.
Paper: https://arxiv.org/abs/2408.04810
Code: https://github.com/facebookresearch/unibench
List of tasks: https://github.com/facebookresearch/unibench/blob/main/unibench/benchmarks_zoo/benchmarks.py