Add support for FinMTEB benchmark #1379
base: v2.0.0
Hey @Muennighoff @KennethEnevoldsen @isaac-chung! Here's a WIP PR to close #1267. I had a few questions/notes:
I'll add the summarization changes and make the PRs to results and leaderboard once this is done.
Hi @alt-glitch , thanks for working on this!
Let me know if anything is unclear.
Re. 2: PRs to |
…lassification tasks
Thanks for the comments! Some more info:
I'm interested in helping with getting the results too!
from mteb.abstasks.TaskMetadata import TaskMetadata


class FOMCClassification(AbsTaskClassification):
General comment: the metadata must be filled out.
Ah understood. Working on it.
Not sure what is meant by:
Summarisation tasks here don't have human_summaries or relevance scores. So the Spearman correlation is calculated between summary and text. Hence the STSEvaluator is used. See: yixuantt/FinMTEB#2
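To make the STS-style evaluation mentioned above concrete, here is a minimal sketch in plain Python. It is not mteb's STSEvaluator: the helper names, the toy embeddings, and the per-pair `gold_scores` are assumptions for illustration only, since this thread does not say which reference scores FinMTEB correlates against. The idea is to compute the cosine similarity between each text embedding and its summary embedding, then take the Spearman correlation of those similarities with the reference scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_style_score(text_embs, summary_embs, gold_scores):
    """Correlate text/summary cosine similarities with reference scores."""
    sims = [cosine(t, s) for t, s in zip(text_embs, summary_embs)]
    return spearman(sims, gold_scores)


# Toy 2-d embeddings standing in for real model outputs (hypothetical data).
text_embs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
summary_embs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold_scores = [0.9, 0.5, 0.1]  # assumed reference scores, one per pair
score = sts_style_score(text_embs, summary_embs, gold_scores)
```

With real data, the embeddings would come from the model under evaluation, and the reference scores from whatever the dataset provides.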
Update: filling out all the metadata fields for this benchmark will take me a couple more days. It seems to be a mostly manual process: with 64 datasets, I have to read the paper referenced for each one to derive the dataset creation date, annotation creators, and sample creation process :) If there's something I'm missing, do let me know! Thanks!
Thanks for taking the time on this. I believe metadata is the only thing missing and then it can be reviewed and merged.
Moving this to v2.0.0 to avoid merge conflicts in the future. I can resolve the current merge conflicts once metadata is added.
Checklist
make test.
make lint.
Adding datasets checklist
Reason for dataset addition: Fixes #1267
I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2: FiQAClassification as of now.
intfloat/multilingual-e5-small: FINAL as of now.
I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform().
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
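The subsampling item in the checklist above can be illustrated with a small standalone sketch. This is not mteb's `stratified_subsampling()` implementation. The function name, the `n_samples` default of 2048 (taken from the checklist's example threshold), and the seed are assumptions for illustration. It shows the general idea of cutting a large split down while keeping label proportions roughly intact.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples=2048, seed=42):
    """Return sorted indices of a subsample that roughly preserves label proportions.

    Hypothetical helper for illustration; per-label rounding means the result
    may contain slightly more or fewer than n_samples indices.
    """
    if len(labels) <= n_samples:
        # Small enough already: keep everything.
        return list(range(len(labels)))

    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    fraction = n_samples / len(labels)
    chosen = []
    for label in sorted(by_label):
        indices = by_label[label]
        # Keep at least one example of every label.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy label column: 3000 examples of class 0 and 1000 of class 1.
labels = [0] * 3000 + [1] * 1000
subset = stratified_subsample(labels)
kept_labels = [labels[i] for i in subset]
```

A fixed seed keeps the subsample reproducible across runs, which matters when results from different models are compared on the same task.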