
Commit 45af553

Merge branch 'embeddings-benchmark:main' into chemteb
2 parents: 4883f42 + c3b46b7


47 files changed (+1818 -192 lines)
Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+name: Daily Space Rebuild
+on:
+  schedule:
+    # Runs at midnight Pacific Time (8 AM UTC)
+    - cron: '0 8 * * *'
+  workflow_dispatch: # Allows manual triggering
+
+jobs:
+  rebuild:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger Factory Rebuild
+        run: |
+          curl -X POST \
+            "https://huggingface.co/api/spaces/mteb/leaderboard_2_demo/restart?factory=true" \
+            -H "Authorization: Bearer ${{ secrets.HF_TOKEN }}"
Lines changed: 24 additions & 0 deletions

@@ -0,0 +1,24 @@
+name: Model Loading
+
+on:
+  pull_request:
+    paths:
+      - 'mteb/models/**.py'
+
+jobs:
+  extract-and-run:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+          cache: 'pip'
+
+      - name: Install dependencies and run tests
+        run: |
+          make model-load-test

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -145,3 +145,6 @@ tests/create_meta/model_card.md
 # removed results from mteb repo they are now available at: https://github.com/embeddings-benchmark/results
 results/
 uv.lock
+
+# model loading tests
+model_names.txt

Makefile

Lines changed: 8 additions & 1 deletion

@@ -35,4 +35,11 @@ pr:
 build-docs:
 	@echo "--- 📚 Building documentation ---"
 	# since we do not have a documentation site, this just build tables for the .md files
-	python docs/create_tasks_table.py
+	python docs/create_tasks_table.py
+
+
+model-load-test:
+	@echo "--- 🚀 Running model load test ---"
+	pip install ".[dev, speedtask, pylate,gritlm,xformers,model2vec]"
+	python scripts/extract_model_names.py
+	python tests/test_models/model_loading.py --model_name_file scripts/model_names.txt
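The commit view does not show `scripts/extract_model_names.py` itself. A hypothetical sketch of the shape such a script could take, assuming it diffs the branch against `origin/main` (matching the workflow's `mteb/models/**.py` path filter) and writes one name per line to `scripts/model_names.txt`; the actual script may differ:

```python
# Hypothetical sketch only; the real scripts/extract_model_names.py may differ.
import subprocess
from pathlib import Path

# Model files touched relative to main, mirroring the workflow's path filter.
changed = subprocess.run(
    ["git", "diff", "--name-only", "origin/main...HEAD", "--", "mteb/models/"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

names = [Path(f).stem for f in changed if f.endswith(".py")]
Path("scripts/model_names.txt").write_text("\n".join(names) + "\n")
```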

docs/adding_a_model.md

Lines changed: 4 additions & 4 deletions

@@ -30,10 +30,10 @@ These will save the results in a folder called `results/{model_name}/{model_revision}`
 
 2. **Push Results to the Leaderboard**
 
-To add results to the public leaderboard you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) afterwards they will appear on the leaderboard after a day.
+To add results to the public leaderboard you can push your results to the [results repository](https://github.com/embeddings-benchmark/results) via a PR. Once merged they will appear on the leaderboard after a day.
 
 
-3. (Optional) **Add the results using to the model card:**
+3. (Optional) **Add results to the model card:**
 
 `mteb` implements a cli for adding results to the model card:
 
@@ -49,7 +49,7 @@ If the readme already exists:
 mteb create_meta --results_folder results/{model_name}/{model_revision} --output_path model_card.md --from_existing your_existing_readme.md
 ```
 
-Note that if you can run the model on many tasks, this can lead to an excessively large readme frontmatter.
+Note that running the model on many tasks may lead to a huge readme front matter.
 
 4. **Wait for a refresh the leaderboard:**
 
@@ -70,4 +70,4 @@ The leaderboard [automatically refreshes daily](https://github.com/embeddings-be
 
 ###### Instantiating the Model with Prompts
 
-If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
+If you are unable to directly add the prompts in the model configuration, you can instantiate the model using the `sentence_transformers_loader` and pass `prompts` as an argument. For more details, see the `mteb/models/bge_models.py` file.
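For illustration, a minimal sketch of such a loader-based instantiation. The import path, model name, prompt strings, and the exact keyword accepted by `sentence_transformers_loader` are assumptions here; `mteb/models/bge_models.py` shows the signature actually used:

```python
from functools import partial

# Assumed import path; check mteb/models/bge_models.py for the real one.
from mteb.models.sentence_transformer_wrapper import sentence_transformers_loader

loader = partial(
    sentence_transformers_loader,
    model_name="BAAI/bge-small-en-v1.5",  # illustrative model
    prompts={"query": "Represent this sentence for searching relevant passages: "},
)
model = loader()  # returns the wrapped model ready for mteb evaluation
```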

mteb/evaluation/evaluators/BitextMiningEvaluator.py

Lines changed: 7 additions & 3 deletions

@@ -44,9 +44,13 @@ def __call__(self, model: Encoder, *, encode_kwargs: dict[str, Any] = {}):
 
     def compute_metrics(self, model: Encoder, encode_kwargs: dict[str, Any] = {}):
         pair_elements = {p for pair in self.pairs for p in pair}
-        subsets = [
-            col for col in self.sentences.features.keys() if col in pair_elements
-        ]
+        if isinstance(self.sentences, Dataset):
+            subsets = [
+                col for col in self.sentences.features.keys() if col in pair_elements
+            ]
+        else:
+            # BUCC outputs a dict instead of a Dataset
+            subsets = list(pair_elements)
         n_subsets = len(subsets)
 
         embeddings = {}
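In effect, `compute_metrics` now accepts either a `datasets.Dataset` (whose columns are filtered against the pair elements) or a plain dict as produced by BUCC. A small self-contained sketch of that dispatch, with illustrative column names:

```python
from datasets import Dataset

pairs = [("sentence1", "sentence2")]
pair_elements = {p for pair in pairs for p in pair}

# Dataset case: keep only the columns that participate in a pair.
sentences = Dataset.from_dict({"sentence1": ["a"], "sentence2": ["b"], "id": [0]})
if isinstance(sentences, Dataset):
    subsets = [col for col in sentences.features.keys() if col in pair_elements]
else:
    # dict case (BUCC): fall back to the pair elements themselves
    subsets = list(pair_elements)

print(subsets)  # ['sentence1', 'sentence2']
```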

mteb/evaluation/evaluators/RetrievalEvaluator.py

Lines changed: 12 additions & 7 deletions

@@ -167,12 +167,12 @@ def search(
                 self.corpus_embeddings[request_qid].append(sub_corpus_embeddings)
 
             # Compute similarites using self defined similarity otherwise default to cosine-similarity
-            similarity_scores = cos_sim(query_embeddings, sub_corpus_embeddings)
             if hasattr(self.model, "similarity"):
                 similarity_scores = self.model.similarity(
-                    float(self.model.similarity(e1, e2))
-                    for e1, e2 in zip(query_embeddings, sub_corpus_embeddings)
+                    query_embeddings, sub_corpus_embeddings
                 )
+            else:
+                similarity_scores = cos_sim(query_embeddings, sub_corpus_embeddings)
             is_nan = torch.isnan(similarity_scores)
             if is_nan.sum() > 0:
                 logger.warning(
@@ -307,15 +307,17 @@ def search_cross_encoder(
             assert (
                 len(queries_in_pair) == len(corpus_in_pair) == len(instructions_in_pair)
             )
+            corpus_in_pair = corpus_to_str(list(corpus_in_pair))
 
             if hasattr(self.model, "model") and isinstance(
                 self.model.model, CrossEncoder
             ):
                 # can't take instructions, so add them here
-                queries_in_pair = [
-                    f"{q} {i}".strip()
-                    for i, q in zip(instructions_in_pair, queries_in_pair)
-                ]
+                if instructions_in_pair[0] is not None:
+                    queries_in_pair = [
+                        f"{q} {i}".strip()
+                        for i, q in zip(instructions_in_pair, queries_in_pair)
+                    ]
                 scores = self.model.predict(list(zip(queries_in_pair, corpus_in_pair)))  # type: ignore
             else:
                 # may use the instructions in a unique way, so give them also
@@ -374,6 +376,9 @@ def __init__(self, model, **kwargs):
         self.save_corpus_embeddings = kwargs.get("save_corpus_embeddings", False)
         self.corpus_embeddings = {}
 
+        if hasattr(self.model, "similarity") and callable(self.model.similarity):
+            self.similarity = self.model.similarity
+
     def encode_corpus(
         self,
         corpus: list[dict[str, str]],
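All three hunks implement one pattern: prefer a similarity function defined by the model, otherwise fall back to cosine similarity. A minimal standalone sketch of that dispatch, where the normalized matrix product stands in for mteb's `cos_sim` helper:

```python
import torch
import torch.nn.functional as F


def score(model, query_embeddings: torch.Tensor, corpus_embeddings: torch.Tensor) -> torch.Tensor:
    """Return an (n_queries, n_corpus) similarity matrix."""
    if hasattr(model, "similarity") and callable(model.similarity):
        # e.g. a SentenceTransformer configured with a dot-product similarity
        return model.similarity(query_embeddings, corpus_embeddings)
    # Fallback: cosine similarity (stand-in for mteb's cos_sim).
    q = F.normalize(query_embeddings, dim=-1)
    c = F.normalize(corpus_embeddings, dim=-1)
    return q @ c.T
```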
