BUCC fails #1694

Open
Muennighoff opened this issue Jan 2, 2025 · 4 comments

Comments

@Muennighoff
Contributor

2025-01-02 22:39:00.672963: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 22:39:00.686375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 22:39:00.690128: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='silma-ai/silma-embeddding-matryoshka-v0.1', task_types=None, categories=None, tasks=['BUCC'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7fca802180d0>)
WARNING:mteb.model_meta:Loader not specified for model silma-ai/silma-embeddding-matryoshka-v0.1, loading using sentence transformers.
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
BitextMining
    - BUCC, s2s, multilingual 4 / 4 Subsets


INFO:mteb.evaluation.MTEB:

********************** Evaluating BUCC **********************
WARNING:mteb.abstasks.AbsTask:Dataset 'BUCC' is superseded by 'BUCC.v2', you might consider using the newer version of the dataset.
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/de-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/fr-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/ru-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
INFO:datasets.builder:Found cached dataset bucc-bitext-mining (/data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677)
Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___bucc-bitext-mining/zh-en/0.0.0/1739dc11ffe9b7bfccd7f3d585aeb4c544fc6677
ERROR:mteb.evaluation.MTEB:Error while evaluating BUCC: "Column gold not in the dataset. Current columns in the dataset: ['sentence1', 'sentence2', 'lang']"
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 630, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 534, in run
    task.load_data(**kwargs)
  File "/data/niklas/mteb/mteb/abstasks/MultiSubsetLoader.py", line 17, in load_data
    self.dataset_transform()
  File "/data/niklas/mteb/mteb/tasks/BitextMining/multilingual/BUCCBitextMining.py", line 67, in dataset_transform
    gold = data["gold"][0]
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2872, in __getitem__
    return self._getitem(key)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2856, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 590, in query_table
    _check_valid_column_key(key, table.column_names)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 527, in _check_valid_column_key
    raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
KeyError: "Column gold not in the dataset. Current columns in the dataset: ['sentence1', 'sentence2', 'lang']"

@Samoed Samoed mentioned this issue Jan 4, 2025
2 tasks
@Samoed
Collaborator

Samoed commented Jan 4, 2025

The task runs if dataset_transform is removed, but the resulting scores no longer match the leaderboard.

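| Subset | Leaderboard | Without dataset_transform |
|--------|-------------|---------------------------|
| de-en  | 97.86       | 98.9579                   |
| fr-en  | 92.66       | 97.3538                   |
| ru-en  | 93.5        | 96.9278                   |
| zh-en  | 88.79       | 96.4016                   |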

I tried running it on the version before #1674 (v1.25.15), but it still failed because the data is not a datasets.Dataset. When I cast it to a Dataset, it failed because the columns have different sizes. I'm curious about the intuition behind the gold column, as other bitext datasets don't seem to use it.

sentence1 = [sentence1[i] for (i, j) in gold]

I think the second element of each gold pair (j) may have been meant to index sentence2, but it is never used?
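To make the question about the gold column concrete, here is a minimal sketch of the difference between indexing only with `i` and indexing with both `i` and `j`. It assumes `gold` is a list of 0-based `(i, j)` index pairs into `sentence1` and `sentence2`; the function name `align_by_gold` and the toy sentences are hypothetical and are not taken from the MTEB code.

```python
def align_by_gold(
    sentence1: list[str],
    sentence2: list[str],
    gold: list[tuple[int, int]],
) -> tuple[list[str], list[str]]:
    """Reorder both sides so that position k holds the k-th gold pair.

    Assumption: gold holds 0-based (i, j) pairs, where i indexes
    sentence1 and j indexes sentence2.
    """
    aligned1 = [sentence1[i] for (i, j) in gold]
    aligned2 = [sentence2[j] for (i, j) in gold]
    return aligned1, aligned2


# Toy data (hypothetical): the two sides are stored in different orders.
s1 = ["Hallo", "Danke", "Ja"]
s2 = ["Yes", "Hello", "Thanks"]
gold = [(0, 1), (1, 2), (2, 0)]

# Using only i, as in the line quoted above, leaves sentence2 untouched,
# so the positional pairs do not match.
only_i = [s1[i] for (i, j) in gold]
print(list(zip(only_i, s2)))
# [('Hallo', 'Yes'), ('Danke', 'Hello'), ('Ja', 'Thanks')]

# Using both i and j yields matching pairs at every position.
a1, a2 = align_by_gold(s1, s2, gold)
print(list(zip(a1, a2)))
# [('Hallo', 'Hello'), ('Danke', 'Thanks'), ('Ja', 'Yes')]
```

If the stored columns are already aligned row by row, neither reindexing step should be needed, which would be consistent with the task running once dataset_transform is removed.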

@isaac-chung
Collaborator

I agree. Could we try simply using sentence1 and sentence2 as is, and compare that score to the leaderboard?

Separately, there's also BUCC.v2. Maybe this would supersede BUCC?

@Samoed
Collaborator

Samoed commented Jan 4, 2025

BUCC is already superseded by BUCC.v2. If I run it without dataset_transform, it produces the same scores as BUCC.v2, which I provided earlier.

@isaac-chung
Collaborator

Oh I see, thanks for explaining.

After taking a look at the paper, the MTEB dataset seems to contain only the "gold" sets; for example, de-en has 9580 rows. This leads me to believe that it is the train split, not the test split.

Nonetheless, since the dataset contains cross-lingual pairs, we can keep it as a bitext mining task by removing dataset_transform. However, judging from the paper's eval criteria (Section 3), it feels more like a cross-lingual retrieval task, where the corpus is a mix of monolingual sentences in and out of the gold sets.

In the results of a system, a true positive TP is a pair of sentences that is present in the gold standard and a false positive FP is a pair of sentences that is not present in the gold standard. A false negative FN is a pair of sentences present in the gold standard but absent from system results. Precision, Recall and F1-score were then computed using the usual formulas.
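The criteria quoted above can be sketched as follows. This is only an illustration, assuming that system output and gold standard are both sets of `(i, j)` sentence-index pairs and that the "usual formulas" are the standard precision, recall, and F1 definitions; the function name `precision_recall_f1` and the toy pairs are hypothetical and not part of MTEB.

```python
def precision_recall_f1(
    predicted: set[tuple[int, int]],
    gold: set[tuple[int, int]],
) -> tuple[float, float, float]:
    """Score predicted sentence pairs against a gold standard."""
    tp = len(predicted & gold)  # pairs present in both
    fp = len(predicted - gold)  # predicted but not in the gold standard
    fn = len(gold - predicted)  # in the gold standard but not predicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy example (hypothetical indices): 2 correct pairs, 1 wrong, 2 missed.
gold_pairs = {(0, 1), (1, 2), (2, 0), (3, 3)}
system_pairs = {(0, 1), (1, 2), (2, 2)}

p, r, f = precision_recall_f1(system_pairs, gold_pairs)
print(round(p, 4), round(r, 4), round(f, 4))
# 0.6667 0.5 0.5714
```

Under this framing a system is penalised both for pairs it proposes outside the gold standard and for gold pairs it misses, which is why the candidate pool would need sentences that are not in any gold pair.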


3 participants