openai/text-embedding-3-large fails on sts22 #1650

Open
Muennighoff opened this issue Jan 1, 2025 · 0 comments

2025-01-01 02:59:59.297956: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-01 02:59:59.311878: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-01 02:59:59.315723: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='openai/text-embedding-3-large', task_types=None, categories=None, tasks=['STS22'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=64, overwrite=False, save_predictions=False, func=<function run at 0x7ff797b0d630>)
INFO:mteb.evaluation.MTEB:

Evaluating 1 tasks:

─────────────────────────────── Selected tasks ────────────────────────────────
STS
- STS22, p2p, multilingual 18 / 18 Subsets

INFO:mteb.evaluation.MTEB:

********************** Evaluating STS22 **********************
WARNING:mteb.abstasks.AbsTask:Dataset 'STS22' is superseded by 'STS22.v2', you might consider using the newer version of the dataset.
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
Found cached dataset sts22-crosslingual-sts (/data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3)
INFO:datasets.builder:Found cached dataset sts22-crosslingual-sts (/data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3)
Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts22-crosslingual-sts/default/0.0.0/de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
INFO:mteb.abstasks.AbsTask:
Task: STS22, split: test, subset: zh-en. Running...
--- Logging error ---
Traceback (most recent call last):
File "/data/niklas/mteb/mteb/models/openai_models.py", line 94, in encode
response = self._client.embeddings.create(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/resources/embeddings.py", line 114, in create
return self._post(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 387, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 145, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 576, in run
results, tick, tock = self._run_eval(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
results = task.evaluate(
File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
scores[hf_subset] = self._evaluate_subset(
File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
scores = evaluator(model, encode_kwargs=encode_kwargs)
File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
embeddings1 = model.encode(
File "/data/niklas/mteb/mteb/models/openai_models.py", line 102, in encode
logger.info("Sleeping for 10 seconds due to error", e)
Message: 'Sleeping for 10 seconds due to error'
Arguments: (BadRequestError('Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}'),)
--- Logging error ---
Traceback (most recent call last):
File "/data/niklas/mteb/mteb/models/openai_models.py", line 107, in encode
response = self._client.embeddings.create(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/resources/embeddings.py", line 114, in create
return self._post(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1260, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 937, in request
return self._request(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/openai/_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 678, in format
record.message = record.getMessage()
File "/env/lib/conda/gritkto/lib/python3.10/logging/__init__.py", line 368, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 387, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 145, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 576, in run
results, tick, tock = self._run_eval(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
results = task.evaluate(
File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
scores[hf_subset] = self._evaluate_subset(
File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
scores = evaluator(model, encode_kwargs=encode_kwargs)
File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
embeddings1 = model.encode(
File "/data/niklas/mteb/mteb/models/openai_models.py", line 114, in encode
logger.info("Sleeping for 60 seconds due to error", e)
Message: 'Sleeping for 60 seconds due to error'
Arguments: (BadRequestError('Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}'),)
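
Note on the secondary `--- Logging error ---` tracebacks in the log: they are separate from the 400 response. `logger.info("Sleeping for 10 seconds due to error", e)` passes `e` as a formatting argument, but the message has no `%s` placeholder, so `msg % self.args` raises `TypeError`. A minimal sketch of a corrected call follows. The names `log_retry` and `format_record` are hypothetical helpers for illustration and are not part of mteb. This only addresses the logging failure, not the cause of the invalid `$.input`.

```python
import logging

logger = logging.getLogger("openai_retry_sketch")  # hypothetical logger name


def log_retry(delay_seconds: int, error: Exception) -> None:
    """Log a retry notice with one placeholder per extra argument."""
    # logging formats lazily via `msg % args`, so every argument needs a
    # matching %-style placeholder in the message string.
    logger.info("Sleeping for %d seconds due to error: %s", delay_seconds, error)


def format_record(msg: str, *args: object) -> str:
    """Render a message the same way logging does (hypothetical helper)."""
    record = logging.LogRecord(
        "demo", logging.INFO, "demo.py", 0, msg, args, None
    )
    return record.getMessage()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    err = ValueError("'$.input' is invalid")

    # Reproduces the TypeError seen in the log: one argument, no placeholder.
    try:
        format_record("Sleeping for 10 seconds due to error", err)
    except TypeError as exc:
        print("broken call raises:", exc)

    # Corrected call: placeholders match the arguments.
    log_retry(10, err)
```

Separately, the log shows the same request being retried after 10 and then 60 seconds with an identical 400 response, which suggests sleeping does not help for this error.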
