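Running `mteb` on STS12 with `openbmb/MiniCPM-Embedding` and flash-attn 2.6.3 fails with `RuntimeError: FlashAttention only support fp16 and bf16 data type`. Environment info and the full log follow: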
!pip show flash-attn
Name: flash-attn
Version: 2.6.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: [email protected]
License:
Requires: einops, torch
Required-by:
2025-01-02 23:49:56.428686: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 23:49:56.442158: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 23:49:56.445995: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='openbmb/MiniCPM-Embedding', task_types=None, categories=None, tasks=['STS12'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=16, overwrite=False, save_predictions=False, func=<function run at 0x7f940fc800d0>)
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:01<00:01, 1.04s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.64it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.48it/s]
WARNING:mteb.models.sentence_transformer_wrapper:Model prompts are not in the expected format. Ignoring them.
INFO:mteb.evaluation.MTEB:
## Evaluating 1 tasks:
─────────────────────────────── Selected tasks ────────────────────────────────
STS
- STS12, s2s
INFO:mteb.evaluation.MTEB:
********************** Evaluating STS12 **********************
No config specified, defaulting to the single config: sts12-sts/default
INFO:datasets.builder:No config specified, defaulting to the single config: sts12-sts/default
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
Found cached dataset sts12-sts (/data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384)
INFO:datasets.builder:Found cached dataset sts12-sts (/data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384)
Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:mteb.abstasks.AbsTask:
Task: STS12, split: test, subset: default. Running...
INFO:mteb.models.sentence_transformer_wrapper:No model prompts found for task=STS12 prompt_type=None
INFO:mteb.models.sentence_transformer_wrapper:Encoding 3108 sentences.
WARNING:transformers_modules.openbmb.MiniCPM-Embedding.c0cb2de33fb366e17c30f9d53142ff11bc18e049.modeling_minicpm:The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.float32.
ERROR:mteb.evaluation.MTEB:Error while evaluating STS12: FlashAttention only support fp16 and bf16 data type
Traceback (most recent call last):
File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mteb/mteb/cli.py", line 387, in main
args.func(args)
File "/data/niklas/mteb/mteb/cli.py", line 145, in run
eval.run(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 630, in run
raise e
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 569, in run
results, tick, tock = self._run_eval(
File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
results = task.evaluate(
File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
scores[hf_subset] = self._evaluate_subset(
File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
scores = evaluator(model, encode_kwargs=encode_kwargs)
File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
embeddings1 = model.encode(
File "/data/niklas/mteb/mteb/models/sentence_transformer_wrapper.py", line 108, in encode
embeddings = self.model.encode(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 623, in encode
out_features = self.forward(features, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 690, in forward
input = module(input, **module_kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 393, in forward
output_states = self.auto_model(**trans_features, **kwargs, return_dict=False)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 1089, in forward
layer_outputs = decoder_layer(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 812, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 565, in forward
attn_output = self._flash_attention_forward(
File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 613, in _flash_attention_forward
attn_output_unpad = flash_attn_varlen_func(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 1124, in flash_attn_varlen_func
return FlashAttnVarlenFunc.apply(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 620, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type
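The failure looks consistent with the warning earlier in the log: no torch dtype was specified, so the checkpoint is loaded in float32, while `flash_attn_cuda.varlen_fwd` only accepts fp16/bf16 tensors. Below is a minimal sketch of a possible workaround, assuming the model is loaded through `sentence_transformers` as the traceback suggests; the exact kwargs for MiniCPM-Embedding are an assumption based on the warnings, not taken from the model card.

```python
# Possible workaround (sketch): load the checkpoint in bf16 and on GPU so
# FlashAttention 2 receives a supported dtype. These kwargs are an assumption
# inferred from the warnings in the log, not a verified recipe for this model.
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "openbmb/MiniCPM-Embedding",
    trust_remote_code=True,  # the model ships custom modeling_minicpm.py code
    device="cuda",           # avoids the "not initialized on GPU" warning
    model_kwargs={
        "torch_dtype": torch.bfloat16,             # flash-attn requires fp16/bf16
        "attn_implementation": "flash_attention_2",
    },
)

embeddings = model.encode(["This is a test sentence."])
print(embeddings.shape)
```

Alternatively, dropping `attn_implementation` (falling back to the default SDPA/eager attention) should sidestep the flash-attn dtype check entirely, at some cost in speed and memory.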