
openbmb/MiniCPM-Embedding still fails #1696

Open
Muennighoff opened this issue Jan 2, 2025 · 1 comment · May be fixed by #1705
@Muennighoff (Contributor)

!pip show flash-attn
Name: flash-attn
Version: 2.6.3
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: [email protected]
License: 
Requires: einops, torch
Required-by: 
2025-01-02 23:49:56.428686: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-02 23:49:56.442158: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-02 23:49:56.445995: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
INFO:mteb.cli:Running with parameters: Namespace(model='openbmb/MiniCPM-Embedding', task_types=None, categories=None, tasks=['STS12'], languages=None, benchmarks=None, device=None, output_folder='/data/niklas/results/results', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=16, overwrite=False, save_predictions=False, func=<function run at 0x7f940fc800d0>)
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01,  1.04s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.64it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.48it/s]
WARNING:mteb.models.sentence_transformer_wrapper:Model prompts are not in the expected format. Ignoring them.
INFO:mteb.evaluation.MTEB:

## Evaluating 1 tasks:
─────────────────────────────── Selected tasks  ────────────────────────────────
STS
    - STS12, s2s


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS12 **********************
No config specified, defaulting to the single config: sts12-sts/default
INFO:datasets.builder:No config specified, defaulting to the single config: sts12-sts/default
Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
INFO:datasets.info:Loading Dataset Infos from /env/lib/conda/gritkto/lib/python3.10/site-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset info from restored data version if exists.
Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
Found cached dataset sts12-sts (/data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384)
INFO:datasets.builder:Found cached dataset sts12-sts (/data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384)
Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:datasets.info:Loading Dataset info from /data/huggingface/datasets/mteb___sts12-sts/default/0.0.0/a0d554a64d88156834ff5ae9920b964011b16384
INFO:mteb.abstasks.AbsTask:
Task: STS12, split: test, subset: default. Running...
INFO:mteb.models.sentence_transformer_wrapper:No model prompts found for task=STS12 prompt_type=None
INFO:mteb.models.sentence_transformer_wrapper:Encoding 3108 sentences.
WARNING:transformers_modules.openbmb.MiniCPM-Embedding.c0cb2de33fb366e17c30f9d53142ff11bc18e049.modeling_minicpm:The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.float32.
ERROR:mteb.evaluation.MTEB:Error while evaluating STS12: FlashAttention only support fp16 and bf16 data type
Traceback (most recent call last):
  File "/env/lib/conda/gritkto/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mteb/mteb/cli.py", line 387, in main
    args.func(args)
  File "/data/niklas/mteb/mteb/cli.py", line 145, in run
    eval.run(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 630, in run
    raise e
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 569, in run
    results, tick, tock = self._run_eval(
  File "/data/niklas/mteb/mteb/evaluation/MTEB.py", line 304, in _run_eval
    results = task.evaluate(
  File "/data/niklas/mteb/mteb/abstasks/AbsTask.py", line 126, in evaluate
    scores[hf_subset] = self._evaluate_subset(
  File "/data/niklas/mteb/mteb/abstasks/AbsTaskSTS.py", line 88, in _evaluate_subset
    scores = evaluator(model, encode_kwargs=encode_kwargs)
  File "/data/niklas/mteb/mteb/evaluation/evaluators/STSEvaluator.py", line 47, in __call__
    embeddings1 = model.encode(
  File "/data/niklas/mteb/mteb/models/sentence_transformer_wrapper.py", line 108, in encode
    embeddings = self.model.encode(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 623, in encode
    out_features = self.forward(features, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 690, in forward
    input = module(input, **module_kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 393, in forward
    output_states = self.auto_model(**trans_features, **kwargs, return_dict=False)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 1089, in forward
    layer_outputs = decoder_layer(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 812, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 565, in forward
    attn_output = self._flash_attention_forward(
  File "/data/huggingface/modules/transformers_modules/openbmb/MiniCPM-Embedding/c0cb2de33fb366e17c30f9d53142ff11bc18e049/modeling_minicpm.py", line 613, in _flash_attention_forward
    attn_output_unpad = flash_attn_varlen_func(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 1124, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 620, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
  File "/env/lib/conda/gritkto/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 90, in _flash_attn_varlen_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only support fp16 and bf16 data type
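
The failure is that the custom modeling code ends up calling flash-attn's varlen kernels on float32 activations, which flash-attn rejects. A minimal workaround sketch (an assumption, not the model card's documented usage; it presumes a CUDA GPU and a sentence-transformers version that forwards model_kwargs to from_pretrained) is to load the model with an explicit half-precision dtype:

# Workaround sketch: load MiniCPM-Embedding in bf16 so flash-attn's varlen
# kernels receive a supported dtype. Assumes a CUDA GPU is available.
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "openbmb/MiniCPM-Embedding",
    trust_remote_code=True,   # the repo ships custom modeling_minicpm.py
    device="cuda",            # flash-attn also requires the weights on GPU
    model_kwargs={
        "torch_dtype": torch.bfloat16,              # fp16 also satisfies flash-attn
        "attn_implementation": "flash_attention_2",
    },
)

embeddings = model.encode(["A quick sanity-check sentence."])
print(embeddings.shape)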


@Samoed (Collaborator) commented Jan 4, 2025

With future transformers versions (>=4.48.0; I tested with ModernBERT), the model will break. I've started a discussion about this: https://huggingface.co/openbmb/MiniCPM-Embedding/discussions/9
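
A hedged mitigation sketch for that (an assumption, not something from the linked discussion) is to fail fast when an incompatible transformers version is installed, e.g. by checking the version before loading the model:

# Version guard sketch: refuse to load the model under transformers>=4.48.0,
# which the comment above reports will break the custom modeling code.
from packaging.version import Version
import transformers

if Version(transformers.__version__) >= Version("4.48.0"):
    raise RuntimeError(
        "MiniCPM-Embedding's remote code is reported to break with "
        "transformers>=4.48.0; pin transformers<4.48.0 until the repo is fixed."
    )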

Samoed linked a pull request on Jan 4, 2025 that will close this issue.