
Bug: 【CANN】ggml-cann/aclnn_ops.cpp:3007: GGML_ASSERT(n_dims == src0->ne[0]) failed #10451

Closed
zyp2 opened this issue Nov 22, 2024 · 8 comments
Labels
Ascend NPU (issues specific to Ascend NPUs)

Comments

@zyp2

zyp2 commented Nov 22, 2024

What happened?

Using the firmware and driver versions listed in the README, inference fails with the error below.

Name and Version

Latest version

What operating system are you seeing the problem on?

No response

Relevant log output

llama_new_context_with_model: freq_scale    = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init:      CANN0 KV buffer size =   132.00 MiB
llama_kv_cache_init:        CPU KV buffer size =    28.00 MiB
llama_new_context_with_model: KV self size  =  160.00 MiB, K (f16):   80.00 MiB, V (f16):   80.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.58 MiB
llama_new_context_with_model:      CANN0 compute buffer size =  1488.00 MiB
llama_new_context_with_model:  CANN_Host compute buffer size =    16.01 MiB
llama_new_context_with_model: graph nodes  = 1606
llama_new_context_with_model: graph splits = 67 (with bs=512), 3 (with bs=1)
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/docker_files/zyp/202411/llama.cpp-master/ggml/src/ggml-cann/aclnn_ops.cpp:3007: GGML_ASSERT(n_dims == src0->ne[0]) failed
Aborted (core dumped)
zyp2 added the bug-unconfirmed and critical severity labels Nov 22, 2024
@zyp2
Author

zyp2 commented Nov 22, 2024

910B server

@MengqingCao
Contributor

Could you provide the command to reproduce the problem, including the model and other relevant info?

@zyp2
Author

zyp2 commented Nov 22, 2024

model_name: glm-4-9b-chat
Weight conversion command:
python convert_hf_to_gguf.py --outfile /docker_files/zyp/202411/ggufs/glm4.gguf /docker_files/zyp/weight/glm-4-9b-chat
Start inference command:
./build/bin/llama-cli -m /docker_files/zyp/202411/ggufs/glm4.gguf -p "请推荐一部电视剧:" -n 400 -e -ngl 33 -sm none -mg 0

hipudding added the Ascend NPU label and removed the bug-unconfirmed and critical severity labels Nov 25, 2024
@hipudding
Collaborator

Not all models are supported by the Ascend NPU backend; please check this table. The assert fires because the RoPE operator does not yet support all kinds of input shapes. We are still working on it.
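
For context, a minimal sketch of the two values the failing assertion compares, assuming ggml's standard RoPE op_params layout (the rotated-dimension count is stored at index 1 by ggml_rope_impl); this is illustrative, not the actual code at aclnn_ops.cpp:3007:

// Illustrative sketch (not the code at aclnn_ops.cpp:3007) of the two
// values the failing assert compares.
#include "ggml.h"

static bool rope_is_full_rotation(const struct ggml_tensor * dst) {
    const struct ggml_tensor * src0 = dst->src[0];                 // input activations
    const int32_t n_dims = ((const int32_t *) dst->op_params)[1];  // rotated dims
    // GLM-4 rotates only part of each head (n_dims < src0->ne[0]), so the
    // CANN kernel's GGML_ASSERT(n_dims == src0->ne[0]) aborts.
    return n_dims == src0->ne[0];
}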

@slaren
Collaborator

slaren commented Nov 25, 2024

The crash could be avoided by reporting that this operation is not supported in the supports_op function of the backend.

@hipudding
Collaborator

The crash could be avoided by reporting that this operation is not supported in the supports_op function of the backend.

Yes, we will add this check to the supports_op function.
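
A minimal sketch of what that check could look like, assuming the device-level supports_op signature and the op_params layout above; the function name mirrors the backend's, but the body is illustrative, not the actual patch:

// Illustrative sketch, not the actual fix: reject the RoPE shapes the CANN
// kernel cannot handle, so the scheduler falls back to another backend
// instead of hitting the GGML_ASSERT at runtime.
static bool ggml_backend_cann_supports_op(ggml_backend_dev_t dev,
                                          const struct ggml_tensor * op) {
    GGML_UNUSED(dev);
    switch (op->op) {
        case GGML_OP_ROPE: {
            // number of rotated dims, stored at index 1 by ggml_rope_impl
            const int32_t n_dims = ((const int32_t *) op->op_params)[1];
            // partial rotation (e.g. GLM-4) is not implemented on CANN yet
            return n_dims == op->src[0]->ne[0];
        }
        // ... checks for the other operators elided ...
        default:
            return true;
    }
}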

@zyp2
Author

zyp2 commented Nov 25, 2024 via email

@hipudding
Collaborator

@zyp2 This bug has been fixed. Performance is lower because the unsupported RoPE shapes now fall back to the CPU.
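
(A hedged sketch of why that fallback works, using the public ggml-backend device API; the actual scheduling logic is more involved. When a device reports an op as unsupported, the scheduler assigns the node to a backend that does support it, ultimately the CPU, at the cost of extra graph splits and host<->device copies, hence the slowdown.)

#include "ggml-backend.h"

// Sketch of the scheduler's query: a false answer here is what routes
// GLM-4's partial-rotation RoPE node to the CPU backend.
static bool place_on_device(ggml_backend_dev_t dev, const struct ggml_tensor * op) {
    return ggml_backend_dev_supports_op(dev, op);
}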
