Thank you for the great project! When I run examples/deepseek_v2/run_mcore_deepseek.sh I get the following error:
Traceback (most recent call last):
File "/mnt/task_runtime/examples/deepseek_v2/pretrain_deepseek.py", line 37, in <module>
from megatron_patch.model.deepseek_v2.layer_specs import (
File "/mnt/task_runtime/megatron_patch/model/deepseek_v2/layer_specs.py", line 19, in <module>
from megatron.core.transformer.custom_layers.transformer_engine import (
ImportError: cannot import name 'TEDotProductAttentionMLA' from 'megatron.core.transformer.custom_layers.transformer_engine' (/mnt/task_runtime/PAI-Megatron-LM-240718/megatron/core/transformer/custom_layers/transformer_engine.py)
Hi, have you solved the problem? I'm trying to use TEDotProductAttentionMLA too, and I found that the only difference between it and the original TEDotProductAttention is the definition of kv_channels. So I just manually changed kv_channels and kept using TEDotProductAttention. I'm not sure whether this is correct.
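The workaround described above can be sketched as follows. The helper names and the head-dimension values are illustrative assumptions (DeepSeek-V2-style numbers), not taken from the repo:

```python
# Hypothetical sketch of the workaround: keep TEDotProductAttention but
# override kv_channels so it matches MLA's per-head query/key dimension.
# All names and numbers here are illustrative assumptions.

def mla_kv_channels(qk_nope_head_dim: int, qk_rope_head_dim: int) -> int:
    # In MLA, queries/keys have a non-positional part plus a rotary part,
    # so the effective per-head key dimension is their sum.
    return qk_nope_head_dim + qk_rope_head_dim

def standard_kv_channels(hidden_size: int, num_attention_heads: int) -> int:
    # The usual default: hidden size split evenly across attention heads.
    return hidden_size // num_attention_heads

# Illustrative DeepSeek-V2-style values (assumed):
print(mla_kv_channels(128, 64))        # MLA per-head q/k dim -> 192
print(standard_kv_channels(4096, 32))  # standard per-head dim -> 128
```

Whether the resulting softmax scaling matches the MLA variant exactly depends on how the attention class uses kv_channels internally, so this is only an approximation of the real class.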
Hi, we've solved the issue. You can just update the git submodule to the latest version.
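A minimal sketch of the suggested update, assuming a standard submodule layout in the checkout (adjust paths to your setup):

```shell
# Refresh the superproject and check out the submodule commits it now pins.
git pull                                  # pick up the new submodule pin
git submodule sync --recursive            # refresh submodule URLs if they changed
git submodule update --init --recursive   # check out the pinned commits
```

After updating, re-run the script to confirm TEDotProductAttentionMLA is importable.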
Hi, I've tested the latest TEDotProductAttentionMLA, but I found the training speed has dropped a bit (from 5.9 to 4.4 tokens/s on an 8*8B model). Is this normal?
I used torch.profiler and found that the main difference in training time between the two versions comes from this function: void transformer_engine::scaled_aligned_causal_masked_softmax_warp_forward<__nv_bfloat16, __nv_bfloat16, float, 13>(__nv_bfloat16*, __nv_bfloat16 const*, float, int, int, int)
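For anyone who wants to reproduce this kind of per-kernel attribution, here is a minimal torch.profiler sketch; the model and inputs are stand-ins, not the actual 8B training setup:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model and batch; replace with the real training step.
model = torch.nn.Linear(512, 512)
x = torch.randn(64, 512)

# Profile one forward+backward step. Add ProfilerActivity.CUDA (and
# sort by "self_cuda_time_total") when running on GPU to see kernels
# like the fused softmax mentioned above.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x).sum().backward()

# Sort by self time to surface the hottest ops first.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
```

Comparing the two tables (old vs. new attention class) on the same batch shape should show where the extra time goes.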
It appears that the code in this link is attempting to import TEDotProductAttentionMLA, but when I checked the megatron.core.transformer.custom_layers.transformer_engine file, I did not find TEDotProductAttentionMLA. Any help appreciated!
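As a stopgap until the submodule is updated, the missing name can be tolerated with a defensive import that falls back to the plain attention class. This is a generic sketch using a stand-in module registered in sys.modules, not the repo's actual code:

```python
import sys
import types

# Stand-in for an older transformer_engine module that lacks the MLA class.
fake_te = types.ModuleType("fake_te")

class TEDotProductAttention:  # stand-in for the always-present class
    pass

fake_te.TEDotProductAttention = TEDotProductAttention
sys.modules["fake_te"] = fake_te

# A from-import of a missing name raises ImportError ("cannot import name ..."),
# exactly like the traceback above, so it can be caught and handled.
try:
    from fake_te import TEDotProductAttentionMLA as AttentionImpl
except ImportError:
    from fake_te import TEDotProductAttention as AttentionImpl

print(AttentionImpl.__name__)
```

Note that silently falling back changes the attention math (see the kv_channels discussion above), so this is only useful for unblocking imports, not as a drop-in equivalent.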