
cannot import name 'TEDotProductAttentionMLA' when running examples/deepseek_v2/run_mcore_deepseek.sh #359

dreasysnail opened this issue Oct 4, 2024 · 4 comments

@dreasysnail

Thank you for the great project! When I run examples/deepseek_v2/run_mcore_deepseek.sh, I get the error below:

Traceback (most recent call last):
  File "/mnt/task_runtime/examples/deepseek_v2/pretrain_deepseek.py", line 37, in <module>
    from megatron_patch.model.deepseek_v2.layer_specs import (
  File "/mnt/task_runtime/megatron_patch/model/deepseek_v2/layer_specs.py", line 19, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import (
ImportError: cannot import name 'TEDotProductAttentionMLA' from 'megatron.core.transformer.custom_layers.transformer_engine' (/mnt/task_runtime/PAI-Megatron-LM-240718/megatron/core/transformer/custom_layers/transformer_engine.py)

It appears that megatron_patch/model/deepseek_v2/layer_specs.py (shown in the traceback above) is attempting to import 'TEDotProductAttentionMLA', but when I checked the megatron.core.transformer.custom_layers.transformer_engine module, I did not find 'TEDotProductAttentionMLA' defined there.

Any help appreciated!

@dreasysnail (Author)

@Jiayi-Pan

@NiuMa-1234

Hi, have you solved the problem? I'm trying to use TEDotProductAttentionMLA too, and I found that the only difference between it and the original TEDotProductAttention is the definition of kv_channels. So I just manually changed kv_channels and kept using TEDotProductAttention. I'm not sure if this is correct.
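
For reference, a minimal sketch of that fallback, assuming (as described above) that the MLA variant differs from TEDotProductAttention only in how kv_channels is defined. The field names qk_nope_head_dim and qk_rope_head_dim are borrowed from DeepSeek-V2-style configs and are assumptions here, not code from this repo:

```python
# Hypothetical fallback sketched from the workaround above: reuse the stock
# TEDotProductAttention but point kv_channels at the MLA query head dim,
# which is typically what the MLA softmax scale is derived from.
# qk_nope_head_dim / qk_rope_head_dim are assumed config fields; adapt them
# to whatever your TransformerConfig actually exposes.
from megatron.core.transformer.custom_layers.transformer_engine import (
    TEDotProductAttention,
)


def build_mla_fallback_attention(config, layer_number, attn_mask_type, attention_type):
    config.kv_channels = config.qk_nope_head_dim + config.qk_rope_head_dim
    return TEDotProductAttention(
        config=config,
        layer_number=layer_number,
        attn_mask_type=attn_mask_type,
        attention_type=attention_type,
    )
```

Whether the resulting scale actually matches the official MLA implementation should be verified against the upstream TEDotProductAttentionMLA once it is available.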

@Jiayi-Pan

Hi, we've solved the issue. You can just update the git submodule to the latest version.
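
For anyone hitting the same ImportError, one common way to do that from the repo root (assuming the fix is already recorded in the superproject's latest commit) is:

```bash
git pull                                   # update the superproject checkout
git submodule sync                         # pick up any changed submodule URLs
git submodule update --init --recursive    # check out the recorded submodule commits
```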

@NiuMa-1234 commented Oct 29, 2024

> Hi, we've solved the issue. You can just update the git submodule to the latest version.

Hi, I've tested the latest TEDotProductAttentionMLA, but I found that the training speed has dropped a bit (from 5.9 tokens/s to 4.4 tokens/s on an 8*8B model). Would this be normal?

I used torch.profiler and found that the main difference in training time between the two versions comes from this function: void transformer_engine::scaled_aligned_causal_masked_softmax_warp_forward<__nv_bfloat16, __nv_bfloat16, float, 13>(__nv_bfloat16*, __nv_bfloat16 const*, float, int, int, int)
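
For anyone who wants to reproduce that comparison, a small torch.profiler helper along these lines can isolate a single attention forward pass; the attention module and its inputs are placeholders supplied by the caller, not anything defined in this repo:

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function


def profile_attention_forward(attention_module, *inputs):
    """Profile one forward call of `attention_module` and print per-kernel CUDA times."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
    ) as prof:
        with record_function("attention_forward"):
            attention_module(*inputs)
        torch.cuda.synchronize()
    # Sorting by total CUDA time surfaces kernels such as the
    # scaled_aligned_causal_masked_softmax_warp_forward one mentioned above.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```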
