Training runs normally when micro-batch-size is 1, but fails with an error when it is greater than 1. Environment details are as follows.
**Model:** Qwen2.5-7B
Training script:
```bash
cd /home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5
export MP_SFT_PACKING="true"
sh run_mcore_qwen.sh \
  dsw \
  7B \
  2 \
  512 \
  1e-5 \
  1e-6 \
  4096 \
  4096 \
  bf16 \
  2 \
  1 \
  1 \
  true \
  true \
  true \
  true \
  false \
  false \
  100 \
  /home/kas/kas_workspace/temp/liuxd/qwen-datasets/mmap_qwen2_sft_single_packing_datasets_text_document \
  /home/kas/kas_workspace/temp/liuxd/qwen-datasets/mmap_qwen2_sft_single_packing_datasets_text_document \
  /home/kas/kas_workspace/model/Qwen2.5/Qwen2.5-7B-hf-to-mcore-te-tp2-pp1 \
  10000 \
  100 \
  /home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/logs/qwen2.5_packing_finetune_$(date +'%Y%m%d_%H%M').log
```
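For reference, `MP_SFT_PACKING="true"` enables sequence packing for SFT, which switches attention and RoPE to the THD layout: all sequences in a micro batch are concatenated along a single token axis and delimited by cumulative sequence lengths (`cu_seqlens`) instead of a separate batch dimension. The snippet below is a stand-alone, pure-PyTorch illustration of that layout only; the tensor names and sizes are made up for the example and are not taken from Pai-Megatron-Patch.

```python
import torch

# Two packed sequences of (hypothetical) lengths 5 and 3.
seq_lens = torch.tensor([5, 3], dtype=torch.int32)
cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)   # tensor([0, 5, 8])

num_heads, head_dim = 4, 16                      # hypothetical sizes
total_tokens = int(cu_seqlens[-1])               # 8

# THD layout used when packing is on:
# 3D, [total_tokens, num_heads, head_dim] -- no batch axis.
q_thd = torch.randn(total_tokens, num_heads, head_dim)

# Unpacked layout with an explicit micro-batch axis:
# 4D, [seq_len, micro_batch, num_heads, head_dim].
q_sbhd = torch.randn(5, 2, num_heads, head_dim)

print(q_thd.dim(), q_sbhd.dim())                 # 3 vs. 4
```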
Error message:
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 290, in <module>
[rank1]: pretrain(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 326, in pretrain
[rank1]: iteration, num_floating_point_operations_so_far = train(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 1247, in train
[rank1]: loss_dict, skipped_iter, grad_norm, num_zeros_in_grad = train_step(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 688, in train_step
[rank1]: losses_reduced = forward_backward_func(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 395, in forward_backward_no_pipelining
[rank1]: output_tensor, num_tokens = forward_step(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 219, in forward_step
[rank1]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 219, in forward_step
[rank1]: output_tensor = model(tokens, position_ids, attention_mask, labels=labels, packed_seq_params=packed_seq_params)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/distributed/distributed_data_parallel.py", line 204, in forward
[rank1]: return self.module(*inputs, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/legacy/model/module.py", line 189, in forward
[rank1]: outputs = self.module(*inputs, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/model.py", line 203, in forward
[rank1]: hidden_states = self.decoder(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer_block.py", line 402, in forward
[rank1]: hidden_states, context = layer(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer_layer.py", line 188, in forward
[rank1]: attention_output_with_bias = self.self_attention(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer/attention.py", line 294, in forward
[rank1]: query = apply_rotary_pos_emb(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/models/common/embeddings/rotary_pos_embedding.py", line 244, in apply_rotary_pos_emb
[rank1]: return fused_apply_rotary_pos_emb_thd(t, cu_seqlens, freqs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/apex/transformer/functional/fused_rope.py", line 211, in fused_apply_rotary_pos_emb_thd
[rank1]: return FusedRoPETHDFunc.apply(t, cu_seqlens, freqs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 573, in apply
[rank1]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank1]: File "/usr/local/lib/python3.10/dist-packages/apex/transformer/functional/fused_rope.py", line 170, in forward
[rank1]: output = fused_rotary_positional_embedding.forward_thd(
[rank1]: RuntimeError: expected 3D tensor
```
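My reading of the failure point (an assumption based only on the traceback, not confirmed against the apex source): `fused_rotary_positional_embedding.forward_thd` rejects the query because it is not in the packed 3D `[total_tokens, num_heads, head_dim]` layout. With micro-batch-size = 1 the packed tensor ends up 3D, but with a larger micro batch the query apparently still carries a batch axis when it reaches `fused_apply_rotary_pos_emb_thd`, so the kernel's shape check fails. A minimal stand-alone sketch of just that check follows; the helper `check_thd` is hypothetical and exists only to reproduce the error message.

```python
import torch

num_heads, head_dim = 4, 128     # hypothetical sizes

def check_thd(t: torch.Tensor) -> None:
    # Mirrors the failing condition: the fused THD RoPE path only accepts
    # a packed 3D query of shape [total_tokens, num_heads, head_dim].
    if t.dim() != 3:
        raise RuntimeError("expected 3D tensor")

# Packed query (micro batch collapsed into the token axis): passes.
check_thd(torch.randn(8192, num_heads, head_dim))

# Query that still carries a micro-batch axis, e.g. [seq, batch, heads, dim]
# with micro-batch-size = 2: raises "RuntimeError: expected 3D tensor".
check_thd(torch.randn(4096, 2, num_heads, head_dim))
```

If that is what is happening, the problem is upstream of the kernel: when packing is enabled with micro-batch-size > 1, each micro batch would need to be flattened into a single token dimension with matching `cu_seqlens` before RoPE is applied.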