Error when training SFT on sequence-packed data #413

Open
Eisenhower opened this issue Jan 5, 2025 · 0 comments
Training runs normally when micro-batch-size is 1, but errors out when it is greater than 1. Environment details are below.

**Model:** Qwen2.5-7B
**Training script:**

```bash
cd /home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5
export MP_SFT_PACKING="true"
sh run_mcore_qwen.sh
dsw
7B
2
512
1e-5
1e-6
4096
4096
bf16
2
1
1
true
true
true
true
false
false
100
/home/kas/kas_workspace/temp/liuxd/qwen-datasets/mmap_qwen2_sft_single_packing_datasets_text_document
/home/kas/kas_workspace/temp/liuxd/qwen-datasets/mmap_qwen2_sft_single_packing_datasets_text_document
/home/kas/kas_workspace/model/Qwen2.5/Qwen2.5-7B-hf-to-mcore-te-tp2-pp1
10000
100
/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/logs/qwen2.5_packing_finetune_$(date +'%Y%m%d_%H%M').log
```

**Error message:**

```
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 290, in
[rank1]: pretrain(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 326, in pretrain
[rank1]: iteration, num_floating_point_operations_so_far = train(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 1247, in train
[rank1]: loss_dict, skipped_iter, grad_norm, num_zeros_in_grad = train_step(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/training/training.py", line 688, in train_step
[rank1]: losses_reduced = forward_backward_func(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 395, in forward_backward_no_pipelining
[rank1]: output_tensor, num_tokens = forward_step(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/pipeline_parallel/schedules.py", line 219, in forward_step
[rank1]: output_tensor, loss_func = forward_step_func(data_iterator, model)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/examples/qwen2_5/../qwen2/pretrain_qwen.py", line 219, in forward_step
[rank1]: output_tensor = model(tokens, position_ids, attention_mask, labels=labels, packed_seq_params=packed_seq_params)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/distributed/distributed_data_parallel.py", line 204, in forward
[rank1]: return self.module(*inputs, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/legacy/model/module.py", line 189, in forward
[rank1]: outputs = self.module(*inputs, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/model.py", line 203, in forward
[rank1]: hidden_states = self.decoder(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer_block.py", line 402, in forward
[rank1]: hidden_states, context = layer(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer_layer.py", line 188, in forward
[rank1]: attention_output_with_bias = self.self_attention(
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1552, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1602, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/megatron_patch/model/qwen2/transformer/attention.py", line 294, in forward
[rank1]: query = apply_rotary_pos_emb(
[rank1]: File "/home/kas/kas_workspace/temp/liuxd/Pai-Megatron-Patch-241113/PAI-Megatron-LM-240718/megatron/core/models/common/embeddings/rotary_pos_embedding.py", line 244, in apply_rotary_pos_emb
[rank1]: return fused_apply_rotary_pos_emb_thd(t, cu_seqlens, freqs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/apex/transformer/functional/fused_rope.py", line 211, in fused_apply_rotary_pos_emb_thd
[rank1]: return FusedRoPETHDFunc.apply(t, cu_seqlens, freqs)
[rank1]: File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 573, in apply
[rank1]: return super().apply(*args, **kwargs) # type: ignore[misc]
[rank1]: File "/usr/local/lib/python3.10/dist-packages/apex/transformer/functional/fused_rope.py", line 170, in forward
[rank1]: output = fused_rotary_positional_embedding.forward_thd(
[rank1]: RuntimeError: expected 3D tensor
```
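The traceback ends inside apex's `fused_apply_rotary_pos_emb_thd`, whose `forward_thd` kernel raises `expected 3D tensor`. My understanding (an assumption, not verified against the patch code) is that the THD path expects the query already packed into a single token dimension, i.e. a 3D tensor `[total_tokens, heads, head_dim]` plus `cu_seqlens`, while with micro-batch-size > 1 the query apparently still carries a separate batch dimension and arrives 4D. A minimal shape-only sketch of that mismatch (the `expects_thd` helper and the shapes are illustrative, not the real kernel):

```python
import torch

def expects_thd(t: torch.Tensor) -> None:
    # The fused THD RoPE path packs every sequence of the micro-batch into a single
    # token dimension, so the kernel only accepts 3D input: [total_tokens, heads, head_dim].
    if t.dim() != 3:
        raise RuntimeError("expected 3D tensor")

heads, head_dim = 28, 128  # illustrative values, not read from the checkpoint

# micro-batch-size = 1: packed sequences live on dim 0 -> 3D, accepted
expects_thd(torch.randn(4096, heads, head_dim))

# micro-batch-size = 2: if the batch dimension is kept separate (SBHD-style layout),
# the query is 4D and the check fails with exactly the error seen above
try:
    expects_thd(torch.randn(4096, 2, heads, head_dim))
except RuntimeError as e:
    print(e)  # -> expected 3D tensor
```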
