This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Help with continued training across multiple LoRAs! #6342

Closed
1 task done
yhy-2000 opened this issue Dec 16, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

@yhy-2000

yhy-2000 commented Dec 16, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

Hello author,
I'm using a two-stage sft -> dpo training pipeline, with LoRA used in each stage. The commands are as follows:

sft:

FORCE_TORCHRUN=1 NNODES=$WORLD_SIZE NODE_RANK=$RANK MASTER_ADDR=$MASTER_ADDR MASTER_PORT=$MASTER_PORT TORCH_USE_CUDA_DSA=1 CUDA_LAUNCH_BLOCKING=1 WANDB_MODE=disabled llamafactory-cli train \
    --model_name_or_path Qwen2-VL-7B-Instruct \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --deepspeed examples/deepspeed/ds_z3_config.json \
    --dataset $dataset \
    --template qwen2_vl \
    --cutoff_len 1000000 \
    --max_samples 100000000 \
    --preprocessing_num_workers 128 \
    --output_dir $output_dir \
    --logging_steps 10 \
    --save_steps 50 \
    --plot_loss \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --learning_rate 1e-4 \
    --num_train_epochs 3.0 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16 \
    --ddp_timeout 180000000 \
    --val_size 0.05 \
    --per_device_eval_batch_size 1 \
    --eval_strategy steps \
    --eval_steps 10000 \
    --video_maxlen 768 \
    --overwrite_output_dir \
    --overwrite_cache True

dpo:

WANDB_MODE=disabled llamafactory-cli train \
    --model_name_or_path ./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model \
    --stage dpo \
    --do_train true \
    --finetuning_type lora \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_loss sigmoid \
    --deepspeed examples/deepspeed/ds_z3_config.json \
    --dataset $dataset \
    --template qwen2_vl \
    --cutoff_len 10000000 \
    --max_samples 100000 \
    --preprocessing_num_workers 32 \
    --output_dir $output_dir \
    --logging_steps 10 \
    --save_steps 20 \
    --plot_loss \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --learning_rate 5e-6 \
    --num_train_epochs 10 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.1 \
    --bf16 \
    --ddp_timeout 180000000 \
    --val_size 0.0001 \
    --per_device_eval_batch_size 1 \
    --eval_strategy steps \
    --eval_steps 500 \
    --video_maxlen 128 \
    --overwrite_output_dir \
    --overwrite_cache True

Here, ./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model contains the model obtained by merging the SFT LoRA into the base model.
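For context, a merged model like this is typically produced with LLaMA-Factory's export command, which folds a LoRA adapter into the base weights and saves a standalone checkpoint. A minimal sketch (the adapter and output paths below are illustrative, not taken from the issue):

```shell
# Merge the SFT LoRA adapter into the base Qwen2-VL weights.
# --adapter_name_or_path and --export_dir are hypothetical example paths.
llamafactory-cli export \
    --model_name_or_path Qwen2-VL-7B-Instruct \
    --adapter_name_or_path ./saves/qwen2_vl-7b/sft_lora \
    --template qwen2_vl \
    --finetuning_type lora \
    --export_dir ./saves/qwen2_vl-7b/full_model
```

The exported directory can then be passed as --model_name_or_path to a later training stage, as the DPO command above does.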

However, I printed the model before and after DPO training and found that the two adapter_config.json files are completely identical, and the network structure that workflow.py passes into the trainer is also completely identical, as follows:

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Qwen2VLForConditionalGeneration(
      (visual): Qwen2VisionTransformerPretrainedModel(
        (patch_embed): PatchEmbed(
          (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
        )
        (rotary_pos_emb): VisionRotaryEmbedding()
        (blocks): ModuleList(
          (0-31): 32 x Qwen2VLVisionBlock(
            (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
            (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
            (attn): VisionSdpaAttention(
              (qkv): Linear(in_features=1280, out_features=3840, bias=True)
              (proj): Linear(in_features=1280, out_features=1280, bias=True)
            )
            (mlp): VisionMlp(
              (fc1): Linear(in_features=1280, out_features=5120, bias=True)
              (act): QuickGELUActivation()
              (fc2): Linear(in_features=5120, out_features=1280, bias=True)
            )
          )
        )
        (merger): PatchMerger(
          (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
          (mlp): Sequential(
            (0): Linear(in_features=5120, out_features=5120, bias=True)
            (1): GELU(approximate='none')
            (2): Linear(in_features=5120, out_features=3584, bias=True)
          )
        )
      )
      (model): Qwen2VLModel(
        (embed_tokens): Embedding(152064, 3584)
        (layers): ModuleList(
          (0-27): 28 x Qwen2VLDecoderLayer(
            (self_attn): Qwen2VLSdpaAttention(
              (q_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=3584, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=512, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=512, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (v_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=512, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=512, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (o_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=3584, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (rotary_emb): Qwen2VLRotaryEmbedding()
            )
            (mlp): Qwen2MLP(
              (gate_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=18944, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=18944, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (up_proj): lora.Linear(
                (base_layer): Linear(in_features=3584, out_features=18944, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=18944, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (down_proj): lora.Linear(
                (base_layer): Linear(in_features=18944, out_features=3584, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=18944, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3584, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (act_fn): SiLU()
            )
            (input_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
            (post_attention_layernorm): Qwen2RMSNorm((0,), eps=1e-06)
          )
        )
        (norm): Qwen2RMSNorm((0,), eps=1e-06)
        (rotary_emb): Qwen2VLRotaryEmbedding()
      )
      (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
    )
  )
)

Right now it looks as if the DPO LoRA is merely initialized from the SFT LoRA, rather than a new LoRA being added on top of the merged model. If I want to merge a new LoRA after DPO training finishes, what is the correct way to do this?
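For reference, one common pattern for this two-stage setup is to train the DPO LoRA against the already-merged SFT model (as the command above does) and then merge the resulting DPO adapter with a second export step. A sketch, assuming the DPO adapter was saved to a hypothetical path (both paths below are illustrative):

```shell
# Merge the DPO LoRA adapter on top of the merged-SFT model,
# producing a final standalone checkpoint.
llamafactory-cli export \
    --model_name_or_path ./LLaMA-Factory/saves/qwen2_vl-7b/mix_sft_72b_bs64/1e-4/full_model \
    --adapter_name_or_path ./saves/qwen2_vl-7b/dpo_lora \
    --template qwen2_vl \
    --finetuning_type lora \
    --export_dir ./saves/qwen2_vl-7b/full_model_dpo
```

Under this pattern the DPO adapter is a fresh LoRA (freshly initialized on the merged base), so the adapter_config.json matching the SFT one is expected whenever the same LoRA hyperparameters (rank, targets) are used; whether that matches the author's intent here is the open question of the issue.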

Reproduction

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 16, 2024
Repository owner locked and limited conversation to collaborators Dec 17, 2024
@hiyouga hiyouga converted this issue into discussion #6361 Dec 17, 2024
@hiyouga hiyouga added wontfix This will not be worked on and removed pending This problem is yet to be addressed labels Dec 17, 2024

