
LoRA fine-tuning with adam-mini raises RuntimeError: shape '[-1, 655360]' is invalid for input of size 40960 #6397

Closed
1 task done
TC10127 opened this issue Dec 19, 2024 · 1 comment
Labels
pending This problem is yet to be addressed

Comments


TC10127 commented Dec 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[screenshot of system info]

Reproduction

CUDA_VISIBLE_DEVICES=1 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /home/vvv/llm_model/Qwen1.5-32B-Chat-AWQ \
    --finetuning_type lora \
    --template qwen \
    --dataset_dir /home/vvv/LLaMA-Factory-0.9.1/data \
    --dataset train-explore \
    --cutoff_len 8192 \
    --learning_rate 5.0e-5 \
    --num_train_epochs 3 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 2 \
    --save_steps 30 \
    --output_dir /home/vvv/LLaMA-Factory-0.9.1/saves/train-test/phone-num-liger \
    --quantization_bit 4 \
    --quantization_type fp4 \
    --lora_rank 8 \
    --lora_alpha 8 \
    --lora_dropout 0.1 \
    --lora_target all \
    --plot_loss True \
    --overwrite_output_dir True \
    --overwrite_cache True \
    --seed 1 \
    --enable_liger_kernel True \
    --use_adam_mini True
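For reference, the exception in the title is PyTorch's generic tensor-reshape failure: `view` (or `reshape`) cannot map a tensor of 40960 elements onto rows of 655360 elements, so the `-1` dimension cannot be inferred. A minimal sketch (assuming only PyTorch, unrelated to the LLaMA-Factory internals that trigger it) reproducing the same message:

```python
# Minimal reproduction of the reshape error from the traceback.
# 40960 elements cannot be split into rows of 655360, so the -1
# dimension cannot be inferred and view() raises RuntimeError.
import torch

x = torch.zeros(40960)
try:
    x.view(-1, 655360)
    msg = ""
except RuntimeError as e:
    msg = str(e)

print(msg)  # mentions that the shape is invalid for input of size 40960
```

In the actual run, a shape mismatch like this between an optimizer's expected parameter layout and the quantized/LoRA parameter tensors is what surfaces as this error.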

Expected behavior

Use the adam-mini optimizer to reduce training time.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 19, 2024

TC10127 commented Dec 19, 2024

[screenshot]

@TC10127 TC10127 closed this as completed Dec 23, 2024