
LoRA fine-tuning with adam-mini raises RuntimeError: shape '[-1, 655360]' is invalid for input of size 40960 #6397

Closed
1 task done
TC10127 opened this issue Dec 19, 2024 · 1 comment
Labels
pending This problem is yet to be addressed

Comments


TC10127 commented Dec 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

[screenshot of system info]

Reproduction

CUDA_VISIBLE_DEVICES=1 python src/train.py \
    --stage sft \
    --do_train True \
    --model_name_or_path /home/vvv/llm_model/Qwen1.5-32B-Chat-AWQ \
    --finetuning_type lora \
    --template qwen \
    --dataset_dir /home/vvv/LLaMA-Factory-0.9.1/data \
    --dataset train-explore \
    --cutoff_len 8192 \
    --learning_rate 5.0e-5 \
    --num_train_epochs 3 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 2 \
    --save_steps 30 \
    --output_dir /home/vvv/LLaMA-Factory-0.9.1/saves/train-test/phone-num-liger \
    --quantization_bit 4 \
    --quantization_type fp4 \
    --lora_rank 8 \
    --lora_alpha 8 \
    --lora_dropout 0.1 \
    --lora_target all \
    --plot_loss True \
    --overwrite_output_dir True \
    --overwrite_cache True \
    --seed 1 \
    --enable_liger_kernel True \
    --use_adam_mini True
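For reference, the exception in the title is PyTorch's generic tensor-reshape failure: `view` (or `reshape`) cannot map a tensor of 40960 elements onto rows of 655360 elements, so the `-1` dimension cannot be inferred. A minimal sketch (assuming only PyTorch, unrelated to the LLaMA-Factory internals that trigger it) reproducing the same message:

```python
# Minimal reproduction of the reshape error from the traceback.
# 40960 elements cannot be split into rows of 655360, so the -1
# dimension cannot be inferred and view() raises RuntimeError.
import torch

x = torch.zeros(40960)
try:
    x.view(-1, 655360)
    msg = ""
except RuntimeError as e:
    msg = str(e)

print(msg)  # mentions that the shape is invalid for input of size 40960
```

In the actual run, a shape mismatch like this between an optimizer's expected parameter layout and the quantized/LoRA parameter tensors is what surfaces as this error.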

Expected behavior

Use the adam-mini optimizer to reduce training time.

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 19, 2024

TC10127 commented Dec 19, 2024

[screenshot]

@TC10127 TC10127 closed this as completed Dec 23, 2024