
RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary #6358

Closed
TimeFlysLeo opened this issue Dec 17, 2024 · 6 comments · Fixed by #6364
Labels: solved (This problem has been already solved)

Comments

@TimeFlysLeo

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.11.11
  • PyTorch version: 2.5.1+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 3090

Reproduction

```
llamafactory-cli train .\examples\train_full\llama3_full_sft.yaml
```

```
C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\utils\checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
[WARNING|logging.py:168] 2024-12-17 17:52:54,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\cli.py", line 111, in main
    run_exp()
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\train\tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\train\sft\workflow.py", line 159, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 2122, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 2474, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 3606, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\accelerate\accelerator.py", line 2246, in backward
    loss.backward(**kwargs)
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\__init__.py", line 347, in backward
    _engine_run_backward(
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\function.py", line 307, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\utils\checkpoint.py", line 317, in backward
    raise RuntimeError(
RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary
```

Expected behavior

I want to freeze most of the parameters and fine-tune the rest. I modified the code in the workflow as follows:

```python
for lay_name, param in model.named_parameters():
    if lay_name in llama3_vl_train_layers:
        param.requires_grad = True
        logger.info(f"name: {lay_name}")
    else:
        param.requires_grad = False
```

This successfully fine-tuned the qwen2 model, but I hit this error when fine-tuning llama3.
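For context: when everything up to and including the embeddings ends up frozen, the checkpointed decoder blocks receive inputs with `requires_grad=False`, which is what the UserWarning above is complaining about. transformers ships `enable_input_require_grads()` for exactly this case; a minimal sketch, assuming a causal LM loaded via `AutoModelForCausalLM` (the model id and the blanket freeze are illustrative, not the code from this issue):

```python
from transformers import AutoModelForCausalLM

# Illustrative model id, not necessarily the one used in this issue.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Freeze everything, as in the loop above (the train-layer filtering is omitted here).
for param in model.parameters():
    param.requires_grad = False

# Make the embedding output require grad so that gradient checkpointing
# sees at least one grad-requiring input per checkpointed block.
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
```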

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 17, 2024
@hiyouga
Owner

hiyouga commented Dec 17, 2024

This does not affect training.

@hiyouga hiyouga closed this as completed Dec 17, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Dec 17, 2024
@TimeFlysLeo
Author

It ends with a runtime error and the program stops automatically 😂 it does affect training.

@TimeFlysLeo
Author

In the error log I posted above, once the line `RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary` is printed, the program stops automatically and training halts.

@TimeFlysLeo
Author

So changing the code in those two files fixes it, right? 😂

@hiyouga
Owner

hiyouga commented Dec 17, 2024

set `use_reentrant_gc: false`
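
For anyone else landing here: this option goes in the training YAML. A sketch, assuming `use_reentrant_gc` is read as a top-level key like the other entries in `llama3_full_sft.yaml` (the surrounding keys are typical LLaMA-Factory SFT settings, not necessarily the exact file used above):

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
use_reentrant_gc: false  # use the non-reentrant gradient checkpointing implementation

### method
stage: sft
do_train: true
finetuning_type: full
```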

@TimeFlysLeo
Author

That works now, thank you very much! But may I ask why? Why does setting this option to false fix it?
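
The short version, as far as the PyTorch source goes: the reentrant implementation of `torch.utils.checkpoint.checkpoint` recomputes the forward pass inside `backward()` and raises this exact RuntimeError when none of the recomputed outputs require grad, while the non-reentrant implementation has no such check. A minimal sketch of the difference, independent of LLaMA-Factory (`block`, `x`, and `w` are made-up names for illustration):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x, w):
    # The output depends only on x, mimicking a checkpointed block whose
    # trainable tensors never reach the output (e.g. a fully frozen layer).
    return x * 2.0

x = torch.randn(3)                      # activations from frozen layers: no grad
w = torch.randn(3, requires_grad=True)  # a trainable tensor passed alongside

# Reentrant checkpointing: backward re-runs the forward with detached inputs
# and raises because none of the recomputed outputs require grad.
y = checkpoint(block, x, w, use_reentrant=True)
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # none of output has requires_grad=True, this checkpoint() is not necessary

# Non-reentrant checkpointing: the same call goes through without the hard
# error; here y simply does not require grad.
y = checkpoint(block, x, w, use_reentrant=False)
print(y.requires_grad)  # False
```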
