
RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary #6358

Closed
TimeFlysLeo opened this issue Dec 17, 2024 · 6 comments · Fixed by #6364
Labels: solved (This problem has been already solved)

Comments

@TimeFlysLeo

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Windows-10-10.0.22631-SP0
  • Python version: 3.11.11
  • PyTorch version: 2.5.1+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 3090

Reproduction

```
llamafactory-cli train .\examples\train_full\llama3_full_sft.yaml
```

```
C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\utils\checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
[WARNING|logging.py:168] 2024-12-17 17:52:54,102 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Scripts\llamafactory-cli.exe\__main__.py", line 7, in <module>
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\cli.py", line 111, in main
    run_exp()
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\train\tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "D:\LLM_Privacy\LLaMA-Factory\src\llamafactory\train\sft\workflow.py", line 159, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 2122, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 2474, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\transformers\trainer.py", line 3606, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\accelerate\accelerator.py", line 2246, in backward
    loss.backward(**kwargs)
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\__init__.py", line 347, in backward
    _engine_run_backward(
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\autograd\function.py", line 307, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\W10\miniconda3\envs\Qwen2-VL-Finetune\Lib\site-packages\torch\utils\checkpoint.py", line 317, in backward
    raise RuntimeError(
RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary
```

Expected behavior

I want to freeze most of the parameters and fine-tune the rest. I modified the code in the workflow as follows:

```python
for lay_name, param in model.named_parameters():
    if lay_name in llama3_vl_train_layers:
        param.requires_grad = True
        logger.info(f"name: {lay_name}")
    else:
        param.requires_grad = False
```

This successfully fine-tuned the qwen2 model, but I hit this error when fine-tuning llama3.
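For context: when everything up to and including the embeddings ends up frozen, the checkpointed decoder blocks receive inputs with `requires_grad=False`, which is what the UserWarning above is complaining about. transformers ships `enable_input_require_grads()` for exactly this case; a minimal sketch, assuming a causal LM loaded via `AutoModelForCausalLM` (the model id and the blanket freeze are illustrative, not the code from this issue):

```python
from transformers import AutoModelForCausalLM

# Illustrative model id, not necessarily the one used in this issue.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Freeze everything, as in the loop above (the train-layer filtering is omitted here).
for param in model.parameters():
    param.requires_grad = False

# Make the embedding output require grad so that gradient checkpointing
# sees at least one grad-requiring input per checkpointed block.
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
```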

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 17, 2024
@hiyouga
Owner

hiyouga commented Dec 17, 2024

This does not affect training.

@hiyouga hiyouga closed this as completed Dec 17, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Dec 17, 2024
@TimeFlysLeo
Author

It ends with a runtime error and the program stops automatically 😂 it does affect training.

@TimeFlysLeo
Author

In the error log I posted above, once the line `RuntimeError: none of output has requires_grad=True, this checkpoint() is not necessary` is printed, the program stops automatically and training halts.

@TimeFlysLeo
Author

So changing the code in those two files fixes it, right? 😂

@hiyouga
Owner

hiyouga commented Dec 17, 2024

set `use_reentrant_gc: false`
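
For anyone else landing here: this option goes in the training YAML. A sketch, assuming `use_reentrant_gc` is read as a top-level key like the other entries in `llama3_full_sft.yaml` (the surrounding keys are typical LLaMA-Factory SFT settings, not necessarily the exact file used above):

```yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
use_reentrant_gc: false  # use the non-reentrant gradient checkpointing implementation

### method
stage: sft
do_train: true
finetuning_type: full
```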

@TimeFlysLeo
Author

That works now, thank you very much! But may I ask why? Why does setting this option to false fix it?
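
The short version, as far as the PyTorch source goes: the reentrant implementation of `torch.utils.checkpoint.checkpoint` recomputes the forward pass inside `backward()` and raises this exact RuntimeError when none of the recomputed outputs require grad, while the non-reentrant implementation has no such check. A minimal sketch of the difference, independent of LLaMA-Factory (`block`, `x`, and `w` are made-up names for illustration):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x, w):
    # The output depends only on x, mimicking a checkpointed block whose
    # trainable tensors never reach the output (e.g. a fully frozen layer).
    return x * 2.0

x = torch.randn(3)                      # activations from frozen layers: no grad
w = torch.randn(3, requires_grad=True)  # a trainable tensor passed alongside

# Reentrant checkpointing: backward re-runs the forward with detached inputs
# and raises because none of the recomputed outputs require grad.
y = checkpoint(block, x, w, use_reentrant=True)
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # none of output has requires_grad=True, this checkpoint() is not necessary

# Non-reentrant checkpointing: the same call goes through without the hard
# error; here y simply does not require grad.
y = checkpoint(block, x, w, use_reentrant=False)
print(y.requires_grad)  # False
```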
