paligemma2-3b-pt-448 raises a logging error during eval & predict, which aborts the test run #6499

Closed
1 task done
shr0305 opened this issue Dec 31, 2024 · 0 comments · Fixed by huggingface/transformers#35486 or #6512
Labels
solved This problem has been already solved

Comments


shr0305 commented Dec 31, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-101-generic-x86_64-with-glibc2.35
  • Python version: 3.12.3
  • PyTorch version: 2.3.0+cu121 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090 D

Reproduction

--- Logging error ---
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1160, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 999, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 703, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 392, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/tuner.py", line 59, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 128, in run_sft
    predict_results = trainer.predict(dataset_module["eval_dataset"], metric_key_prefix="predict", **gen_kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer_seq2seq.py", line 259, in predict
    return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 4042, in predict
    output = eval_loop(
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 4158, in evaluation_loop
    losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 130, in prediction_step
    loss, generated_tokens, _ = super().prediction_step(  # ignore the returned labels (may be truncated)
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer_seq2seq.py", line 350, in prediction_step
    outputs = model(**inputs)
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/paligemma/modeling_paligemma.py", line 512, in forward
    logger.warning_once(
  File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/extras/logging.py", line 168, in warning_once
    self.warning(*args, **kwargs)
Message: '`labels` contains `pad_token_id` which will be masked with `config.ignore_index`. '
Arguments: ('You have to mask out `pad_token_id` when preparing `labels`, this behavior will be removed in v.4.46.',)
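The `--- Logging error ---` block above already shows the root cause: `modeling_paligemma.py` calls `logger.warning_once()` with two positional strings (see the `Message:` / `Arguments:` lines), and the wrapper forwards both to `Logger.warning()`, where the second string is treated as a %-format argument for the first. Here is a minimal sketch that reproduces the same failure with the standard library alone; the logger name is arbitrary:

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("repro")

# Two positional strings: logging treats the second one as a %-format
# argument for the first. The first string contains no format specifiers,
# so record.getMessage() raises:
#   TypeError: not all arguments converted during string formatting
logger.warning(
    "`labels` contains `pad_token_id` which will be masked with `config.ignore_index`. ",
    "You have to mask out `pad_token_id` when preparing `labels`, this behavior will be removed in v.4.46.",
)
```

The stock `StreamHandler` catches this inside `emit()` and prints the `--- Logging error ---` report shown above. LLaMA-Factory's custom handler formats the record in its own `emit()` (`extras/logging.py`, line 61) without that guard, so the same `TypeError` propagates out of the logging call and crashes rank 0, which is what the next traceback shows.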
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/launcher.py", line 23, in <module>
[rank0]:     launch()
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]:     run_exp()
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/tuner.py", line 59, in run_exp
[rank0]:     run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 128, in run_sft
[rank0]:     predict_results = trainer.predict(dataset_module["eval_dataset"], metric_key_prefix="predict", **gen_kwargs)
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer_seq2seq.py", line 259, in predict
[rank0]:     return super().predict(test_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 4042, in predict
[rank0]:     output = eval_loop(
[rank0]:              ^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 4158, in evaluation_loop
[rank0]:     losses, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
[rank0]:                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/train/sft/trainer.py", line 130, in prediction_step
[rank0]:     loss, generated_tokens, _ = super().prediction_step(  # ignore the returned labels (may be truncated)
[rank0]:                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer_seq2seq.py", line 350, in prediction_step
[rank0]:     outputs = model(**inputs)
[rank0]:               ^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/site-packages/transformers/models/paligemma/modeling_paligemma.py", line 512, in forward
[rank0]:     logger.warning_once(
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/extras/logging.py", line 168, in warning_once
[rank0]:     self.warning(*args, **kwargs)
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1551, in warning
[rank0]:     self._log(WARNING, msg, args, **kwargs)
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1684, in _log
[rank0]:     self.handle(record)
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1700, in handle
[rank0]:     self.callHandlers(record)
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1762, in callHandlers
[rank0]:     hdlr.handle(record)
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 1028, in handle
[rank0]:     self.emit(record)
[rank0]:   File "/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/extras/logging.py", line 61, in emit
[rank0]:     log_entry = self._formatter.format(record)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 703, in format
[rank0]:     record.message = record.getMessage()
[rank0]:                      ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/miniconda3/lib/python3.12/logging/__init__.py", line 392, in getMessage
[rank0]:     msg = msg % self.args
[rank0]:           ~~~~^~~~~~~~~~~
[rank0]: TypeError: not all arguments converted during string formatting
/root/miniconda3/lib/python3.12/site-packages/transformers/models/paligemma/configuration_paligemma.py:134: FutureWarning: The `ignore_index` attribute is deprecated and will be removed in v4.47.
  warnings.warn(
W1231 14:47:18.214000 140311803704512 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 2248 closing signal SIGTERM
E1231 14:47:18.529000 140311803704512 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 2247) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/root/autodl-tmp/SHR-LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-12-31_14:47:18
  host      : autodl-container-e9e04d9e7a-7d8eafec
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2247)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
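This suggests two independent fixes, consistent with the two patches linked above: repair the call site in `modeling_paligemma.py` (the trailing space in the `Message:` string hints that the two literals were meant to be one implicitly concatenated string, split apart by a stray comma), or make the `warning_once` wrapper tolerant of multiple positional strings. Below is a hedged sketch of the latter; the class name is hypothetical and the joining behavior is an assumption, not the actual patch:

```python
import logging
from functools import lru_cache


class RobustLogger(logging.Logger):
    """Hypothetical logger; `warning_once` mirrors the method in the traceback."""

    @lru_cache(maxsize=None)  # emit each distinct message only once per process
    def warning_once(self, *msgs: str) -> None:
        # Join all positional fragments into a single string so that none
        # of them reaches Logger.warning() as a %-format argument.
        self.warning(" ".join(msgs))
```

Independently of the message formatting, wrapping the custom handler's `emit()` body in the standard `try/except` + `self.handleError(record)` pattern would keep any future malformed record from aborting a training or prediction run.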

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 31, 2024
hiyouga added a commit that referenced this issue Jan 2, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 2, 2025
@hiyouga hiyouga closed this as completed in 1800f8c Jan 2, 2025