Error Saving Model Due to Incorrect Relative Import #6399

maksimstw · 2024-12-19T20:48:21Z

Reminder

I have read the README and searched the existing issues.

System Info

llamafactory version: 0.9.2.dev0
Platform: Linux-5.15.0-1064-azure-x86_64-with-glibc2.31
Python version: 3.10.14
PyTorch version: 2.5.1+cu124 (GPU)
Transformers version: 4.46.1
Datasets version: 3.1.0
Accelerate version: 1.0.1
PEFT version: 0.12.0
TRL version: 0.9.6
GPU type: NVIDIA A100-SXM4-80GB
DeepSpeed version: 0.14.4
vLLM version: 0.6.4.post1

Reproduction

command:

llamafactory-cli train phi3.yaml

phi3.yaml:

### model
model_name_or_path: ../../models/Phi-3-mini-4k-instruct

### method
stage: sft
do_train: true
finetuning_type: full

### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: wildfeedback-gpt4o-sft
template: phi
cutoff_len: 4096
max_samples: 10
overwrite_cache: true
preprocessing_num_workers: 16
mask_history: true

### output
output_dir: ../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft
logging_steps: 10
save_steps: 50000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### eval
val_size: 0.005
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 40

### report
report_to: wandb
run_name: phi-wildfeedback-gpt4o-sft

Training completes, but saving the checkpoint fails with the error below. The path for a relative import appears to be constructed incorrectly.

100%███████████████████████████████████████████| 1/1 [00:05<00:00,  5.78s/it]
[INFO|trainer.py:3801] 2024-12-19 20:23:29,329 >> Saving model checkpoint to ../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft/checkpoint-1
Traceback (most recent call last):
  File "/app/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/app/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/app/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2541, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3000, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3090, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3706, in save_model
    self._save(output_dir, state_dict=state_dict)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3823, in _save
    self.model.save_pretrained(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2809, in save_pretrained
    custom_object_save(self, save_directory, config=self.config)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 623, in custom_object_save
    for needed_file in get_relative_import_files(object_file):
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 128, in get_relative_import_files
    new_imports.extend(get_relative_imports(f))
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 97, in get_relative_imports
    with open(module_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/models/phi3/..generation.py'

Expected behavior

No response

Others

I used the same YAML config to train LLaMA and Qwen models and got no error. I can also save the model with the following code, which should go through the same save_pretrained method.

from transformers import AutoModel, AutoTokenizer

# Path to your model
model_path = "../../models/Phi-3-mini-4k-instruct"  # Update with your path
output_path = "../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft"  # Directory to save the model

try:
    # Load the model and tokenizer
    model = AutoModel.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    print("Model and tokenizer loaded successfully.")

    # Save the model and tokenizer
    model.save_pretrained(output_path)
    tokenizer.save_pretrained(output_path)

    print(f"Model and tokenizer saved successfully to {output_path}.")
except Exception as e:
    print(f"An error occurred: {e}")

Could anyone please help? Thank you!


maksimstw · 2024-12-19T20:56:07Z

A similar issue was reported in #6210, but my model path seems correct.

hiyouga · 2024-12-20T08:42:02Z

use absolute path

maksimstw · 2024-12-20T17:30:00Z

I tried using absolute paths for both model_name_or_path and output_dir, but it did not work. I can use the same script to train the LLaMA and Qwen models (also with relative paths) without any problems. The issue seems to be that Phi-3 ships custom Python files with the model (e.g., configuration_phi3.py), which triggers the get_relative_imports function in the Transformers library and results in this error.
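The malformed path in the traceback can be reproduced outside of Transformers. The sketch below is a simplified re-creation, not the library's code: it assumes the import scanner matches `from .xxx import yyy` with a regular expression that strips only the first leading dot, then joins the captured name onto the module's directory. The regex, the function name `sketch_relative_import_files`, the shortened `/site-packages/...` path, and the sample source line are all assumptions for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: captures everything after the FIRST leading dot
# of a `from .xxx import yyy` statement.
RELATIVE_FROM_IMPORT = re.compile(r"^\s*from\s+\.(\S+)\s+import", flags=re.MULTILINE)


def sketch_relative_import_files(module_file: str, source: str) -> list[str]:
    """Return the file paths a naive scanner would try to open for `source`."""
    module_dir = PurePosixPath(module_file).parent
    names = RELATIVE_FROM_IMPORT.findall(source)
    # Each captured name is treated as a sibling file of the module.
    return [f"{module_dir / name}.py" for name in names]


# Hypothetical line from a built-in model file that imports from two packages up.
source = "from ...generation import GenerationMixin\n"
module_file = "/site-packages/transformers/models/phi3/modeling_phi3.py"

paths = sketch_relative_import_files(module_file, source)
print(paths)
```

Under these assumptions, a multi-level import such as `from ...generation` becomes `..generation.py` next to the module, which has the same shape as the path in the FileNotFoundError. A scanner like this only suits flat custom-code files that use single-dot imports, so the error suggests that a file inside the installed `transformers/models/phi3` package was being scanned as if it were custom code.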

Do you have any other suggestions for this issue? Thanks!
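One way to check whether a local checkpoint declares custom code, as described above for Phi-3, is to look for an `auto_map` entry in its config.json. This is a minimal sketch, assuming the checkpoint follows the usual Hub convention of listing its custom classes under `auto_map`; the helper name `declares_custom_code` is hypothetical, and the demo writes a throwaway config rather than reading a real model directory.

```python
import json
import tempfile
from pathlib import Path


def declares_custom_code(model_dir: str) -> bool:
    """Return True if config.json in `model_dir` has a non-empty `auto_map` entry."""
    config_path = Path(model_dir) / "config.json"
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    return bool(config.get("auto_map"))


# Demo on a temporary directory with an illustrative config (not a real checkpoint).
with tempfile.TemporaryDirectory() as tmp:
    demo_config = {"auto_map": {"AutoConfig": "configuration_phi3.Phi3Config"}}
    Path(tmp, "config.json").write_text(json.dumps(demo_config), encoding="utf-8")
    print(declares_custom_code(tmp))
```

Running the helper on the model directories used for the LLaMA, Qwen, and Phi-3 runs would show whether this is the difference between them.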

github-actions bot added the pending label Dec 19, 2024

hiyouga closed this as completed Dec 20, 2024

hiyouga added the duplicate and solved labels and removed the pending label Dec 20, 2024

maksimstw mentioned this issue Dec 20, 2024:

Error Saving Model Due to Incorrect Relative Import #6411 (Closed)
