
Error Saving Model Due to Incorrect Relative Import #6399

Closed
maksimstw opened this issue Dec 19, 2024 · 3 comments
Labels
duplicate This issue or pull request already exists solved This problem has been already solved

Comments


maksimstw commented Dec 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-1064-azure-x86_64-with-glibc2.31
  • Python version: 3.10.14
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.14.4
  • vLLM version: 0.6.4.post1

Reproduction

command:

llamafactory-cli train phi3.yaml

phi3.yaml:

### model
model_name_or_path: ../../models/Phi-3-mini-4k-instruct

### method
stage: sft
do_train: true
finetuning_type: full

### ddp
ddp_timeout: 180000000
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: wildfeedback-gpt4o-sft
template: phi
cutoff_len: 4096
max_samples: 10
overwrite_cache: true
preprocessing_num_workers: 16
mask_history: true

### output
output_dir: ../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft
logging_steps: 10
save_steps: 50000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true

### eval
val_size: 0.005
per_device_eval_batch_size: 8
eval_strategy: steps
eval_steps: 40

# report
report_to: wandb
run_name: phi-wildfeedback-gpt4o-sft

After training, when saving the model checkpoint, I got the following error. It seems that the relative import path is not constructed correctly.

100%|███████████████████████████████████████████| 1/1 [00:05<00:00,  5.78s/it]
[INFO|trainer.py:3801] 2024-12-19 20:23:29,329 >> Saving model checkpoint to ../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft/checkpoint-1
Traceback (most recent call last):
  File "/app/src/llamafactory/launcher.py", line 23, in <module>
    launch()
  File "/app/src/llamafactory/launcher.py", line 19, in launch
    run_exp()
  File "/app/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/app/src/llamafactory/train/sft/workflow.py", line 101, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2122, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 2541, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3000, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3090, in _save_checkpoint
    self.save_model(output_dir, _internal_call=True)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3706, in save_model
    self._save(output_dir, state_dict=state_dict)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/trainer.py", line 3823, in _save
    self.model.save_pretrained(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2809, in save_pretrained
    custom_object_save(self, save_directory, config=self.config)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 623, in custom_object_save
    for needed_file in get_relative_import_files(object_file):
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 128, in get_relative_import_files
    new_imports.extend(get_relative_imports(f))
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 97, in get_relative_imports
    with open(module_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/models/phi3/..generation.py'
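For context: the malformed path in the traceback (`..generation.py`) is consistent with how the relative-import scanner in `transformers.dynamic_module_utils` extracts module names. The sketch below is an approximation of that scanner (an assumption, not the library's exact source): the pattern consumes exactly one leading dot, so a three-dot import such as `from ...generation import GenerationMixin` (the style used in Transformers' own modeling files) leaves two dots inside the captured module name, which then get joined into a path that cannot exist on disk.

```python
import os
import re

# Approximation (an assumption, not the library's exact source) of how
# transformers.dynamic_module_utils.get_relative_imports collects
# relative imports: the regex consumes exactly one leading dot, so any
# extra dots survive inside the captured module name.
RELATIVE_FROM = re.compile(r"^\s*from\s+\.(\S+)\s+import", re.MULTILINE)

source = "from ...generation import GenerationMixin\n"
captured = RELATIVE_FROM.findall(source)
print(captured)  # ['..generation'] -- two dots remain in the capture

# Joining that capture to the module's directory reproduces the broken
# path from the traceback:
module_file = os.path.join("transformers/models/phi3", captured[0] + ".py")
print(module_file)  # transformers/models/phi3/..generation.py
```

If this reading is right, the failure comes from the file scan inside `custom_object_save` rather than from the yaml paths themselves.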

Expected behavior

No response

Others

I used the same yaml config to train LLaMA and Qwen models but received no error. I can also save the model with the following script, which presumably calls the same save_pretrained method.

from transformers import AutoModel, AutoTokenizer

# Path to your model
model_path = "../../models/Phi-3-mini-4k-instruct"  # Update with your path
output_path = "../../models/wildfeedback-december/phi-wildfeedback-gpt4o-sft"  # Directory to save the model

try:
    # Load the model and tokenizer
    model = AutoModel.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    print("Model and tokenizer loaded successfully.")

    # Save the model and tokenizer
    model.save_pretrained(output_path)
    tokenizer.save_pretrained(output_path)

    print(f"Model and tokenizer saved successfully to {output_path}.")
except Exception as e:
    print(f"An error occurred: {e}")

Could anyone please help? Thank you!

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 19, 2024

maksimstw commented Dec 19, 2024

A similar issue was reported in #6210, but my model path seems correct.

hiyouga (Owner) commented Dec 20, 2024

use absolute path

@hiyouga hiyouga closed this as completed Dec 20, 2024
@hiyouga hiyouga added duplicate This issue or pull request already exists solved This problem has been already solved and removed pending This problem is yet to be addressed labels Dec 20, 2024
maksimstw (Author) commented

I tried using absolute paths for both model_name_or_path and output_dir, but it did not work. I can use the same script to train the LLaMA and Qwen models (also with relative paths) without any problems. The issue seems to be that Phi-3 uses a custom Python file for its model configuration (e.g., configuration_phi3.py), which triggers the get_relative_imports function in the Transformers library, resulting in this error.

Do you have any other suggestions for this issue? Thanks!
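A quick way to check whether a local checkpoint will be routed through `custom_object_save` at save time is to look for an `auto_map` entry in its config.json, which is how Transformers marks checkpoints that ship custom code files. A small diagnostic sketch (the helper name and the synthetic config contents below are hypothetical):

```python
import json
import tempfile
from pathlib import Path

def uses_custom_code(model_dir: str) -> bool:
    """True if config.json declares an auto_map (custom code files),
    which routes save_pretrained through custom_object_save and its
    relative-import scanning."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return "auto_map" in config

# Demo against a synthetic checkpoint directory (hypothetical contents):
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(json.dumps({
        "model_type": "phi3",
        "auto_map": {"AutoConfig": "configuration_phi3.Phi3Config"},
    }))
    print(uses_custom_code(d))  # True
```

A checkpoint without an `auto_map` entry (such as a stock Llama or Qwen config) would return False here, which matches the observation that those models save without error.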
