Reminder
- I have read the above rules and searched the existing issues.
System Info
- llamafactory version: 0.9.4.dev0
- Platform: Linux-6.6.87.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.12.9
- PyTorch version: 2.8.0+cu128 (GPU)
- Transformers version: 4.52.4
- Datasets version: 3.4.1
- Accelerate version: 1.7.0
- PEFT version: 0.15.2
- TRL version: 0.9.6
- GPU type: NVIDIA GeForce RTX 5060 Ti
- GPU number: 1
- GPU memory: 15.93GB
- DeepSpeed version: 0.17.2
- Bitsandbytes version: 0.46.0
- Git commit: 4ba7de0
- Default data directory: detected
Reproduction
[INFO|processing_utils.py:928] 2025-09-03 15:36:34,353 >> loading configuration file InternVL3_5-4B/processor_config.json
Traceback (most recent call last):
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/model/loader.py", line 102, in load_tokenizer
processor = AutoProcessor.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/llama-factory/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 376, in from_pretrained
return processor_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/llama-factory/lib/python3.12/site-packages/transformers/processing_utils.py", line 1187, in from_pretrained
return cls.from_args_and_dict(args, processor_dict, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/llama-factory/lib/python3.12/site-packages/transformers/processing_utils.py", line 982, in from_args_and_dict
processor = cls(*args, **processor_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/llama-factory/lib/python3.12/site-packages/transformers/models/internvl/processing_internvl.py", line 95, in init
self.start_image_token = tokenizer.start_image_token
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/envs/llama-factory/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 1111, in getattr
raise AttributeError(f"{self.class.name} has no attribute {key}")
AttributeError: Qwen2TokenizerFast has no attribute start_image_token
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/anaconda3/envs/llama-factory/bin/llamafactory-cli", line 8, in
sys.exit(main())
^^^^^^
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/cli.py", line 151, in main
COMMAND_MAP[command]()
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 110, in run_exp
_training_function(config={"args": args, "callbacks": callbacks})
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/train/tuner.py", line 72, in _training_function
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/train/sft/workflow.py", line 48, in run_sft
tokenizer_module = load_tokenizer(model_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zysoft/Leo/LLaMA-Factory-main/src/llamafactory/model/loader.py", line 114, in load_tokenizer
raise OSError("Failed to load processor.") from e
OSError: Failed to load processor.
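For reference, the failing call chain can be reproduced outside of LLaMA-Factory with a minimal sketch (assuming the local model directory InternVL3_5-4B used in the config below):

# Minimal repro sketch: AutoProcessor resolves to InternVLProcessor, whose __init__
# reads tokenizer.start_image_token, which Qwen2TokenizerFast does not define.
from transformers import AutoProcessor, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("InternVL3_5-4B", trust_remote_code=True)
print(type(tokenizer).__name__)                  # Qwen2TokenizerFast
print(hasattr(tokenizer, "start_image_token"))   # False

# Same call as load_tokenizer() in the traceback above; raises the AttributeError:
processor = AutoProcessor.from_pretrained("InternVL3_5-4B", trust_remote_code=True)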
Others
- I have checked this: the model was downloaded from Hugging Face (OpenGVLab/InternVL3_5-4B), not from ModelScope, and I cross-checked the template setting against the README, which specifies intern_vl.
- The config file is as follows:
model_name_or_path: InternVL3_5-4B
image_max_pixels: 1053696
image_min_pixels: 200704
# min_pixels = 256 * 28 * 28   # 200,704
# max_pixels = 1280 * 28 * 28  # 1,003,520
# 2116800
trust_remote_code: true
flash_attn: fa2
enable_liger_kernel: true
use_unsloth_gc: false
use_unsloth: false
seed: 64
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 64
lora_target: all
freeze_vision_tower: false
### dataset
dataset: 0902_planner_dataset_1,0902_planner_dataset_2
template: qwen2_vl
template: intern_vl
cutoff_len: 12288
max_samples: 1000000
overwrite_cache: true
preprocessing_num_workers: 4
dataloader_num_workers: 4
media_dir: data/0902_planner_dataset
### output
output_dir: saves/0902_planner_dataset_1
logging_steps: 1
save_steps: 500
save_total_limit: 0
save_strategy: steps
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: swanlab # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 5.0e-5
num_train_epochs: 6.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
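For what it's worth, a quick PyYAML sketch (the filename internvl3_5_lora_sft.yaml is hypothetical) shows which values this config actually resolves to, e.g. that the duplicated template key collapses to the last occurrence, intern_vl:

import yaml

# Load the config above; with PyYAML the later of two duplicate keys wins.
with open("internvl3_5_lora_sft.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["template"])            # intern_vl
print(cfg["model_name_or_path"])  # InternVL3_5-4B
print(cfg["finetuning_type"], cfg["lora_rank"])  # lora 64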