
During the execution of XPO, a 'tokenizer' KeyError suddenly occurred in callbacks.py #2264

Open · 2 of 4 tasks
ArcherShirou opened this issue Oct 23, 2024 · 0 comments
ArcherShirou commented Oct 23, 2024

System Info

  • Platform: Linux-5.4.0-42-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.5.0
  • CUDA device(s): 8× NVIDIA A800-SXM4-80GB
  • Transformers version: 4.46.0.dev0
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • Datasets version: 3.0.1
  • HF Hub version: 0.25.2
  • TRL version: 0.12.0.dev0+31b7820
  • bitsandbytes version: 0.44.1
  • DeepSpeed version: 0.15.2
  • Diffusers version: not installed
  • Liger-Kernel version: 0.3.1
  • LLM-Blender version: not installed
  • OpenAI version: not installed
  • PEFT version: 0.13.2

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder
  • [x] My own task or dataset (give details below)

Reproduction

I encountered a puzzling issue while running the XPO program: the first 500 steps ran smoothly, then training suddenly failed with the error below:

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 21%|█████████████████▎                                                                | 499/2361 [7:31:45<28:06:19, 54.34s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 21%|█████████████████▎                                                                | 500/2361 [7:32:39<28:04:17, 54.30s/it]Traceback (most recent call last):
  File "/llm-align/trl/xpo.py", line 118, in <module>
    trainer.train()
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 2112, in train
    return inner_training_loop(
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 2533, in _inner_training_loop
    self.control = self.callback_handler.on_step_end(args, self.state, self.control)
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 496, in on_step_end
    return self.call_event("on_step_end", args, state, control)
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 518, in call_event
    result = getattr(callback, event)(
  File "/llm-align/trl/trl/trainer/callbacks.py", line 404, in on_step_end
    tokenizer = kwargs["tokenizer"]
KeyError: 'tokenizer'
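
For reference, the failing line reads the tokenizer straight out of the callback kwargs. Below is a minimal defensive lookup, assuming that transformers 4.46's rename of the Trainer's tokenizer argument to processing_class also changed the kwargs key passed to callbacks; the helper name is mine, a sketch rather than TRL code:

def get_tokenizer_from_callback_kwargs(kwargs):
    """Fetch the tokenizer whichever key the running transformers version uses.

    Older transformers passed it to callbacks as 'tokenizer'; transformers
    4.46 renamed the Trainer argument to 'processing_class' (assumption:
    the callback kwargs key changed along with it).
    """
    tokenizer = kwargs.get('tokenizer') or kwargs.get('processing_class')
    if tokenizer is None:
        raise KeyError("callback kwargs carry neither 'tokenizer' nor 'processing_class'")
    return tokenizer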

Prior to this, the LogCompletionsCallback was running normally and produced records like the following:

{'loss': 0.6948, 'grad_norm': 0.6043311953544617, 'learning_rate': 4.826991329231957e-06, 'loss/dpo': 0.6947265625, 'loss/xpo': -0.000594329833984375, 'objective/kl': -0.00389404296875, 'objective/entropy': 56.7625, 'objective/model_scores': -3.3757302939891813, 'objective/ref_scores': -3.0598522454500197, 'objective/scores_margin': -0.3158780336380005, 'rewards/chosen': -0.00250091552734375, 'rewards/rejected': 0.000923919677734375, 'rewards/accuracies': 0.36875, 'rewards/margins': -0.0034198760986328125, 'logps/chosen': -109.475, 'logps/rejected': -117.65, 'val/model_contain_eos_token': 0.0, 'val/ref_contain_eos_token': 0.0, 'alpha': 1e-05, 'beta': 0.10000000000000002, 'epoch': 0.21}

I use the [trl-lib/ultrafeedback-prompt](https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt) prompt-only dataset, formatted like this:

[
    {
        "prompt": "create a table with 5 meals per day for 2 days, this is prepared for a 20 year old female. \nit sould be vegan, i should not contain nuts.\nshow a table with the meal, description, calorie count \nshow it in this style:\nDay n\nMeal n: meal name\n\nn ingredient\nn ingredient\nn ingredient\nn calories"
    },
    {
        "prompt": "In this task you will be given a list of integers. You should find the maximum absolute difference between 2 integers in the list. The absolute difference is the absolute value of one integer subtracted by another. The output should be a single integer which is the largest possible absolute distance.\nQ: [31, 28, -27]\nA:"
    },
...
]
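
As a sanity check, the local JSON copies load cleanly and match the prompt-only format; a minimal sketch, using the same file paths as in the modified xpo.py below:

from datasets import load_dataset

# Paths match the local copies used in the modified xpo.py below.
dataset = load_dataset(
    'json',
    data_files={'train': '/llm-align/ultrafeedback-prompt-train.json',
                'test': '/llm-align/ultrafeedback-prompt-test.json'},
)
assert set(dataset['train'].column_names) == {'prompt'}  # prompt-only format
print(dataset['train'][0]['prompt'][:80])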

Could you please advise on how to resolve this bug? Thanks

More Info

My launch script is:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
deepspeed --num_gpus 8 --master_port=29501 xpo.py \
    --deepspeed ds_config.json \
    --do_train \
    --model_name_or_path  /llm-align/qwen2.5-14B-update2 \
    --reward_model_path /llm-align/qwen2-0.5B-reward \
    --dataset_name ultrafeedback \
    --learning_rate 5.0e-6 \
    --beta 0.1 \
    --torch_dtype bfloat16 \
    --output_dir /llm-align/qwen2.5-14B-xpo-lora \
    --num_train_epochs 1 \
    --max_new_tokens 64 \
    --warmup_ratio 0.1 \
    --missing_eos_penalty 1.0 \
    --overwrite_output_dir \
    --logging_steps 10 \
    --optim paged_adamw_32bit \
    --save_steps 100 \
    --save_total_limit 5 \
    --lr_scheduler_type 'cosine' \
    --load_in_4bit \
    --use_bnb_nested_quant \
    --use_peft  \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --attn_implementation flash_attention_2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --ddp_timeout 180000000

and I revised the official xpo.py as follows:

dataset = load_dataset(
    'json',
    data_files={
        'train': '/llm-align/ultrafeedback-prompt-train.json',
        'test': '/llm-align/ultrafeedback-prompt-test.json',
    },
)  # use a local copy of the dataset

trainer = XPOTrainer(
    model=model,
    ref_model=ref_model,
    reward_model=reward_model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,
    peft_config=get_peft_config(model_config),  # added to enable LoRA
)

trainer.train()

model.save_pretrained(training_args.output_dir)  # save the LoRA adapter
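
Until this is fixed upstream, a possible stop-gap is to monkeypatch the callback before building the trainer, so the object is re-exposed under the legacy key. This is only a sketch, assuming the failing callback is LogCompletionsCallback (as the logs above suggest) and that newer transformers passes the tokenizer as processing_class:

import trl.trainer.callbacks as trl_callbacks

_original_on_step_end = trl_callbacks.LogCompletionsCallback.on_step_end

def _patched_on_step_end(self, args, state, control, **kwargs):
    # Assumption: transformers >= 4.46 passes 'processing_class' instead of 'tokenizer'.
    if 'tokenizer' not in kwargs and 'processing_class' in kwargs:
        kwargs['tokenizer'] = kwargs['processing_class']
    return _original_on_step_end(self, args, state, control, **kwargs)

trl_callbacks.LogCompletionsCallback.on_step_end = _patched_on_step_end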

Expected behavior

Training should run to completion without KeyError: 'tokenizer' being raised in callbacks.py.
