
During the execution of XPO, a 'tokenizer' KeyError suddenly occurred in callbacks.py #2264

Open · 2 of 4 tasks
ArcherShirou opened this issue Oct 23, 2024 · 0 comments
ArcherShirou commented Oct 23, 2024

System Info

  • Platform: Linux-5.4.0-42-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.5.0
  • CUDA device(s): 8× NVIDIA A800-SXM4-80GB
  • Transformers version: 4.46.0.dev0
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • Datasets version: 3.0.1
  • HF Hub version: 0.25.2
  • TRL version: 0.12.0.dev0+31b7820
  • bitsandbytes version: 0.44.1
  • DeepSpeed version: 0.15.2
  • Diffusers version: not installed
  • Liger-Kernel version: 0.3.1
  • LLM-Blender version: not installed
  • OpenAI version: not installed
  • PEFT version: 0.13.2

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder
  • [x] My own task or dataset (give details below)

Reproduction

I encountered a puzzling issue while running the XPO program: the first 500 steps ran smoothly, then training suddenly failed with the error below:

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 21%|█████████████████▎                                                                | 499/2361 [7:31:45<28:06:19, 54.34s/it]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
 21%|█████████████████▎                                                                | 500/2361 [7:32:39<28:04:17, 54.30s/it]Traceback (most recent call last):
  File "/llm-align/trl/xpo.py", line 118, in <module>
    trainer.train()
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 2112, in train
    return inner_training_loop(
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 2533, in _inner_training_loop
    self.control = self.callback_handler.on_step_end(args, self.state, self.control)
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 496, in on_step_end
    return self.call_event("on_step_end", args, state, control)
  File "/llm-align/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 518, in call_event
    result = getattr(callback, event)(
  File "/llm-align/trl/trl/trainer/callbacks.py", line 404, in on_step_end
    tokenizer = kwargs["tokenizer"]
KeyError: 'tokenizer'
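
For reference, the failing line reads the tokenizer straight out of the callback kwargs. Below is a minimal defensive lookup, assuming that transformers 4.46's rename of the Trainer's tokenizer argument to processing_class also changed the kwargs key passed to callbacks; the helper name is mine, a sketch rather than TRL code:

def get_tokenizer_from_callback_kwargs(kwargs):
    """Fetch the tokenizer whichever key the running transformers version uses.

    Older transformers passed it to callbacks as 'tokenizer'; transformers
    4.46 renamed the Trainer argument to 'processing_class' (assumption:
    the callback kwargs key changed along with it).
    """
    tokenizer = kwargs.get('tokenizer') or kwargs.get('processing_class')
    if tokenizer is None:
        raise KeyError("callback kwargs carry neither 'tokenizer' nor 'processing_class'")
    return tokenizer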

Prior to this, the LogCompletionsCallback was running normally and produced records like the following:

{'loss': 0.6948, 'grad_norm': 0.6043311953544617, 'learning_rate': 4.826991329231957e-06, 'loss/dpo': 0.6947265625, 'loss/xpo': -0.000594329833984375, 'objective/kl': -0.00389404296875, 'objective/entropy': 56.7625, 'objective/model_scores': -3.3757302939891813, 'objective/ref_scores': -3.0598522454500197, 'objective/scores_margin': -0.3158780336380005, 'rewards/chosen': -0.00250091552734375, 'rewards/rejected': 0.000923919677734375, 'rewards/accuracies': 0.36875, 'rewards/margins': -0.0034198760986328125, 'logps/chosen': -109.475, 'logps/rejected': -117.65, 'val/model_contain_eos_token': 0.0, 'val/ref_contain_eos_token': 0.0, 'alpha': 1e-05, 'beta': 0.10000000000000002, 'epoch': 0.21}

I use the [trl-lib/ultrafeedback-prompt](https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt) prompt-only dataset, formatted like this:

[
    {
        "prompt": "create a table with 5 meals per day for 2 days, this is prepared for a 20 year old female. \nit sould be vegan, i should not contain nuts.\nshow a table with the meal, description, calorie count \nshow it in this style:\nDay n\nMeal n: meal name\n\nn ingredient\nn ingredient\nn ingredient\nn calories"
    },
    {
        "prompt": "In this task you will be given a list of integers. You should find the maximum absolute difference between 2 integers in the list. The absolute difference is the absolute value of one integer subtracted by another. The output should be a single integer which is the largest possible absolute distance.\nQ: [31, 28, -27]\nA:"
    },
...
]
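
As a sanity check, the local JSON copies load cleanly and match the prompt-only format; a minimal sketch, using the same file paths as in the modified xpo.py below:

from datasets import load_dataset

# Paths match the local copies used in the modified xpo.py below.
dataset = load_dataset(
    'json',
    data_files={'train': '/llm-align/ultrafeedback-prompt-train.json',
                'test': '/llm-align/ultrafeedback-prompt-test.json'},
)
assert set(dataset['train'].column_names) == {'prompt'}  # prompt-only format
print(dataset['train'][0]['prompt'][:80])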

Could you please advise on how to resolve this bug? Thanks

More Info

My launch script is:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
deepspeed --num_gpus 8 --master_port=29501 xpo.py \
    --deepspeed ds_config.json \
    --do_train \
    --model_name_or_path  /llm-align/qwen2.5-14B-update2 \
    --reward_model_path /llm-align/qwen2-0.5B-reward \
    --dataset_name ultrafeedback \
    --learning_rate 5.0e-6 \
    --beta 0.1 \
    --torch_dtype bfloat16 \
    --output_dir /llm-align/qwen2.5-14B-xpo-lora \
    --num_train_epochs 1 \
    --max_new_tokens 64 \
    --warmup_ratio 0.1 \
    --missing_eos_penalty 1.0 \
    --overwrite_output_dir \
    --logging_steps 10 \
    --optim paged_adamw_32bit \
    --save_steps 100 \
    --save_total_limit 5 \
    --lr_scheduler_type 'cosine' \
    --load_in_4bit \
    --use_bnb_nested_quant \
    --use_peft  \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --attn_implementation flash_attention_2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 2 \
    --ddp_timeout 180000000

and I revised the official xpo.py as follows:

dataset = load_dataset(
    'json',
    data_files={
        'train': '/llm-align/ultrafeedback-prompt-train.json',
        'test': '/llm-align/ultrafeedback-prompt-test.json',
    },
)  # use a local copy of the dataset

trainer = XPOTrainer(
    model=model,
    ref_model=ref_model,
    reward_model=reward_model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,
    peft_config=get_peft_config(model_config),  # added to enable LoRA
)

trainer.train()

model.save_pretrained(training_args.output_dir)  # save the LoRA adapter
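
Until this is fixed upstream, a possible stop-gap is to monkeypatch the callback before building the trainer, so the object is re-exposed under the legacy key. This is only a sketch, assuming the failing callback is LogCompletionsCallback (as the logs above suggest) and that newer transformers passes the tokenizer as processing_class:

import trl.trainer.callbacks as trl_callbacks

_original_on_step_end = trl_callbacks.LogCompletionsCallback.on_step_end

def _patched_on_step_end(self, args, state, control, **kwargs):
    # Assumption: transformers >= 4.46 passes 'processing_class' instead of 'tokenizer'.
    if 'tokenizer' not in kwargs and 'processing_class' in kwargs:
        kwargs['tokenizer'] = kwargs['processing_class']
    return _original_on_step_end(self, args, state, control, **kwargs)

trl_callbacks.LogCompletionsCallback.on_step_end = _patched_on_step_end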

Expected behavior

Training should run to completion without KeyError: 'tokenizer' being raised in callbacks.py.
