
'Tokenizer class TokenizersBackend does not exist' when training RLVR with Llama-3.2 #5078

Description

@cvoelcker

System Info

----------Python Info----------
Version : 3.12.12
Compiler : Clang 21.1.4
Build : ('main', 'Oct 31 2025 23:00:46')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
No corresponding pip install for current python.
vllm : 0.12.0
sglang : not found.
ray : 2.53.0
torch : 2.9.0+cu129
----------verl Info-----------
Version : 0.8.0.dev
Directory : verl/verl
Commit Hash : 5ab9290
----------Platform Info----------
Platform : Linux-5.14.0-362.24.1.el9_3.aarch64+64k-aarch64-with-glibc2.34
system : Linux
node :
release : 5.14.0-362.24.1.el9_3.aarch64+64k
version :
----------Environment----------
OMP_NUM_THREADS="1"
CC="gcc"
CXX="nvc++"
CUDA Runtime : 12.9
CUDA Compiler : Cuda compilation tools, release 13.0, V13.0.48
----------System Info----------
CPU Memory : 212.75 GB
GPU Count : 1
GPU 1 Type : NVIDIA GH200 120GB
GPU 1 Memory : 95.58 GB

Additional library version information:

transformers == 4.57.6
torch == 2.9.0+cu129
flash_attn==2.8.3
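
For completeness, the versions above can be re-checked from inside the failing environment with a quick snippet (a sketch; it only reads the standard `__version__` attributes):

```python
# Sanity check of the library versions listed above.
import torch
import transformers
import vllm

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("vllm:", vllm.__version__)
```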

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I installed the verl codebase using a slightly modified variant of script/install.sh; the changed package versions are listed above.

I then tried to run PPO training on a Llama-3.2 model. Call below:

```bash
policy_lr="1e-6"
critic_lr="1.0e-5"
use_kl_loss=True
kl_loss_coeff="0.01"
use_kl_in_reward=False
kl_reward_coeff="0.01"
norm_adv=True
low_clip="1e10"
high_clip="1e10"
seed="1785319203"
combo_id=1
run_id=1
warmup_steps=50
rollout=8

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=64 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    critic.optim.lr=$critic_lr \
    trainer.critic_warmup=$warmup_steps \
    algorithm.kl_ctrl.kl_coef=$kl_reward_coeff \
    data.seed=$seed \
    critic.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.n=$rollout \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.enforce_eager=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.model.path=${model_dir} \
    critic.model.enable_gradient_checkpointing=True \
    critic.model.use_remove_padding=True \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='llama3-1b-instruct' \
    trainer.experiment_name="ppo_exp_c${combo_id}_r${run_id}_s${seed}" \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=100 \
    trainer.total_epochs=5 \
    2>&1 | tee -a "$log_file"
```

The train and test files are the standard math dataset, and the verl code itself is unmodified.

Expected behavior

The training should run. Instead, I receive the following trace:

File "<string>", line 30, in __init__ File "verl/verl/workers/config/critic.py", line 204, in __post_init__ super().__post_init__() File "verl/verl/workers/config/critic.py", line 96, in __post_init__ self.model_config = HFModelConfig( ^^^^^^^^^^^^^^ File "<string>", line 35, in __init__ File "verl/verl/workers/config/model.py", line 156, in __post_init__ self.tokenizer = hf_tokenizer(self.local_tokenizer_path, trust_remote_code=self.trust_remote_code) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "verl/verl/utils/tokenizer.py", line 61, in hf_tokenizer tokenizer = AutoTokenizer.from_pretrained(name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 1153, in from_pretrained raise ValueError( ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.


Labels

bug (Something isn't working)
