Description
System Info
```
----------Python Info----------
Version : 3.12.12
Compiler : Clang 21.1.4
Build : ('main', 'Oct 31 2025 23:00:46')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
No corresponding pip install for current python.
vllm : 0.12.0
sglang : not found.
ray : 2.53.0
torch : 2.9.0+cu129
----------verl Info-----------
Version : 0.8.0.dev
Directory : verl/verl
Commit Hash : 5ab9290
----------Platform Info----------
Platform : Linux-5.14.0-362.24.1.el9_3.aarch64+64k-aarch64-with-glibc2.34
system : Linux
node :
release : 5.14.0-362.24.1.el9_3.aarch64+64k
version :
----------Environment----------
OMP_NUM_THREADS="1"
CC="gcc"
CXX="nvc++"
CUDA Runtime : 12.9
CUDA Compiler : Cuda compilation tools, release 13.0, V13.0.48
----------System Info----------
CPU Memory : 212.75 GB
GPU Count : 1
GPU 1 Type : NVIDIA GH200 120GB
GPU 1 Memory : 95.58 GB
Additional library version information:
transformers == 4.57.6
torch == 2.9.0+cu129
flash_attn == 2.8.3
```
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I installed the verl codebase using a slightly modified variant of script/install.sh; the changed package versions are listed in the system info above.
I then tried to run PPO training on a Llama-3.2 model with the call below:
```bash
policy_lr="1e-6"
critic_lr="1.0e-5"
use_kl_loss=True
kl_loss_coeff="0.01"
use_kl_in_reward=False
kl_reward_coeff="0.01"
norm_adv=True
low_clip="1e10"
high_clip="1e10"
seed="1785319203"
combo_id=1
run_id=1
warmup_steps=50
rollout=8

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=64 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    critic.optim.lr=$critic_lr \
    trainer.critic_warmup=$warmup_steps \
    algorithm.kl_ctrl.kl_coef=$kl_reward_coeff \
    data.seed=$seed \
    critic.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.n=$rollout \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.enforce_eager=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.model.path=${model_dir} \
    critic.model.enable_gradient_checkpointing=True \
    critic.model.use_remove_padding=True \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='llama3-1b-instruct' \
    trainer.experiment_name="ppo_exp_c${combo_id}_r${run_id}_s${seed}" \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=100 \
    trainer.total_epochs=5 \
    2>&1 | tee -a "$log_file"
```
The train and test files are the standard math dataset, and the verl code itself is unmodified.
Expected behavior
Training should start. Instead, I receive the following traceback:
File "<string>", line 30, in __init__ File "verl/verl/workers/config/critic.py", line 204, in __post_init__ super().__post_init__() File "verl/verl/workers/config/critic.py", line 96, in __post_init__ self.model_config = HFModelConfig( ^^^^^^^^^^^^^^ File "<string>", line 35, in __init__ File "verl/verl/workers/config/model.py", line 156, in __post_init__ self.tokenizer = hf_tokenizer(self.local_tokenizer_path, trust_remote_code=self.trust_remote_code) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "verl/verl/utils/tokenizer.py", line 61, in hf_tokenizer tokenizer = AutoTokenizer.from_pretrained(name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 1153, in from_pretrained raise ValueError( ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.