Description
System Info
```
----------Python Info----------
Version : 3.12.12
Compiler : Clang 21.1.4
Build : ('main', 'Oct 31 2025 23:00:46')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
No corresponding pip install for current python.
vllm : 0.12.0
sglang : not found.
ray : 2.53.0
torch : 2.9.0+cu129
----------verl Info-----------
Version : 0.8.0.dev
Directory : verl/verl
Commit Hash : 5ab9290
----------Platform Info----------
Platform : Linux-5.14.0-362.24.1.el9_3.aarch64+64k-aarch64-with-glibc2.34
system : Linux
node :
release : 5.14.0-362.24.1.el9_3.aarch64+64k
version :
----------Environment----------
OMP_NUM_THREADS="1"
CC="gcc"
CXX="nvc++"
CUDA Runtime : 12.9
CUDA Compiler : Cuda compilation tools, release 13.0, V13.0.48
----------System Info----------
CPU Memory : 212.75 GB
GPU Count : 1
GPU 1 Type : NVIDIA GH200 120GB
GPU 1 Memory : 95.58 GB
Additional library version information:
transformers == 4.57.6
torch == 2.9.0+cu129
flash_attn == 2.8.3
```
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I installed the verl codebase using a slightly modified variant of script/install.sh; the changed package versions are listed in the system info above.
I then tried to run PPO training on a Llama-3.2 model with the call below:
```bash
policy_lr="1e-6"
critic_lr="1.0e-5"
use_kl_loss=True
kl_loss_coeff="0.01"
use_kl_in_reward=False
kl_reward_coeff="0.01"
norm_adv=True
low_clip="1e10"
high_clip="1e10"
seed="1785319203"
combo_id=1
run_id=1
warmup_steps=50
rollout=8

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=gae \
    data.train_files="$train_files" \
    data.val_files="$test_files" \
    data.train_batch_size=64 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    critic.optim.lr=$critic_lr \
    trainer.critic_warmup=$warmup_steps \
    algorithm.kl_ctrl.kl_coef=$kl_reward_coeff \
    data.seed=$seed \
    critic.ppo_micro_batch_size_per_gpu=16 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.n=$rollout \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.enforce_eager=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.model.path=${model_dir} \
    critic.model.enable_gradient_checkpointing=True \
    critic.model.use_remove_padding=True \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='llama3-1b-instruct' \
    trainer.experiment_name="ppo_exp_c${combo_id}_r${run_id}_s${seed}" \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=100 \
    trainer.total_epochs=5 \
    2>&1 | tee -a "$log_file"
```
The train and test files are the standard math dataset, and the verl code itself is unmodified.
Expected behavior
Training should start. Instead, I receive the following traceback:
File "<string>", line 30, in __init__ File "verl/verl/workers/config/critic.py", line 204, in __post_init__ super().__post_init__() File "verl/verl/workers/config/critic.py", line 96, in __post_init__ self.model_config = HFModelConfig( ^^^^^^^^^^^^^^ File "<string>", line 35, in __init__ File "verl/verl/workers/config/model.py", line 156, in __post_init__ self.tokenizer = hf_tokenizer(self.local_tokenizer_path, trust_remote_code=self.trust_remote_code) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "verl/verl/utils/tokenizer.py", line 61, in hf_tokenizer tokenizer = AutoTokenizer.from_pretrained(name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File ".venv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 1153, in from_pretrained raise ValueError( ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.