
Onlinedpo Support rm with different vocab size #368

Draft · wants to merge 4 commits into main
Conversation

vwxyzjn (Collaborator) commented Sep 25, 2024
To test it out, run:

python mason.py \
    --cluster ai2/jupiter-cirrascale-2 --image costah/online_dpo_rm2 --pure_docker_mode \
    --workspace ai2/tulu-3-dev \
    --priority high \
    --budget ai2/allennlp \
    --preemptible \
    --gpus 8 -- accelerate launch --num_processes 7 --config_file configs/ds_configs/deepspeed_zero3.yaml \
    open_instruct/online_dpo_vllm_thread.py \
    --exp_name "online_dpo_vllm_thread_different_rm" \
    --dataset_mixer '{"HuggingFaceH4/no_robots": 9500, "AI-MO/NuminaMath-TIR": 72441}' \
    --dataset_train_splits train \
    --dataset_eval_mixer '{"HuggingFaceH4/no_robots": 1.0}' \
    --dataset_eval_splits test \
    --max_token_length 2048 \
    --max_prompt_token_lenth 2048 \
    --learning_rate 8e-7 \
    --output_dir /output/ \
    --chat_template tulu \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --local_rollout_forward_batch_size 1 \
    --vllm_device cuda:7 \
    --num_epochs 1 \
    --num_mini_batches 1 \
    --total_episodes 300000 \
    --model_name_or_path allenai/open_instruct_dev \
    --model_revision finetune__meta-llama_Meta-Llama-3.1-8B__42__1726352218 \
    --reward_model_path Skywork/Skywork-Reward-Llama-3.1-8B \
    --non_stop_penalty \
    --stop_token eos \
    --penalty_reward_value -10.0 \
    --beta 0.03 \
    --num_evals 3 \
    --seed 3 \
    --response_length 1536 \
    --gradient_checkpointing \
    --with_tracking \
    --push_to_hub

It seems to work properly; see the W&B report:

https://wandb.ai/ai2-llm/open_instruct_internal/reports/online-DPO-with-different-RM-tokenizer--Vmlldzo5NDk2OTE4
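For context, supporting a reward model with a different vocab size/tokenizer generally means the policy's generated token ids cannot be fed to the RM directly: responses have to be decoded with the policy tokenizer and re-encoded with the RM tokenizer before scoring. Below is a minimal sketch of that idea, not the exact code in this PR; the model names are taken from the command above and `score_responses` is a hypothetical helper.

```python
# Sketch only: scoring policy rollouts with a reward model whose
# tokenizer/vocab differs from the policy's (e.g. Skywork RM vs. Llama policy).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

policy_tokenizer = AutoTokenizer.from_pretrained("allenai/open_instruct_dev")
rm_tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-Reward-Llama-3.1-8B")
if rm_tokenizer.pad_token is None:
    rm_tokenizer.pad_token = rm_tokenizer.eos_token

reward_model = AutoModelForSequenceClassification.from_pretrained(
    "Skywork/Skywork-Reward-Llama-3.1-8B", torch_dtype=torch.bfloat16
)
reward_model.eval()


def score_responses(prompt_response_ids: torch.Tensor) -> torch.Tensor:
    """Decode policy-tokenized sequences and re-tokenize them for the RM."""
    # 1) Back to text with the *policy* tokenizer that produced the ids.
    #    (In practice the RM's chat template would be re-applied here.)
    texts = policy_tokenizer.batch_decode(prompt_response_ids, skip_special_tokens=True)
    # 2) Re-encode with the *reward model* tokenizer, which has a different vocab.
    rm_inputs = rm_tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # 3) Score with the RM; one scalar reward per sequence.
    with torch.no_grad():
        rewards = reward_model(**rm_inputs).logits.squeeze(-1)
    return rewards
```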
