
Value Augmented Sampling

This repository contains the official implementation of our paper, Value Augmented Sampling for Language Model Alignment and Personalization.

Abstract

Aligning Large Language Models (LLMs) to different human preferences, teaching them new skills, and unlearning harmful behavior is an important problem. Search-based methods, such as Best-of-N or Monte-Carlo Tree Search, are performant but impractical for LLM adaptation due to their high inference cost. On the other hand, using Reinforcement Learning (RL) for adaptation is computationally efficient but performs worse due to the optimization challenges of co-training the value function and the policy. We present a new framework for reward optimization, Value Augmented Sampling (VAS), that can maximize different reward functions using data sampled only from the initial, frozen LLM. VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function, making the optimization stable. It outperforms established baselines such as PPO and DPO on standard benchmarks, and achieves results comparable to Best-of-128 at lower inference cost. Unlike existing RL methods that require changing the weights of the LLM, VAS does not need access to the weights of the pre-trained LLM, so it can adapt even LLMs that are available only as APIs (e.g., ChatGPT). In addition, our algorithm unlocks the new capability of composing several rewards and controlling the extent of each at deployment time, paving the road ahead for aligned, personalized LLMs.
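At a high level, VAS keeps the base LLM frozen and re-weights its next-token distribution with a separately trained value function at decoding time. The sketch below illustrates this idea for a single decoding step. It is a minimal illustration, not the repository's actual API: the names `policy`, `value_model`, `beta`, and `top_k`, and the interface of the value model, are all assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def vas_decode_step(policy, value_model, input_ids, beta=1.0, top_k=10):
    """Sample one token from pi(a|s) proportional to pi_ref(a|s) * exp(beta * Q(s, a))."""
    # Next-token log-probabilities under the frozen reference policy.
    logits = policy(input_ids).logits[:, -1, :]
    log_probs = F.log_softmax(logits, dim=-1)

    # Restrict to the top-k candidates so the value model is only
    # evaluated a handful of times per step.
    topk_logp, topk_ids = log_probs.topk(top_k, dim=-1)

    # Score each candidate continuation with the small value model.
    # Assumed interface: value_model(ids) returns a (batch,) tensor
    # estimating the future reward of the given prefix.
    values = torch.stack(
        [value_model(torch.cat([input_ids, topk_ids[:, i : i + 1]], dim=-1))
         for i in range(top_k)],
        dim=-1,
    )

    # Re-weight the reference distribution by the value estimates and
    # sample the next token from the augmented distribution.
    augmented = F.softmax(topk_logp + beta * values, dim=-1)
    choice = torch.multinomial(augmented, num_samples=1)
    next_token = topk_ids.gather(-1, choice)
    return torch.cat([input_ids, next_token], dim=-1)
```

Because only the value model's contribution is scaled by `beta`, the strength of the reward signal can be tuned, and multiple value functions can in principle be mixed, at deployment time without retraining.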

Installation

Install our custom version of trl:

git clone git@github.com:idanshen/trl.git
cd trl
python setup.py install

Clone and install the codebase:

git clone git@github.com:idanshen/Value-Augmented-Sampling.git
cd Value-Augmented-Sampling
pip install -e .

How to use

We provide a script for training a TinyLlama-1.1B model as the value estimator of a Llama-2 7B model on Anthropic's HH dataset.

To follow the pipeline described in the paper, we provide supervised fine-tuned versions of these models:

python tinyllama_hh.py --log_with=wandb --ref_model_name hanseungwook/vas-llama-2-7b-hh-sft --model_name hanseungwook/vas-tiny-llama-1.1b-hh-sft
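The two checkpoints passed above are ordinary Hugging Face model repositories. The snippet below is only a hedged illustration of where each model fits, assuming they expose standard causal-LM weights; the training script handles loading itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Frozen reference policy (Llama-2 7B, SFT on Anthropic HH).
ref_model = AutoModelForCausalLM.from_pretrained("hanseungwook/vas-llama-2-7b-hh-sft")

# Backbone for the value estimator (TinyLlama 1.1B, SFT on Anthropic HH).
value_backbone = AutoModelForCausalLM.from_pretrained("hanseungwook/vas-tiny-llama-1.1b-hh-sft")

tokenizer = AutoTokenizer.from_pretrained("hanseungwook/vas-llama-2-7b-hh-sft")
```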

Citation

@inproceedings{han2024value,
  title={Value Augmented Sampling for Language Model Alignment and Personalization},
  author={Seungwook Han and Idan Shenfeld and Akash Srivastava and Yoon Kim and Pulkit Agrawal},
  booktitle={ICLR 2024 Workshop on Reliable and Responsible Foundation Models},
  year={2024},
  url={https://arxiv.org/abs/2405.06639}
}
