🌾 OAT: Online AlignmenT for LLMs
thompson-sampling
alignment
distributed-training
dueling-bandits
dpo
distributed-rl
llm
rlhf
llm-aligment
online-alignment
llm-exploration
Updated Dec 1, 2024 - Python