I am trying to figure out how to run DPO on Llama 3.1 405B that I have already tuned with LoRA using `save_adapter_weights_only: True`. Tuning the 405B with LoRA works great: I save only the adapter, which I then use in vLLM on top of the original Meta weights (without any weight-adapter merging whatsoever).
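Roughly, the adapter-on-top-of-base serving looks like this (a minimal sketch only; the model path, adapter name/path, and engine arguments like tensor parallelism and max LoRA rank are placeholders that will differ for the 405B setup):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base Meta weights only; the LoRA adapter is applied per request, no merge on disk.
# tensor_parallel_size / max_lora_rank are illustrative and depend on the actual setup.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",  # placeholder model path
    enable_lora=True,
    max_lora_rank=16,
    tensor_parallel_size=8,
)

outputs = llm.generate(
    ["Write a haiku about preference alignment."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("sft_adapter", 1, "/path/to/adapter"),  # placeholder adapter dir
)
print(outputs[0].outputs[0].text)
```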
So now I just need a little preference alignment. I tried several options, including continuing from a checkpoint. The problem is that with adapter-only saving I don't have a recipe checkpoint to resume from, and resuming is probably the wrong approach anyway, since DPO would be continuing from a different recipe?
I looked a little into the recipes code, but before I start digging in, I decided to post my question here.
Thanks in advance!
Hmmm, interesting question. This is definitely something we can help you hack together! Off the top of my head, you could merge the adapter weights back into the base Llama 3.1 405B model (you can look at our internal code for how to do this, or use something like mergekit), then start the DPO training from there.
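The merge itself is just folding the low-rank update back into each base weight. A minimal sketch in plain PyTorch (the checkpoint paths, the `lora_a`/`lora_b` key suffixes, and the rank/alpha values are assumptions here and must match how your adapter was actually trained and saved):

```python
import torch

# Sketch of merging a LoRA adapter into base weights: W' = W + (alpha / rank) * B @ A.
# Paths, key names, and rank/alpha below are assumptions, not the exact checkpoint format.
base_sd = torch.load("base_model.pt", map_location="cpu")
adapter_sd = torch.load("adapter_model.pt", map_location="cpu")
rank, alpha = 16, 32  # same values as in the LoRA fine-tuning config

for key in list(adapter_sd):
    if not key.endswith("lora_a.weight"):
        continue
    prefix = key[: -len("lora_a.weight")]
    lora_a = adapter_sd[key]                       # (rank, in_features)
    lora_b = adapter_sd[prefix + "lora_b.weight"]  # (out_features, rank)
    base_key = prefix + "weight"                   # assumed name of the frozen base weight
    base_sd[base_key] += ((alpha / rank) * (lora_b @ lora_a)).to(base_sd[base_key].dtype)

torch.save(base_sd, "merged_model.pt")
```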
Unfortunately, we don't have out-of-the-box support for resuming training from an adapter-only checkpoint, but this is definitely something that we could look into adding.
In the meantime I LoRA-tuned the 405B with full weight saving. It takes a total of 6 hours on an 8xH200 VM, of which training is just 1 hour (to a loss of 0.3); copying from GPU to CPU plus writing to disk is another ~5 hours (and another ~800 GB of disk space). That's a little wasteful compared to just ~1 GB for the adapters, but for now I can live with it.
I am now running DPO with some minor tweaks (CPU offloading) to your original 8B_lora_dpo config.