
DPO after / on top of LoRA tuning #2272

Open
albertbn opened this issue Jan 16, 2025 · 2 comments

@albertbn

I am trying to figure out how to run DPO on Llama 3.1 405B that I have already tuned with LoRA using save_adapter_weights_only: True. Tuning the 405B with LoRA works great: only the adapter is saved, and I then load it in vLLM on top of the original Meta weights (without any weight-adapter merging whatsoever).
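
For context, this is roughly how the adapter gets served on top of the unmerged base weights in vLLM. A minimal sketch, assuming placeholder model/adapter paths and an adapter directory in a layout vLLM accepts; a real 405B deployment would additionally need tensor parallelism across the node:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Model name and adapter path are placeholders.
# A 405B deployment would also set tensor_parallel_size for the node.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",
    enable_lora=True,
    max_lora_rank=16,  # raise this if the adapter was trained with a larger rank
)

outputs = llm.generate(
    ["Write a one-sentence summary of LoRA."],
    SamplingParams(temperature=0.0, max_tokens=128),
    # The integer id (1 here) just needs to be unique per adapter.
    lora_request=LoRARequest("my_lora", 1, "/path/to/adapter_dir"),
)
print(outputs[0].outputs[0].text)
```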

So now I just need a little preference alignment. I have tried several options, including continuing from a checkpoint. The problem is that, having saved adapter weights only, I don't have a recipe checkpoint to resume from; and resuming is probably the wrong approach anyway, since DPO would continue under a different recipe.

I have looked a little at the recipes code, but before I start digging in I decided to post my question here.

Thanks in advance.

@joecummings added the discussion and triaged labels on Jan 16, 2025
@joecummings self-assigned this on Jan 16, 2025
@joecummings (Contributor)

Hmmm, interesting question. This is definitely something we can help you hack together! Off the top of my head, you could merge the adapter weights back into the base Llama 3.1 405B model (you can look at our internal code for how to do this, or use something like mergekit), then start the DPO training from there.
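
For what it's worth, the merge itself is just the standard LoRA update applied in place: W ← W + (alpha / rank) · B A for every LoRA-adapted linear layer. Below is a minimal sketch in plain PyTorch, assuming (hypothetically) that the adapter state dict stores each pair under lora_a.weight / lora_b.weight suffixes next to the corresponding base weight key; check the actual key names in your checkpoint and the rank/alpha values from your LoRA config. I believe the LoRA recipes already do an equivalent merge on the non-adapter-only save path, so their code is worth cross-checking.

```python
import torch


def merge_lora_into_base(base_sd, adapter_sd, rank: int, alpha: float):
    """Return a copy of base_sd with LoRA deltas folded in: W <- W + (alpha/rank) * B @ A.

    The key layout is an assumption: '<module>.lora_a.weight' / '<module>.lora_b.weight'
    in the adapter checkpoint, matching '<module>.weight' in the base checkpoint.
    """
    scale = alpha / rank
    merged = dict(base_sd)
    for key, lora_a in adapter_sd.items():
        if not key.endswith("lora_a.weight"):
            continue
        prefix = key[: -len("lora_a.weight")]
        lora_b = adapter_sd[prefix + "lora_b.weight"]
        base_key = prefix + "weight"
        # B: (out, rank), A: (rank, in) -> delta has the same shape as the base weight.
        delta = scale * (lora_b.to(torch.float32) @ lora_a.to(torch.float32))
        merged[base_key] = (merged[base_key].to(torch.float32) + delta).to(base_sd[base_key].dtype)
    return merged
```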

Unfortunately, we don't have OOTB support for resuming training via an adapter config, but this is definitely something that we could look into adding.

@albertbn (Author) commented Jan 18, 2025

Thanks. In the meantime I LoRA-tuned the 405B with full weight saving. It takes about 6 hours in total on an 8xH200 VM: training itself is just 1 hour (to a loss of ~0.3), while copying from GPU to CPU and writing to disk takes another ~5 hours (and ~800 GB of disk space). That is a little wasteful compared to the ~1 GB for the adapters alone, but for now I can live with it.

I am now running DPO with some minor tweaks (CPU offloading) to your original 8B_lora_dpo config.
