
DPO after / on top of LoRA tuning #2272

Open
albertbn opened this issue Jan 16, 2025 · 2 comments

@albertbn

I am trying to figure out how to run DPO on Llama 3.1 405B that I have already tuned with LoRA using save_adapter_weights_only: True. Tuning the 405B with LoRA works great: only the adapter is saved, and I then load it in vLLM on top of the original Meta weights (without any weight-adapter merging whatsoever).
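
For context, this is roughly how the adapter gets served on top of the unmerged base weights in vLLM. A minimal sketch, assuming placeholder model/adapter paths and an adapter directory in a layout vLLM accepts; a real 405B deployment would additionally need tensor parallelism across the node:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Model name and adapter path are placeholders.
# A 405B deployment would also set tensor_parallel_size for the node.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",
    enable_lora=True,
    max_lora_rank=16,  # raise this if the adapter was trained with a larger rank
)

outputs = llm.generate(
    ["Write a one-sentence summary of LoRA."],
    SamplingParams(temperature=0.0, max_tokens=128),
    # The integer id (1 here) just needs to be unique per adapter.
    lora_request=LoRARequest("my_lora", 1, "/path/to/adapter_dir"),
)
print(outputs[0].outputs[0].text)
```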

So now I just need a little preference alignment. I have tried several options, including continuing from a checkpoint. The problem is that, having saved adapter weights only, I don't have a recipe checkpoint to resume from; and resuming is probably the wrong approach anyway, since DPO would continue under a different recipe.

I have looked a little at the recipes code, but before I start digging in I decided to post my question here.

Thanks in advance.

@joecummings added the discussion and triaged labels on Jan 16, 2025
@joecummings self-assigned this on Jan 16, 2025
@joecummings (Contributor)

Hmmm, interesting question. This is definitely something we can help you hack together! Off the top of my head, you could merge the adapter weights back into the base Llama 3.1 405B model (you can look at our internal code for how to do this, or use something like mergekit), then start the DPO training from there.
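
For what it's worth, the merge itself is just the standard LoRA update applied in place: W ← W + (alpha / rank) · B A for every LoRA-adapted linear layer. Below is a minimal sketch in plain PyTorch, assuming (hypothetically) that the adapter state dict stores each pair under lora_a.weight / lora_b.weight suffixes next to the corresponding base weight key; check the actual key names in your checkpoint and the rank/alpha values from your LoRA config. I believe the LoRA recipes already do an equivalent merge on the non-adapter-only save path, so their code is worth cross-checking.

```python
import torch


def merge_lora_into_base(base_sd, adapter_sd, rank: int, alpha: float):
    """Return a copy of base_sd with LoRA deltas folded in: W <- W + (alpha/rank) * B @ A.

    The key layout is an assumption: '<module>.lora_a.weight' / '<module>.lora_b.weight'
    in the adapter checkpoint, matching '<module>.weight' in the base checkpoint.
    """
    scale = alpha / rank
    merged = dict(base_sd)
    for key, lora_a in adapter_sd.items():
        if not key.endswith("lora_a.weight"):
            continue
        prefix = key[: -len("lora_a.weight")]
        lora_b = adapter_sd[prefix + "lora_b.weight"]
        base_key = prefix + "weight"
        # B: (out, rank), A: (rank, in) -> delta has the same shape as the base weight.
        delta = scale * (lora_b.to(torch.float32) @ lora_a.to(torch.float32))
        merged[base_key] = (merged[base_key].to(torch.float32) + delta).to(base_sd[base_key].dtype)
    return merged
```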

Unfortunately, we don't have OOTB support for resuming training via an adapter config, but this is definitely something that we could look into adding.

@albertbn (Author) commented Jan 18, 2025

Thanks. In the meantime I LoRA-tuned the 405B with full weight saving. It takes about 6 hours in total on an 8xH200 VM: training itself is just 1 hour (to a loss of ~0.3), while copying from GPU to CPU and writing to disk takes another ~5 hours (and ~800 GB of disk space). That is a little wasteful compared to the ~1 GB for the adapters alone, but for now I can live with it.

I am now running DPO with some minor tweaks (CPU offloading) to your original 8B_lora_dpo config.
