
Lora loading for bf16 and fp8, as separate models #24

Merged
12 commits merged into main from lora-loading on Oct 31, 2024
Conversation

@andreasjansson (Member) commented on Sep 27, 2024

Had to fix some bugs in the original lora loading code.

Outputs are here: https://replicate.com/replicate-internal/test-flux-dev-lora

  • bf16 lora loading works
  • For fp8 lora loading, the lora strength is much lower and I need to crank up lora_scale to 1.7 to match lora_scale 1.1 at bf16

Fusing and unfusing are also slightly lossy, so the model slowly degrades over time. We could do something like peft does and add a new node that does the matmul on the fly instead of fusing, but that would slow down inference. Curious if you have ideas @daanelson
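
For illustration, a minimal sketch of what such an on-the-fly node could look like (this is not code from the PR; LoraLinear and its attributes are hypothetical names, and the peft-style idea is simply to add the low-rank delta at forward time instead of mutating the base weight):

```python
import torch
import torch.nn as nn


class LoraLinear(nn.Module):
    """Hypothetical wrapper that applies a LoRA on the fly.

    Instead of fusing (W' = W + scale * B @ A) and later unfusing, which
    accumulates rounding error in the base weight, the base layer stays
    frozen and the low-rank delta is added to the output at forward time.
    The trade-off is an extra pair of matmuls per call.
    """

    def __init__(self, base: nn.Linear, lora_A: torch.Tensor, lora_B: torch.Tensor, scale: float = 1.0):
        super().__init__()
        self.base = base                         # frozen base projection (bf16 or fp8)
        self.register_buffer("lora_A", lora_A)   # (rank, in_features)
        self.register_buffer("lora_B", lora_B)   # (out_features, rank)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        # Low-rank path computed in the activation dtype, so the base weight
        # is never rewritten and repeated LoRA load/unload stays lossless.
        delta = (x @ self.lora_A.to(x.dtype).T) @ self.lora_B.to(x.dtype).T
        return y + self.scale * delta
```

Swapping LoRAs would then mean replacing lora_A, lora_B, and scale on the wrapper rather than re-fusing weights, at the cost of the extra matmuls on every forward pass.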

@Averylamp

Very excited for this PR. Thanks for doing this! I was looking for an H100 LoRA inference provider and this seems like it would do the trick. I was also curious whether pricing for fast generations would differ from per-image pricing, since the GPU usage time is much lower.

@Averylamp

Hi, I was curious whether work is continuing on this PR? I believe it should make Flux dev LoRA inference fast enough that you'd gain a customer over Fal: they are currently a few seconds faster, but from my benchmarking this should end up faster. Happy to take on any tasks if you'd like as well.

@daanelson marked this pull request as ready for review on October 31, 2024 at 23:22
@daanelson (Collaborator) left a comment

clean up a few things and then we're good!

Files with review threads:
  • .github/workflows/push.yaml (outdated, resolved)
  • cog-safe-push-dev-lora.yaml (outdated, resolved)
  • fp8/lora_loading.py (resolved)
@daanelson merged commit 1d75bdd into main on Oct 31, 2024
1 check passed
@daanelson deleted the lora-loading branch on October 31, 2024 at 23:49