
Make fp8 work on older GPUs #34

Merged
merged 5 commits into main from yorickvp/pre-ada-mm on Nov 4, 2024

Conversation

yorickvP (Contributor)

Tested on RTX A5000.

  • Use float32 matmul on older GPUs
  • Offload based on total GPU memory instead of 'A40' in name
  • Tell the FP8 code about offloading and compiling
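Hypothetically, the first two bullets might reduce to checks like this minimal sketch (the function names and the 40 GiB threshold are illustrative assumptions, not the PR's actual code):

```python
import torch

def supports_fp8_matmul() -> bool:
    # FP8 tensor-core matmul requires compute capability >= (8, 9),
    # i.e. Ada or Hopper. The RTX A5000 and A40 are Ampere (8, 6),
    # so they fall back to a float32 matmul on dequantized weights.
    return torch.cuda.get_device_capability() >= (8, 9)

def should_offload(threshold_gib: float = 40.0) -> bool:
    # Decide on CPU offloading from total VRAM rather than matching the
    # substring "A40" in torch.cuda.get_device_name(). The threshold
    # here is a placeholder, not the value the PR uses.
    total_gib = torch.cuda.get_device_properties(0).total_memory / 2**30
    return total_gib < threshold_gib
```

Keying the decision to total memory rather than a device-name substring makes the offload path apply to any similarly sized GPU, not just those whose names happen to contain "A40".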

@daanelson (Collaborator)

@yorickvP this is great; making 8-bit inference work regardless of hardware is useful. That said, when I run cog predict -i prompt=<whatever> on an A40, it takes ~10 minutes to compile and then there's no output. Can you take a look? I'm wary of pushing a broken path here.

@yorickvP (Contributor, Author)

I checked again, and the prediction eventually succeeded: https://replicate.com/p/8r81q9z2b5rgp0cjv26977a6kg .
What this changed is that it now attempts to compile fp8 on the A40, where it didn't previously, which pushes the boot time over 10 minutes.
I'll disable fp8 compilation in the offload case!
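A hedged sketch of that gate, with illustrative names (the actual function and flags in the repo may differ):

```python
import torch

def maybe_compile(model: torch.nn.Module, use_fp8: bool, offload: bool) -> torch.nn.Module:
    # Skip torch.compile for the fp8 path when weights are offloaded to CPU:
    # compiling under offloading is what pushed A40 boot time past 10 minutes.
    if use_fp8 and not offload:
        model = torch.compile(model)
    return model
```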

@daanelson (Collaborator)

@yorickvP fantastic! Works great now, merging.

daanelson merged commit a9a42fb into main on Nov 4, 2024
1 check passed
daanelson deleted the yorickvp/pre-ada-mm branch on November 4, 2024 at 21:54