Low performance on MPS backend #2041
Thanks for reporting this issue. I don't have a Mac to try to reproduce it, so I cannot really help you here. Honestly, I don't know much about MPS in general or how well it is supported by PyTorch. Still, maybe you could provide some further information, and maybe other users who see this issue can give further advice.
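In case it helps triage, here is a minimal sketch (standard PyTorch and stdlib calls only, nothing PEFT-specific) of the environment details that are usually relevant for MPS reports:

```python
# Sketch: collect the environment details that help diagnose MPS issues.
# All calls below are standard PyTorch / stdlib APIs.
import platform
import torch

print("python       :", platform.python_version())
print("macOS        :", platform.mac_ver()[0])
print("torch        :", torch.__version__)
print("mps built    :", torch.backends.mps.is_built())
print("mps available:", torch.backends.mps.is_available())
```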
PS: Please don't ping "saya", they're not related to this project.
Honestly, it's awful. MPS support is there in some sense, but the list of missing primitives is huge and it doesn't seem to be getting any shorter. So if this lib relies on PyTorch for the MPS backend, that's bad luck for me. Anyway, I raised this one as a starting point, because as far as I understand it, this lib leverages the accelerate lib, which is something like a backend-managing layer for all the different GPU-related stuff, and if so it's quite clear that the pain point comes from there. Am I right about that? PS: sorry for the "saya" mention, it was a GH completion failure 😅
So what you're saying is that the slowness stems from the lackluster support of MPS in PyTorch, and since PEFT uses PyTorch, slow MPS performance is expected. Is that right? If there are specific operations in PEFT that could be replaced with alternatives that are more efficient for MPS, let us know. Apart from that, I don't think there is much we can do.
I'd say not quite. PEFT uses a few functions from accelerate, e.g. for moving tensors on and off devices if the base model requires it, but apart from that it is pretty much independent of accelerate. Also, accelerate is not so much a "backend-managing layer for the different GPU-related stuff", but more so a library providing seamless integration of (mostly) training features, like parallelization, dealing with large models, mixed precision, etc. Managing devices is just a "side effect" of dealing with those. If you use PEFT with
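For context, here is a minimal sketch of wrapping a model with LoRA and placing it on MPS explicitly, without relying on accelerate for device placement; the base model and LoRA hyperparameters are purely illustrative, not a recommendation:

```python
# Sketch: LoRA via PEFT with explicit MPS device placement.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# "gpt2" is only a placeholder base model; PEFT knows its default LoRA target modules.
model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.to(device)  # explicit device placement, no accelerate involved
model.print_trainable_parameters()
```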
Yeah, I'm gonna dig through this sooner rather than later to figure out which operations fall back to CPU in PEFT. Thank you for the thorough overview of the PEFT stack, it will help with the next step of debugging. I'm not sure how long-open issues are treated here; I'd keep it open so it stays an eyesore for me, but if it's annoying for you, feel free to close it and I'll open a new one later.
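One possible way to spot those CPU fallbacks is sketched below. It assumes the script is run with PYTORCH_ENABLE_MPS_FALLBACK=1 so unsupported ops warn and fall back instead of raising; the warning-text matching is an assumption about the current message format and may need adjusting:

```python
# Sketch: list which aten ops fell back to CPU during one forward/backward pass.
# Run with:  PYTORCH_ENABLE_MPS_FALLBACK=1 python this_script.py
import warnings
import torch

device = torch.device("mps")
model = torch.nn.Linear(16, 16).to(device)  # stand-in for your PEFT-wrapped model
x = torch.randn(4, 16, device=device)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loss = model(x).sum()
    loss.backward()

# PyTorch warns (once per op) when an op is not supported on MPS and falls back to CPU.
fallbacks = {str(w.message) for w in caught if "fall back" in str(w.message)}
for msg in sorted(fallbacks):
    print(msg)
```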
All right. We can keep this open for the time being, maybe it helps get some eyes on the topic.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
C'mon dude, stop pushing me, I'm doing what I can.
System Info
Who can help?
Information
Tasks
examples folder
Reproduction
Expected behavior
This pipeline utilises the GPU at 10-15% while the CPU sits at 30-50%.
The MLX framework, with roughly the same LoRA training setup on the same model, utilises the GPU two to three times more.
Such low utilisation makes training noticeably slower than the MLX run.
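A rough way to quantify the slowdown, instead of comparing utilisation numbers, would be to time a few optimizer steps. This sketch assumes a generic model/optimizer/batch from the LoRA setup above and a PyTorch version that provides torch.mps.synchronize():

```python
# Sketch: measure seconds per training step on MPS.
# `model`, `optimizer`, and `batch` are placeholders for the actual LoRA training setup.
import time
import torch

def time_steps(model, optimizer, batch, n_steps=20):
    # Warm-up step so one-time kernel compilation doesn't skew the measurement.
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    torch.mps.synchronize()  # wait for queued MPS work to finish

    start = time.perf_counter()
    for _ in range(n_steps):
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    torch.mps.synchronize()
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n_steps:.3f} s/step over {n_steps} steps")
```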