[Issue]: Flux Model load hanging forever out of nowhere #3484
Comments
first, in windows, disable nvidia usage of shared memory (google for instructions)! then, let's look at memory utilization: go to Windows Task Manager:
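For reference, a minimal diagnostic sketch for reading the dedicated VRAM numbers from outside Task Manager, assuming a CUDA build of torch and nvidia-smi on PATH (this is not part of SDNext, just a way to capture the dedicated/shared split in text form):

```python
# Minimal sketch: report dedicated VRAM usage while the model "loads".
# torch.cuda.mem_get_info() and nvidia-smi only see dedicated memory;
# anything spilled to shared/system memory shows up as high RAM usage
# in Task Manager instead, which is the split being asked about here.
import subprocess
import torch

if torch.cuda.is_available():
    free_b, total_b = torch.cuda.mem_get_info()  # (free, total) in bytes
    used_gb = (total_b - free_b) / 1e9
    print(f"dedicated VRAM: {used_gb:.2f} GB used of {total_b / 1e9:.2f} GB")

print(subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True,
).stdout)
```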
sorry to be a pain, but you cropped the screenshot so the numbers below the graphs are not visible - need to see the dedicated/shared splits.
pls try with the latest update
Just after upgrading to the latest dev version:
well, it's not hanging anymore - but you do have issues - and they are specific to the qint quantized model, not general:
can you check this?
and rerun model load?
Erf... new error:
Actually I tried with the nf4 flux model, so it might be an issue with Disty0/FLUX.1-dev-qint4_tf-qint8_te. It was working before Flux was completely supported in SDNext, and now maybe I have to go with the models you suggest in the embedded list of models.
float-quantized models use bitsandbytes, so if nf4 is working, great. but fyi - that's using a different quantization engine - nothing wrong with that, just noting.
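For context, a rough sketch of what the bitsandbytes nf4 path looks like when done directly through diffusers, assuming diffusers >= 0.31 with bitsandbytes installed and access to the gated black-forest-labs/FLUX.1-dev repo; this is only an illustration of the alternative quantization engine, not SDNext's internal loader:

```python
# Sketch: load the FLUX transformer with bitsandbytes nf4 quantization,
# then build the pipeline around it. Assumes diffusers >= 0.31 and bitsandbytes.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # bitsandbytes engine, not optimum.quanto
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # per-component offload; keeps VRAM usage manageable
```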
Hi, I’m experiencing the same issue as OP on Windows 11 on the master branch. The Flux models are stuck loading indefinitely. I tested the same model, "Disty0/FLUX.1-dev-qint4_tf-qint8_te," along with others from the recommended models (as the Wiki suggests), but without success. I’ve also tried various suggestions mentioned here, like disabling shared memory, but the problem persists. I’ll gather logs and additional information to share later this week and try the dev branch.
@vladmandic I agree that something is wrong with a dependency, but I don't think it is optimum.quanto directly. I think there is a combo of diffusers/torch/quanto on Windows that currently breaks the workflow somewhere. I can't figure out what it is, sadly... But it definitely was working before, and something happened with GPU memory allocation, as I don't see the GPU being used whenever this issue arises (despite CPU and system memory being heavily used while the model loads indefinitely).
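A small sketch for capturing the exact dependency combo on a working versus a broken install, using only the standard library; the package names below are the usual PyPI ones and may differ in a custom venv:

```python
# Sketch: print the installed versions of the packages this thread suspects,
# so a working and a broken Windows install can be diffed side by side.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "diffusers", "transformers", "accelerate",
            "optimum-quanto", "bitsandbytes"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```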
this is an optimum.quanto error. yes, it may be linked to the fact that you're using a newer torch than before, but it's optimum.quanto nonetheless.
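One way to narrow this down is to exercise the quanto path outside SDNext, assuming optimum-quanto and diffusers are installed; quantizing the full-precision FLUX transformer here is only an illustration of the same engine, not how the Disty0 pre-quantized checkpoint was produced:

```python
# Sketch: run optimum.quanto's quantize/freeze on the FLUX transformer directly.
# If this hangs or raises, the problem sits in the quanto/torch combo rather
# than in SDNext's model loader. Downloads the full bf16 transformer.
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import freeze, qint8, quantize

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
quantize(transformer, weights=qint8)  # quanto integer quantization, same engine as the qint checkpoints
freeze(transformer)
print("quantized ok:", sum(p.numel() for p in transformer.parameters()) / 1e9, "B params")
```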
Issue Description
Hi, it's been a month now that I've been stuck with my setup trying to make FLUX.dev work again. For the record, I tried FLUX on my PC in early September with the model "Disty0/FLUX.1-dev-qint4_tf-qint8_te" and it was working, which was a big surprise, but a good one.
After being away for a few days, I came back and had many updates (Windows, Nvidia and SDNext) to do, but after doing all the updates nothing was working.
There were multiple errors when reinstalling SDNext, so I decided to go with a fresh install and upgrade Python to 3.11 (which I read was recommended).
I saw that it was installing Torch with CUDA 12.4 and realised I didn't have that CUDA version installed, so I installed it.
And now comes my issue: after starting SDNext, downloading the Flux model I was using before and putting the settings back as they were, the model "loading" hangs forever, using a lot of CPU and memory, but nothing really happens in the UI and no logs are produced to debug with.
I thought it could be the system memory offload from my GPU, so I made sure it was not activated, but it didn't change anything.
I tried going back to the previous dev version I was using at the time it was working, but it didn't change anything either.
So I thought it might be the Nvidia driver and installed the previous version: that didn't work either.
Then I started tweaking SDNext settings: model, balanced, and sequential offload modes.
For sequential offload I sometimes got an error instead of hanging:
11:20:49-742597 INFO Autodetect model: detect="FLUX" class=FluxPipeline
file="models\Diffusers\models--Disty0--FLUX.1-dev-qint4_tf-qint8_te\snapshots\e40bd0d879eff11b5
9d5b6fca9233accfaed08e0" size=0MB
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2002.05it/s]
Diffusers 3.61s/it █████████████ 100% 2/2 00:07 00:00 Loading checkpoint shards
Diffusers 15.58it/s ████████ 100% 7/7 00:00 00:00 Loading pipeline components...
11:21:15-487263 INFO Load network: type=embeddings loaded=0 skipped=0 time=0.00
11:21:15-527261 ERROR Setting model: offload=sequential WeightQBytesTensor.new() missing 6 required positional
The only thing that is bothering me is that while it's hanging, CPU and RAM are at max usage but the GPU is not used at all... And this is happening before inference even starts.
I didn't see anyone else having the same issue, so I guess this is a very tricky one, but I hope someone will have fresh ideas on things I could try to make it work again.
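One possible reading of the WeightQBytesTensor error above is that per-layer (sequential) offload re-creates quanto's custom weight tensors without their quantization metadata when layers move between devices; that is an assumption, not a confirmed diagnosis. A sketch of the two diffusers offload calls involved, for comparison:

```python
# Sketch: the two offload strategies diffusers exposes. Sequential offload moves
# individual layers between CPU and GPU; model offload moves whole components,
# which avoids re-creating quantized weight tensors layer by layer.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative; the issue uses a pre-quantized variant
    torch_dtype=torch.bfloat16,
)
# pipe.enable_sequential_cpu_offload()  # per-layer offload; the mode that raised the error above
pipe.enable_model_cpu_offload()         # per-component offload; roughly SDNext's "model" mode
```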
Version Platform Description
Setup:
Relevant log output
Backend
Diffusers
UI
Standard
Branch
Dev
Model
Other
Acknowledgements