[Bug]: ROCm Fooocus doesn't garbage collect allocated VRAM #3257
Comments
Note this problem is unique to Fooocus/ROCm. With InvokeAI/ROCm, I can observe the VRAM being used as the image is generated, but it is correctly released after the generation is finished. Fooocus, however, hangs onto the memory indefinitely (I waited literally days), preventing other AI tools from working. There is no UI way to force it to release the memory; the only way is to restart Fooocus. I'm using git master @ 5a71495 dated 2024-07-01.
Also, both InvokeAI and Fooocus are using PyTorch/ROCm, so what I am asking for is clearly possible. Someone more familiar with the code could probably have a look at how InvokeAI handles VRAM allocations, and port that into Fooocus.
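For context on what porting that would involve: in PyTorch (the ROCm build exposes the same `torch.cuda` API, backed by HIP), VRAM stays visible to tools like radeontop until the caching allocator is emptied. A minimal sketch of the usual release pattern, not code from either project:

```python
import gc
import torch

def release_vram(model):
    """Offload a model and return cached VRAM to the driver.

    PyTorch's caching allocator keeps freed blocks around for reuse, so
    radeontop keeps reporting them as allocated until empty_cache() runs.
    """
    model.to("cpu")               # move the weights out of VRAM
    gc.collect()                  # collect any unreachable GPU tensors
    if torch.cuda.is_available(): # True on ROCm builds as well
        torch.cuda.empty_cache()  # hand cached blocks back to the driver
```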
I assume you're not using low vram mode, which would force unloading after generation (afaik).
I'm running … Example usage: Low vram mode (…)
This is somewhat normal: some things are kept in cache / RAM / VRAM so that Fooocus can generate images faster the next time, as they would otherwise have to be loaded again. There also currently is no offload button, but if you do not want this behaviour you can change the code and try to manually trigger the offload after generation yourself: https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L357 I sadly don't have an AMD card and can't confirm the issue, so please connect with other community members who have one by opening a new discussion and referencing this issue. Thanks!
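For anyone wanting to try that, here is a minimal sketch of such a manual trigger, assuming the `unload_all_models()` and `soft_empty_cache()` helpers defined in the linked `model_management.py`; the hook location is hypothetical and this is untested on AMD:

```python
from ldm_patched.modules import model_management

def offload_after_generation():
    # Hypothetical hook to call at the end of a generation task.
    # First move all cached models out of VRAM...
    model_management.unload_all_models()
    # ...then ask PyTorch to return the cached blocks to the driver.
    # force=True matters on ROCm, where soft_empty_cache() otherwise
    # skips torch.cuda.empty_cache() (see the discussion further down).
    model_management.soft_empty_cache(force=True)
```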
I got that, but I can't confirm for AMD as I don't have an AMD GPU. Please get in touch with other users by opening a new discussion.
Are you saying you don't believe bug reports until at least one other person has corroborated them? I don't see every issue being duplicated in "Discussions" in this way, but alright, if you insist. In the meantime I've written a script to automatically restart Fooocus if there are no console logs for 120 seconds. For Fooocus this needs to be run with `python -u` (see the docstring below):

```python
#!/usr/bin/python
"""Run a command, except kill-and-re-run it if it doesn't produce stdout/stderr
within a given timeout.

If the command is a python script, you MOST LIKELY need to run it as `python -u`
for this wrapper to work properly, since python block-buffers (rather than
line-buffers) its output by default when piped.
"""
import select
import signal
import subprocess
import sys
import threading

import psutil

output_t_s = 120  # restart if no output for this many seconds
sigint_t_s = 10   # grace period after SIGINT
trmkil_t_s = 5    # grace period after SIGTERM / SIGKILL
log_prefix = "================"
autorestart = True

def stop(subproc):
    global autorestart
    autorestart = True  # only autorestart if the process was stopped by us
    print(log_prefix, 'send SIGINT', subproc)
    subproc.send_signal(signal.SIGINT)
    # Send SIGINT to all child processes too; this matches the behaviour
    # when you ctrl-C in a shell, and is required for many complex programs
    # to interpret SIGINT in the expected way.
    for c in subproc.children(True):
        print(log_prefix, 'send SIGINT', c)
        c.send_signal(signal.SIGINT)
    try:
        # psutil.Popen.wait() raises psutil.TimeoutExpired rather than
        # subprocess.TimeoutExpired, so catch both to be safe.
        subproc.wait(timeout=sigint_t_s)
    except (subprocess.TimeoutExpired, psutil.TimeoutExpired):
        print(log_prefix, 'send SIGTERM')
        subproc.terminate()
    try:
        subproc.wait(timeout=trmkil_t_s)
    except (subprocess.TimeoutExpired, psutil.TimeoutExpired):
        print(log_prefix, 'send SIGKILL')
        subproc.kill()
    try:
        subproc.wait(timeout=trmkil_t_s)
    except (subprocess.TimeoutExpired, psutil.TimeoutExpired):
        pass

def run(args):  # run the command which is passed as a parameter to this script
    global autorestart
    autorestart = False  # don't autorestart unless we called stop()
    subproc = psutil.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stopper = None
    print(log_prefix, 'running', args, subproc)
    # poll() (rather than reading .returncode directly) also notices when
    # the child exits on its own, not only when stop() reaped it.
    while subproc.poll() is None:
        rs, _, _ = select.select([subproc.stdout, subproc.stderr], [], [], output_t_s)
        for rf in rs:
            data = rf.read1(65536)
            buf = sys.stdout.buffer if rf is subproc.stdout else sys.stderr.buffer
            buf.write(data)
            buf.flush()
        if not rs and stopper is None:
            # No output within the timeout: stop the child from another
            # thread so this loop can keep draining its final output.
            stopper = threading.Thread(target=lambda: stop(subproc))
            stopper.start()
    if stopper:
        stopper.join()

while autorestart:
    run(sys.argv[1:])
```

The code uses `psutil` (its only third-party dependency).
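For example, if the wrapper is saved as `autorestart.py` (name hypothetical), Fooocus could be launched as `./autorestart.py python -u entry_with_update.py`, with `entry_with_update.py` being the usual Fooocus entry point.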
I have asked the community here: #3258
I also can't debug and/or fix this as I don't have the necessary hardware, so somebody else has to fix this.
The current code intentionally does not free memory on ROCm, with a comment "seems to make things worse on ROCm": ldm_patched/modules/model_management.py#L769 (blame; original commit by @lllyasviel). I don't see that it makes anything "worse", so here is a PR that fixes that and makes ROCm behave the same as CUDA: #3262. If @lllyasviel can remember what "worse" actually means, then here is an alternative, more conservative PR that forces the free only when …
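For context, the gate sits in `soft_empty_cache()`. A simplified sketch of the logic around the linked line and of what the fix changes (paraphrased, not a verbatim diff; `is_nvidia()` is stubbed here):

```python
import torch

def is_nvidia():
    # Stand-in for the real check in model_management.py; on ROCm builds
    # torch.version.cuda is None while torch.version.hip is set.
    return torch.version.cuda is not None

def soft_empty_cache(force=False):
    # Simplified from ldm_patched/modules/model_management.py.
    if torch.cuda.is_available():
        # Old behaviour: skip the free on ROCm ("seems to make things
        # worse"). #3262 drops the is_nvidia() gate so that ROCm takes
        # the same path as CUDA.
        if force or is_nvidia():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
```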
With #3262, the current code will free memory between every image generation on ROCm, which is what already happens on CUDA. More ideal behaviour would be to free the memory on a timeout, so that we don't unnecessarily free it when we are about to immediately generate another image. However, the current code doesn't do this for CUDA or anything else, so I consider it out of scope for this issue.
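If someone does want to pursue that later, a sketch of a debounced free (illustrative names, not existing Fooocus code):

```python
import threading

class DeferredVramFree:
    """Free VRAM only after delay_s seconds without a new generation,
    so back-to-back images don't pay the model-reload cost."""

    def __init__(self, free_fn, delay_s=60.0):
        self._free_fn = free_fn   # e.g. the offload trigger sketched above
        self._delay_s = delay_s
        self._timer = None
        self._lock = threading.Lock()

    def generation_started(self):
        # Cancel any pending free; we are about to use the VRAM again.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None

    def generation_finished(self):
        # (Re)arm the timer; it fires only if nothing restarts it in time.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._free_fn)
            self._timer.daemon = True
            self._timer.start()
```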
Thanks @infinity0, I have also noticed in the past that the VRAM is not freed while Fooocus is running, needing to shut it down before other applications could make use of the GPU. I've tried the fix from #3262 on my system with an RDNA2 card (ROCm 6.1, kernel 6.7) and it works perfectly fine so far.
What happened?
On ROCm / amdgpu, Fooocus doesn't garbage collect used VRAM even after several hours. This means that other applications, such as other AI image generators, cannot use the VRAM and give "out of memory" errors.
Steps to reproduce the problem
1. Generate an image with Fooocus.
2. After generation finishes, run `radeontop`. See that VRAM allocation is still many GB.
3. Try to run another AI tool; it fails with an out-of-memory error.

What should have happened?
After generation finishes, VRAM usage in `radeontop` is back down to normal levels. Also, the other AI tool succeeds.
What browsers do you use to access Fooocus?
No response
Where are you running Fooocus?
Locally
What operating system are you using?
Debian GNU/Linux
Console logs
Additional information
No response