This is a minimal example, involving only the forward pass, on Flux's master:
```julia
using Flux
using Statistics, Random
using CUDA

function train_mlp()
    d_in = 128
    d_out = 128
    batch_size = 128
    num_iters = 10
    device = gpu_device()
    model = Dense(d_in => d_out) |> device
    x = randn(Float32, d_in, batch_size) |> device
    for iter in 1:num_iters
        ŷ = model(x)          # forward pass only, no gradients
        @info iter
        # GC.gc(true)
        CUDA.pool_status()    # report how much memory the CUDA pool holds
    end
end

train_mlp()
# GC.gc(true)
# CUDA.reclaim()
```
This issue has emerged multiple times on Discourse:
https://discourse.julialang.org/t/memory-usage-increasing-with-each-epoch/121798
https://discourse.julialang.org/t/flux-memory-usage-high-in-srcnn/115174
https://discourse.julialang.org/t/out-of-memory-using-flux-cnn-during-back-propagation-phase/24492
https://discourse.julialang.org/t/flux-gpu-memory-problems/79783
and it could be related to #828, #302, #736, and JuliaGPU/CUDA.jl#137.
Running `train_mlp()` multiple times, the memory usage reported by `CUDA.pool_status()` keeps increasing and more and more memory is reserved. Mitigation strategies are to set a memory limit,
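for example with the sketch below, which assumes the memory-limit environment variables provided by recent CUDA.jl versions (`JULIA_CUDA_HARD_MEMORY_LIMIT` / `JULIA_CUDA_SOFT_MEMORY_LIMIT`; exact accepted values may differ across releases) and which must be set before CUDA.jl initializes:

```julia
# Sketch: cap how much device memory the CUDA.jl pool may use.
# These environment variables have to be set before CUDA.jl is initialized,
# so put them at the very top of the script (or in the shell environment).
ENV["JULIA_CUDA_HARD_MEMORY_LIMIT"] = "4GiB"   # fail allocations past 4 GiB
# ENV["JULIA_CUDA_SOFT_MEMORY_LIMIT"] = "50%"  # or: release cached memory above 50% on GC

using CUDA, Flux
```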
or to manually run the garbage collector, which slows things down a lot if done every iteration.
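A sketch of the manual-collection approach, reusing `model`, `x`, and `num_iters` from the example above (the commented-out `GC.gc(true)` / `CUDA.reclaim()` lines hint at this); running it every few iterations instead of every iteration is a possible compromise, though the exact cadence here is an assumption:

```julia
for iter in 1:num_iters
    ŷ = model(x)
    # Periodically force a full collection so unreferenced GPU buffers are freed,
    # then hand the cached pool memory back to the driver.
    if iter % 10 == 0
        GC.gc(true)
        CUDA.reclaim()
    end
end
```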
This behavior is highly problematic because training runs quickly fill the GPU and one cannot run other GPU processes alongside them.
cc @maleadt