Describe the bug
I followed the guide in examples/dreambooth/README_flux.md to set up and run training, and got a CUDA out-of-memory (OOM) error on a 3090 Ti with 24 GB of VRAM.
Reproduction
System RAM: 256 GB
GPU: 3090 Ti, 24 GB VRAM
torch 2.4.1 + cuda 12.1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
accelerate==1.0.1
transformers==4.45.2
Logs
2024-10-21 22:23:00.221007: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-21 22:23:00.231181: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-21 22:23:00.243106: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-21 22:23:00.246839: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-21 22:23:00.256022: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-21 22:23:01.042086: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
10/21/2024 22:23:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type. This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type. This is not supported for all configurations of models and can yield errors.
Downloading shards: 100%|██████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11602.50it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:06<00:00, 3.39s/it]
Fetching 3 files: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 7227.40it/s]
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/mnt/sat/ai/diffusers-train/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1892, in <module>
main(args)
File "/mnt/sat/ai/diffusers-train/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1182, in main
text_encoder_two.to(accelerator.device, dtype=weight_dtype)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2958, in to
return super().to(*args, **kwargs)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1174, in to
return self._apply(convert)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 805, in _apply
param_applied = fn(param)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in convert
return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 23.69 GiB of which 13.25 MiB is free. Including non-PyTorch memory, this process has 23.62 GiB memory in use. Of the allocated memory 23.36 GiB is allocated by PyTorch, and 16.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
File "/mnt/sat/ai/diffusers-train/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/mnt/sat/ai/diffusers-train/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/mnt/sat/ai/diffusers-train/bin/python', 'train_dreambooth_lora_flux.py', '--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev', '--instance_data_dir=../../../../SD-Downloads/AnnieOnly1024', '--output_dir=lora-flux', '--mixed_precision=bf16', '--instance_prompt=sk3anni3', '--resolution=1024', '--train_batch_size=1', '--guidance_scale=1', '--gradient_accumulation_steps=4', '--optimizer=prodigy', '--learning_rate=1e-1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=sk3anni3 in apartment', '--validation_epochs=25', '--seed=0']' returned non-zero exit status 1.
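A rough back-of-envelope check may help explain the failure. The traceback shows the error while moving `text_encoder_two` to the GPU, with 23.36 GiB already allocated out of 23.69 GiB. The sketch below estimates weight-only memory in bf16. The parameter counts are approximate assumptions (FLUX.1-dev is publicly described as a roughly 12B-parameter model, and the T5-XXL encoder is taken here as roughly 4.7B); they are not values read from the logs, and the estimate ignores activations, gradients, optimizer state, the VAE, and the CLIP text encoder.

```python
# Weight-only memory estimate in bf16 (2 bytes per parameter).
# Parameter counts are approximate assumptions, not values from the logs.
GIB = 1024 ** 3
BYTES_PER_PARAM_BF16 = 2

approx_params = {
    "flux_transformer": 12e9,       # assumed: ~12B parameters
    "t5_xxl_text_encoder": 4.7e9,   # assumed: ~4.7B parameters
}


def weights_gib(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Return the size of the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


estimates = {name: weights_gib(n) for name, n in approx_params.items()}
total_gib = sum(estimates.values())

# Total capacity as reported in the OOM message above.
gpu_capacity_gib = 23.69

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.1f} GiB")
print(f"total: ~{total_gib:.1f} GiB vs. capacity {gpu_capacity_gib} GiB")
print("fits" if total_gib <= gpu_capacity_gib else "does not fit")
```

Under these assumptions the transformer weights alone come close to the card's capacity, so placing both models on a 24 GB GPU in bf16 at the same time would not be expected to fit.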
System Info
Diffusers is the latest main branch as of today, 2024-10-21, because the previous release tag does not yet support DreamBooth Flux LoRA training.
Who can help?
No response