
Can't use stylegan2 256x256 models #42

Open
NitayGitHub opened this issue Nov 19, 2024 · 8 comments

Comments

@NitayGitHub

Describe the bug
I wanted to do transfer learning from a 256x256 .pkl, and my dataset contains 256x256 images, yet I got this error.

Training options:
{
  "G_kwargs": {
    "class_name": "training.networks_stylegan2.Generator",
    "z_dim": 256,
    "w_dim": 256,
    "mapping_kwargs": {
      "num_layers": 8,
      "freeze_layers": 0,
      "freeze_embed": false
    },
    "channel_base": 32768,
    "channel_max": 256,
    "fused_modconv_default": "inference_only"
  },
  "D_kwargs": {
    "class_name": "training.networks_stylegan2.Discriminator",
    "block_kwargs": {
      "freeze_layers": 0
    },
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel_max": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08,
    "lr": 0.002
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 16.0,
    "style_mixing_prob": 0.9,
    "pl_weight": 2,
    "pl_no_weight_grad": true,
    "blur_init_sigma": 0
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "prefetch_factor": 2,
    "num_workers": 3
  },
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "./datasets/FH.zip",
    "use_labels": false,
    "max_size": 4592,
    "xflip": false,
    "yflip": false,
    "resolution": 256,
    "random_seed": 0
  },
  "num_gpus": 2,
  "batch_size": 16,
  "batch_gpu": 8,
  "metrics": [],
  "total_kimg": 25000,
  "resume_kimg": 360,
  "kimg_per_tick": 4,
  "network_snapshot_ticks": 10,
  "image_snapshot_ticks": 10,
  "snap_res": "4k",
  "random_seed": 0,
  "ema_kimg": 5.0,
  "G_reg_interval": 4,
  "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
  },
  "ada_target": 0.6,
  "resume_pkl": "https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl",
  "ada_kimg": 100,
  "ema_rampup": null,
  "run_dir": "./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256"
}

Output directory:    ./results/00004-stylegan2-FH-gpus2-batch16-gamma16-resume_lsuncat256
Number of GPUs:      2
Batch size:          16 images
Training duration:   25000 kimg
Dataset path:        ./datasets/FH.zip
Dataset size:        4592 images
Dataset resolution:  256
Dataset labels:      False
Dataset x-flips:     False
Dataset y-flips:     False

Launching processes...
Loading training set...

Num images:  4592
Image shape: [3, 256, 256]
Label shape: [0]
Downloading https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl ... done
Traceback (most recent call last):
  File "train.py", line 369, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 362, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/kaggle/working/stylegan3-fun/train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/kaggle/working/stylegan3-fun/training/training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/kaggle/working/stylegan3-fun/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0
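For context, the RuntimeError above is raised while the pre-trained weights are copied into the freshly built networks; a minimal, hypothetical reproduction of that shape mismatch (not the repo's code) looks like this:

import torch

# copy_params_and_buffers() ultimately calls tensor.copy_(), which needs the
# source and destination parameters to have compatible shapes.
dst = torch.zeros(256)   # e.g. a parameter in a network built with w_dim=256
src = torch.zeros(512)   # the same parameter in a checkpoint trained with w_dim=512
dst.copy_(src)           # RuntimeError: size of tensor a (256) must match tensor b (512)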

@nuclearsugar

I believe the model shown in the log above is 512x512. Instead, you need to use a 256x256 model, such as:
https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl

Also, when training a 256x256 model, be sure to include the following flag in your training parameters:
--cbase=16384
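
For what it's worth, here is a minimal sketch (assuming the usual StyleGAN2 rule of min(channel_base // resolution, channel_max); the channels() helper is hypothetical, not the repo's code) of why a checkpoint trained with --cbase=16384 has different layer shapes than the default --cbase=32768:

# Hypothetical helper: per-resolution feature-map counts in the synthesis network.
def channels(res, cbase, cmax=512):
    return min(cbase // res, cmax)

for res in (4, 8, 16, 32, 64, 128, 256):
    print(f"{res:4d}px  cbase=32768 -> {channels(res, 32768):3d}   "
          f"cbase=16384 -> {channels(res, 16384):3d}")

Resuming with mismatched channel counts produces exactly the kind of shape mismatch that copy_params_and_buffers() reports in the traceback above.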

@NitayGitHub
Author

I ran
!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume='https://api.ngc.nvidia.com/v2/models/org/nvidia/team/research/stylegan2/1/files?redirect=true&path=stylegan2-ffhq-256x256.pkl'

and got the same issue.

@nuclearsugar

I realize that the URL I gave you for the 256x256 model is not a valid download link.

Try the command below (as documented here).

!python train.py --outdir=./results --cbase=16384 --snap=10 --img-snap=10 --cfg=stylegan2 --data=./datasets/FH.zip --augpipe=bgc --gpus=2 --metrics=None --gamma=12 --batch=16 --resume=ffhq256

@PDillis
Owner

PDillis commented Nov 21, 2024

The error comes from the dimensionality of the latent space, as you have at the top of your configuration: "G_kwargs": {..., "z_dim": 256, "w_dim": 256, ...}. This is bizarre, as we set up the correct dimensionality here (and it is the one the pre-trained models use). Perhaps these values are being changed somewhere else, but I'll have to look into it, as train.py only changes this value for --cfg=stylegan2-ext.
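
A quick sanity check, sketched here assuming the repo's dnnlib and legacy helpers, is to load the resume pickle and compare its dimensions against the training options:

import dnnlib
import legacy

# URL taken from the log above; substitute whatever you pass to --resume.
resume_pkl = 'https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan2/versions/1/files/stylegan2-ffhq-256x256.pkl'
with dnnlib.util.open_url(resume_pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema']
print(G.z_dim, G.w_dim, G.img_resolution)  # should match z_dim / w_dim / resolution in the config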

@NitayGitHub
Author

NitayGitHub commented Nov 21, 2024

I changed "z_dim" and "w_dim" to 256, thinking it might help, but it didn't. However, I believe the problem was with the dataset I used, where for some reason some images were not 256x256. I added

if img.size != (256, 256):
    img = img.resize((256, 256))

and that fixed it.
Although torchvision's transforms.RandomCrop(size=256) should have ensured all images are 256x256, it didn't.
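
As a follow-up, a small sketch (assuming the dataset zip path from the training options above) that lists any images in the archive that aren't exactly 256x256:

import zipfile
from PIL import Image

# Scan the dataset archive for wrongly sized images before training.
with zipfile.ZipFile('./datasets/FH.zip') as zf:
    for name in zf.namelist():
        if name.lower().endswith(('.png', '.jpg', '.jpeg')):
            with zf.open(name) as fp, Image.open(fp) as img:
                if img.size != (256, 256):
                    print(name, img.size)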

@PDillis
Owner

PDillis commented Nov 21, 2024

Yeah, you need to exactly match the model you are fine-tuning from; otherwise there's no way to use the weights. For the reshaping of your data, do you mean you used dataset_tool.py and it still resulted in images of different sizes, or do you have another pipeline there?

@NitayGitHub
Author

Actually, it seems the fix was adding --cbase=16384.

@nuclearsugar

Indeed, in my experience --cbase=16384 is required when fine-tuning a 256x256 model; otherwise it will throw an error.
