Error when training Stylegan2-ext #38

nuclearsugar · 2024-04-07T22:59:59Z

When I try to start training using --cfg=stylegan2-ext, it errors out with the following message:

"TypeError: __init__() got an unexpected keyword argument 'extended_sgan2'"

PDillis · 2024-04-10T14:19:36Z

I removed certain things to make it easier, but completely forgot to thoroughly remove these extra parameters. I removed lines 275 and 277 in train.py (both say c.G_kwargs.extended_sgan2 = True), and it should now run. I just tested with a small dataset and training started, so let me know if this works for you before I push the fix.
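
For context, this is the general Python behaviour behind the error: a constructor with a fixed signature rejects any keyword it does not declare. A minimal sketch, where `Generator` and `build_generator` are hypothetical stand-ins and not the repo's actual classes:

```python
class Generator:
    """Hypothetical stand-in for a network class with a fixed signature."""

    def __init__(self, z_dim=512, w_dim=512):
        self.z_dim = z_dim
        self.w_dim = w_dim


def build_generator(g_kwargs):
    """Construct the generator from a kwargs dict, as a config-driven trainer might."""
    return Generator(**g_kwargs)


# A leftover config entry that the constructor does not accept.
stale_kwargs = {"z_dim": 512, "w_dim": 512, "extended_sgan2": True}

try:
    build_generator(stale_kwargs)
    error_message = None
except TypeError as err:
    # The message names the offending keyword, as in the report above.
    error_message = str(err)

# Dropping the stale key (the equivalent of deleting the two lines in train.py)
# lets construction succeed.
clean_kwargs = {k: v for k, v in stale_kwargs.items() if k != "extended_sgan2"}
generator = build_generator(clean_kwargs)
```

The same reasoning applies to any leftover config entry: either the constructor has to accept the keyword again, or the entry has to be removed from the config.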

nuclearsugar · 2024-04-10T15:29:07Z

I removed lines 275 and 277 in train.py and tried to start training, but I'm seeing a different error now:

  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 367, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 360, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\training\training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\torch_utils\misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 0

PDillis · 2024-04-10T16:26:09Z

Hmm, this is related to #39. The thing is, I cannot reproduce it consistently: I'm able to load pre-trained models as starting points (using --resume), but sometimes it fails on the same line in torch_utils/misc.py. Can you help me by specifying both which pre-trained model you are starting from (resolution, RGB/RGBA, etc.) and the same details for the data you are now training on?
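
One way to narrow this down is to list which tensors differ in shape between the checkpoint and the freshly built network before copying anything. A minimal sketch under some assumptions: `find_shape_mismatches` is a hypothetical helper, not part of this repo, and the layer names and shapes in the example are made up for illustration. It works on plain `{name: shape}` dicts, which in PyTorch could be built with `{n: tuple(p.shape) for n, p in module.named_parameters()}`.

```python
def find_shape_mismatches(src_shapes, dst_shapes):
    """Return (name, src_shape, dst_shape) for tensors present in both dicts with different shapes."""
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        # Names missing from the source are skipped, not reported.
        if src_shape is not None and tuple(src_shape) != tuple(dst_shape):
            mismatches.append((name, tuple(src_shape), tuple(dst_shape)))
    return mismatches


# Illustrative values only: a checkpoint with 1024 channels in one layer
# versus a network built with 512 channels in that layer.
checkpoint = {"layer_a.weight": (1024, 1024), "layer_b.const": (512, 4, 4)}
network = {"layer_a.weight": (512, 512), "layer_b.const": (512, 4, 4)}

for name, src, dst in find_shape_mismatches(checkpoint, network):
    print(f"{name}: checkpoint {src} vs network {dst}")
```

If the list is non-empty, the network was built with different settings than the checkpoint was trained with, which would point at the config rather than at the data.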

nuclearsugar · 2024-04-11T15:49:37Z

I'm trying to do some transfer learning. Here are the details:

Pre-Trained Model

StyleGAN2Extended-Aydao-AnimeDanbooru2019s-512x512-5268480kimg.pkl
512x512
RGB

Dataset

71,791 PNG images
512x512
RGB

nuclearsugar · 2024-04-11T15:50:14Z

Interesting to note, if I instead use a snapshot of this repo that I have saved from when Stylegan2-ext was newly implemented (2023-02-22) then the training starts up without any issues.

PDillis · 2024-04-15T11:36:29Z

Ok thanks, that helps. I was thinking along the same lines, since it worked back then and there were no issues like this. I don't remember updating much, but I'll check the diff in case I moved something.

nuclearsugar · 2024-09-20T19:30:08Z

Quick update: I'm experiencing this issue in a Google Colab notebook as well. Same error as before.

PDillis added the bug (Something isn't working) label on Apr 10, 2024
