
Error when training Stylegan2-ext #38

Open
nuclearsugar opened this issue Apr 7, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@nuclearsugar

When I try to start training with --cfg=stylegan2-ext, it errors out with the following message:

"TypeError: __init__() got an unexpected keyword argument 'extended_sgan2'"

@PDillis
Owner

PDillis commented Apr 10, 2024

I removed certain things to make the code simpler, but completely forgot to remove these extra parameters as well. I have now removed lines 275 and 277 in train.py (both say c.G_kwargs.extended_sgan2 = True), and it should run. I just tested with a small dataset and training started, so let me know if this works for you before I push the fix.
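For anyone hitting the same TypeError, here is a minimal sketch of the failure mode, assuming the config is a dict of keyword arguments that gets unpacked into the network's constructor. The `Generator` class and `build` helper are hypothetical stand-ins, not the repo's actual code; only the `extended_sgan2` key is taken from the error message.

```python
class Generator:
    """Hypothetical stand-in for a network class that no longer accepts `extended_sgan2`."""

    def __init__(self, z_dim=512, w_dim=512):
        self.z_dim = z_dim
        self.w_dim = w_dim


def build(g_kwargs):
    """Unpack the config dict into the constructor, as a kwargs-driven config would."""
    return Generator(**g_kwargs)


# A stale config entry left behind after the constructor parameter was removed.
stale_kwargs = {"z_dim": 512, "w_dim": 512, "extended_sgan2": True}

try:
    build(stale_kwargs)
    error_message = None
except TypeError as exc:
    # Python rejects any keyword the constructor signature does not declare.
    error_message = str(exc)

# Dropping the stale key (the equivalent of deleting the two lines in train.py)
# lets construction succeed.
fixed_kwargs = {k: v for k, v in stale_kwargs.items() if k != "extended_sgan2"}
generator = build(fixed_kwargs)
```

The point is only that the error comes from the config side setting a key the class no longer declares, so removing the assignment is enough to clear this particular exception.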

PDillis added the bug label Apr 10, 2024
@nuclearsugar
Author

I removed lines 275 and 277 in train.py and tried to start training, but I'm seeing a different error now:

  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 367, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 360, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\training\training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\torch_utils\misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 0

@PDillis
Owner

PDillis commented Apr 10, 2024

Hmm, this is related to #39. The thing is, I cannot reproduce it consistently: I'm able to load pre-trained models as starting points (using --resume), but sometimes it fails at the same line in torch_utils/misc.py. Can you help me by specifying both which pre-trained model you are starting from (resolution, RGB/RGBA, etc.) and the same details for the data you are using now for training?
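One way to narrow down which tensor triggers the size mismatch is to compare shapes by name before copying anything. The sketch below works on plain dicts of shapes so it runs without PyTorch; the helper name and the tensor names are made up for illustration, and only the 512 vs. 1024 sizes come from the traceback above.

```python
def find_shape_mismatches(src_shapes, dst_shapes):
    """Return (name, src_shape, dst_shape) for every tensor in both dicts whose shapes differ."""
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and tuple(src_shape) != tuple(dst_shape):
            mismatches.append((name, tuple(src_shape), tuple(dst_shape)))
    return mismatches


# Illustrative shapes only; the tensor names are hypothetical.
# With PyTorch modules, the dicts could be built along the lines of:
#   {n: tuple(t.shape) for n, t in module.named_parameters()}
checkpoint_shapes = {
    "synthesis.b4.const": (1024, 4, 4),
    "mapping.fc0.weight": (512, 512),
}
new_model_shapes = {
    "synthesis.b4.const": (512, 4, 4),
    "mapping.fc0.weight": (512, 512),
}

mismatches = find_shape_mismatches(checkpoint_shapes, new_model_shapes)
for name, src, dst in mismatches:
    print(f"{name}: checkpoint {src} vs. new model {dst}")
```

A mismatch in dimension 0 like this would suggest the freshly constructed network uses different channel widths than the checkpoint, which is a guess about the cause rather than a confirmed diagnosis.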

@nuclearsugar
Author

I'm trying to do some transfer learning. Here are the details:

Pre-Trained Model

  • StyleGAN2Extended-Aydao-AnimeDanbooru2019s-512x512-5268480kimg.pkl
  • 512x512
  • RGB

Dataset

  • 71,791 PNG images
  • 512x512
  • RGB
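The dataset details above (512x512, RGB rather than RGBA) can be double-checked without any image library by reading each PNG's IHDR header. This is a minimal standard-library sketch; the function name is hypothetical, and it only inspects the header, so it does not validate the rest of the file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color type codes from the IHDR chunk, mapped to short mode names.
COLOR_TYPES = {0: "L", 2: "RGB", 3: "P", 4: "LA", 6: "RGBA"}


def read_png_header(data):
    """Parse width, height, and color mode from the IHDR chunk at the start of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width (4), height (4), bit depth (1), color type (1).
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, COLOR_TYPES.get(color_type, "unknown")


# Synthetic header for demonstration: 512x512, 8-bit, color type 2 (RGB).
# This is a header only, not a full decodable image.
ihdr = struct.pack(">IIBBBBB", 512, 512, 8, 2, 0, 0, 0)
sample = PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr
header = read_png_header(sample)
```

For a real dataset, the first 26 bytes of each file are enough, e.g. `read_png_header(open(path, "rb").read(26))` inside a loop over the image folder, flagging any file whose result is not `(512, 512, "RGB")`.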

@nuclearsugar
Author

Interestingly, if I instead use a snapshot of this repo that I saved when Stylegan2-ext was newly implemented (2023-02-22), training starts up without any issues.

@PDillis
Owner

PDillis commented Apr 15, 2024

Ok, thanks, that helps. I was thinking the same, since it worked back then and there were no issues like this. I don't remember updating much, but I'll look at the diff in case I moved something.

@nuclearsugar
Author

Quick update: I'm also experiencing this issue in a Google Colab notebook. Same error as before.
