
Conversation

@Koratahiu
Contributor

Rework of #1083

It’s pretty stable and ready (just needs testing).

More info: Koratahiu/Advanced_Optimizers#12

@betterftr
Contributor

self.layer_state[layer_key]['sum_sq_accumulator'] += torch.sum(grad.detach().pow(2)).float() in Kourkoutas.py gives an error when used with CPU offloading in OneTrainer (device mismatch).

Also, with the compiled optimizer setting enabled, the OT venv has to be launched from the Visual Studio native tools command prompt; otherwise compilation fails with: fatal error C1083: Cannot open include file: 'omp.h'.

@Koratahiu
Contributor Author

self.layer_state[layer_key]['sum_sq_accumulator'] += torch.sum(grad.detach().pow(2)).float() in Kourkoutas.py gives an error when used with CPU offloading in OneTrainer (device mismatch).

This is odd.
The K-B logic should be unchanged, but it uses the first parameter to determine the device. This mirrors a previous Prodigy device mismatch error (also specific to CPU offloading), which may be the cause.
I'll look for a better approach.
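
For reference, a minimal sketch of a device-safe version of that accumulation (accumulate_sum_sq is a hypothetical helper; layer_state, layer_key, and sum_sq_accumulator come from the line quoted above, the rest of the optimizer is assumed):

```python
import torch

# A sketch (not the actual Kourkoutas.py fix): accumulate the squared-gradient
# sum on the gradient's own device, so CPU-offloaded parameters and GPU
# gradients cannot end up on different devices during the update.
def accumulate_sum_sq(layer_state: dict, layer_key: str, grad: torch.Tensor) -> None:
    sq_sum = torch.sum(grad.detach().float().pow(2))  # 0-dim tensor on grad.device
    acc = layer_state[layer_key].get('sum_sq_accumulator')
    if acc is None:
        acc = torch.zeros((), device=sq_sum.device, dtype=torch.float32)
    elif acc.device != sq_sum.device:
        acc = acc.to(sq_sum.device)  # move instead of mixing devices
    layer_state[layer_key]['sum_sq_accumulator'] = acc + sq_sum
```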

Also, with the compiled optimizer setting enabled, the OT venv has to be launched from the Visual Studio native tools command prompt; otherwise compilation fails with: fatal error C1083: Cannot open include file: 'omp.h'.

I don't understand this issue; it works for me (and it should work, as it's just a standard optimizer setting for OT).
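
For anyone hitting the C1083 error: the compiled optimizer setting presumably just wraps the optimizer step in torch.compile, and when Inductor has to generate CPU kernels it invokes an external C++ compiler that needs the OpenMP headers, which on Windows are typically only on the include path inside a Visual Studio native tools prompt. A rough sketch of what such a toggle amounts to (make_step_fn is a hypothetical name, not OT's actual code):

```python
import torch

# Hypothetical illustration of a "compiled optimizer" toggle. If Inductor has
# to build CPU kernels for this step, it calls an external C++ compiler; a
# toolchain without OpenMP headers then fails with C1083: 'omp.h' not found.
def make_step_fn(optimizer: torch.optim.Optimizer, compile_step: bool):
    def step():
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
    return torch.compile(step) if compile_step else step
```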

@Koratahiu
Contributor Author

self.layer_state[layer_key]['sum_sq_accumulator'] += torch.sum(grad.detach().pow(2)).float() in Kourkoutas.py gives an error when used with CPU offloading in OneTrainer (device mismatch).

Should be fixed in dev2. Can you confirm?

@Koratahiu Koratahiu marked this pull request as draft December 27, 2025 23:10
@Koratahiu Koratahiu marked this pull request as ready for review January 5, 2026 18:05
@Koratahiu
Contributor Author

This is now well-tested and ready.

The only remaining issue is Muon's strange interaction with torch.compile.
It worked for me while training SDXL, but others reported that training explodes, which looks like a torch.compile issue that only shows up in specific use cases.
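
Not a fix from this PR, just an illustration of how such a path is commonly isolated while debugging: the suspect routine can be kept out of the compiled graph with torch.compiler.disable so it runs eagerly (muon_orthogonalize and its body are hypothetical placeholders):

```python
import torch

# Hypothetical sketch: exclude one routine from torch.compile tracing so the
# rest of the optimizer step stays compiled while this part runs eagerly.
@torch.compiler.disable
def muon_orthogonalize(update: torch.Tensor) -> torch.Tensor:
    # Placeholder body standing in for Muon's orthogonalization step.
    return update / (update.norm() + 1e-7)
```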

@dxqb
Collaborator

dxqb commented Jan 7, 2026

Renamed OptimizerConfig.compiled_optimizer to OptimizerConfig.compile, just for consistency: TrainConfig.compile is the config for model compilation.
The parameters passed to the optimizers in adv_optm are unchanged.
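
Roughly, on the config side (a sketch only; the real OptimizerConfig has many more fields and may not be a plain dataclass):

```python
from dataclasses import dataclass

@dataclass
class OptimizerConfig:
    # previously: compiled_optimizer: bool = False
    compile: bool = False  # renamed for consistency with TrainConfig.compile
```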

@dxqb dxqb changed the base branch from master to merge January 7, 2026 17:14
@dxqb dxqb merged commit b2d12d5 into Nerogar:merge Jan 7, 2026
1 check passed
@Koratahiu Koratahiu deleted the advoptm_2 branch January 13, 2026 22:55