@Koratahiu (Owner)

This is a major update to the library, introducing:

Compiled Optimizers

Enables torch.compile for the optimizer step (via the compiled_optimizer option) across all advanced optimizers; see the usage sketch after the list below.

When to use:

  • Features with noticeable overhead: With torch.compile, the cost of complex features becomes negligible:
    • OrthoGrad: Mitigates the ~33% overhead seen with small batch sizes.
    • 1-bit Factored mode: Reduces calculation overhead.
    • 3-state optimizers (e.g., AdEMAMix): Efficiently handles the additional states.
  • Full Finetuning: Mitigates optimizer-side bottlenecks in larger models.
  • Orthogonal Optimizers: Reduces the cost of orthogonalization ops in Muon and AdaMuon.
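A minimal usage sketch, assuming a PyTorch training loop. The compiled_optimizer flag comes from the notes above, but the optimizer class name (AdamW_adv) and its exact constructor signature are illustrative assumptions, not the library's documented API:

```python
import torch
import torch.nn as nn

# Hypothetical import: the actual optimizer class names in adv_optm may differ.
from adv_optm import AdamW_adv

model = nn.Linear(512, 512)
optimizer = AdamW_adv(
    model.parameters(),
    lr=1e-3,
    compiled_optimizer=True,  # wrap the optimizer step in torch.compile
)

x = torch.randn(8, 512)
loss = model(x).sum()
loss.backward()
optimizer.step()       # first step triggers compilation; later steps reuse the compiled graph
optimizer.zero_grad()
```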

1-bit Factored Changes (nnmf_factor)

  • Code Centralization: Greatly reduced code duplication; all logic now resides in adv_optm/util/factorization_util.py.
  • Reduced Temporary Tensors: Temporary tensors are significantly reduced, lowering VRAM spikes in factored mode.
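To make the factored-mode bookkeeping concrete, here is a minimal sketch of an Adafactor-style rank-1 nonnegative factorization paired with a 1-bit sign tensor. The function names and the exact scheme are assumptions for illustration only and are not taken from factorization_util.py, which additionally works in-place to cut temporary allocations:

```python
import torch

def nnmf_rank1_factor(mat: torch.Tensor):
    """Rank-1 nonnegative approximation of a nonnegative matrix:
    mat[i, j] ~= row[i] * col[j] / mean(mat)."""
    row = mat.mean(dim=1)  # shape (m,)
    col = mat.mean(dim=0)  # shape (n,)
    return row, col

def nnmf_rank1_reconstruct(row: torch.Tensor, col: torch.Tensor) -> torch.Tensor:
    # row.mean() equals the global mean of the original matrix,
    # which gives the correct scale for the rank-1 product.
    denom = row.mean().clamp_min(1e-30)
    return torch.outer(row, col) / denom

# Usage: factor the magnitude of an optimizer state and keep the sign
# separately as a 1-bit (bool) tensor.
state = torch.randn(256, 512)
sign = state > 0                               # 1-bit sign information
row, col = nnmf_rank1_factor(state.abs())      # two small vectors instead of a full matrix
approx = nnmf_rank1_reconstruct(row, col) * torch.where(sign, 1.0, -1.0)
```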

Muon Variants

etc.

@Koratahiu Koratahiu merged commit 5a1a7fa into main Jan 5, 2026
@Koratahiu Koratahiu deleted the v2.0 branch January 5, 2026 18:12
