
No weight decay for bias/norm/token/etc. #597

Merged
guarin merged 7 commits into main from guarin-trn-1803-no-weight-decay-for-bias-norms-and-tokens on Feb 5, 2026

Conversation

@guarin
Contributor

@guarin guarin commented Feb 3, 2026

What has changed and why?

  • Skip weight decay for bias/norm/token/etc.
  • Reduce number of logged lr/wd values

This PR updates all models except LTDETR for which I'll create a follow-up PR.
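For context, the change maps onto the usual PyTorch two-group pattern sketched below. This is a hypothetical illustration only: the actual helper added in this PR is get_weight_decay_parameters in optimizer_helpers.py, and its signature and the exact exclusion rules may differ.

```python
# Hypothetical sketch of the decay / no-decay split; not the code from this PR.
from torch import nn


def split_decay_params(model: nn.Module) -> tuple[list[nn.Parameter], list[nn.Parameter]]:
    """Return (decay, no_decay) parameter lists for two optimizer groups."""
    norm_types = (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d)
    decay: list[nn.Parameter] = []
    no_decay: list[nn.Parameter] = []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            # Exclude biases, norm-layer parameters, and token-like embeddings
            # (cls/mask/register tokens) from weight decay.
            if name.endswith("bias") or isinstance(module, norm_types) or "token" in name:
                no_decay.append(param)
            else:
                decay.append(param)
    return decay, no_decay


# Usage: two parameter groups, one with regular weight decay and one with 0.0.
# decay, no_decay = split_decay_params(model)
# optimizer = torch.optim.AdamW(
#     [
#         {"params": decay, "weight_decay": 0.05},
#         {"params": no_decay, "weight_decay": 0.0},
#     ],
#     lr=1e-4,
# )
```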

How has it been tested?

  • Unit tests
  • Manual tests

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Did you update the documentation?

  • Yes
  • Not needed (internal change without effects for user)

Copilot AI review requested due to automatic review settings February 3, 2026 13:24

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
    grouped.append(current_group)
elif group["lr"] != current_group["lr"]:
    assert last_group is not None
```

P2: Split groups by weight_decay to avoid merging wd/no-wd params

The new logic assigns weight_decay: 0.0 to no-decay params earlier in this method, but group_param_groups still only splits when the LR changes. When two adjacent groups share the same LR but have different weight_decay values, they get merged here and the merged group keeps the first group's weight_decay. That means either no-decay params get decayed or decayable params lose decay, depending on ordering. This is a regression introduced by adding weight_decay: 0.0 without updating the grouping condition; other EoMT models in this commit updated the condition to check weight_decay too.
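A minimal sketch of the fix this comment suggests, assuming a group_param_groups-style helper that merges adjacent parameter groups; variable names are schematic and the actual EoMT code may differ:

```python
# Schematic fix; not the exact code from this PR. Adjacent groups are merged
# only when BOTH lr and weight_decay match, so no-decay parameters are never
# folded into a decaying group (or vice versa).
from __future__ import annotations


def group_param_groups(param_groups: list[dict]) -> list[dict]:
    grouped: list[dict] = []
    current_group: dict | None = None
    for group in param_groups:
        if current_group is None:
            current_group = {**group, "params": list(group["params"])}
        elif (
            group["lr"] != current_group["lr"]
            or group.get("weight_decay", 0.0) != current_group.get("weight_decay", 0.0)
        ):
            # lr or weight_decay changed: close the current group, start a new one.
            grouped.append(current_group)
            current_group = {**group, "params": list(group["params"])}
        else:
            # Same lr and weight_decay: safe to merge the parameter lists.
            current_group["params"].extend(group["params"])
    if current_group is not None:
        grouped.append(current_group)
    return grouped
```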

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

Copilot AI left a comment


Pull request overview

This PR implements selective weight decay exclusion for specific parameter types (bias, normalization layers, tokens, etc.) and reduces logging overhead by only logging learning rate and weight decay values for selected parameter groups.

Changes:

  • Added a new get_weight_decay_parameters function to categorize parameters based on whether they should have weight decay applied
  • Updated optimizer creation in multiple task models to use separate parameter groups with different weight decay values
  • Added conditional logging for optimizer parameter groups to reduce logging overhead (a minimal sketch follows this list)
  • Updated the warmup_steps calculation in picodet to prevent it from exceeding total steps
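
The sketch below illustrates the conditional-logging item above. It assumes each optimizer parameter group may carry a "log" flag (as described for train_task.py); the helper name, metric names, and logging callback are invented for illustration.

```python
# Hypothetical logging helper; only the "log" flag on parameter groups comes
# from this PR's description, everything else here is illustrative.
from typing import Callable

from torch.optim import Optimizer


def log_lr_and_weight_decay(optimizer: Optimizer, log_fn: Callable[[str, float], None]) -> None:
    for i, group in enumerate(optimizer.param_groups):
        # Skip groups that are not explicitly marked for logging to keep the
        # number of logged lr/wd values small.
        if not group.get("log", False):
            continue
        name = group.get("name", f"group_{i}")
        log_fn(f"lr/{name}", group["lr"])
        log_fn(f"weight_decay/{name}", group.get("weight_decay", 0.0))
```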

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/lightly_train/_optim/optimizer_helpers.py: Added a new function to categorize parameters for weight decay and updated imports
  • tests/_optim/test_optimizer_helpers.py: Added a comprehensive test for the new weight decay parameter categorization function
  • src/lightly_train/_task_models/picodet_object_detection/train_model.py: Updated the optimizer to use separate parameter groups and fixed the warmup_steps calculation
  • src/lightly_train/_task_models/dinov3_eomt_semantic_segmentation/train_model.py: Updated the optimizer to apply selective weight decay and added conditional logging
  • src/lightly_train/_task_models/dinov3_eomt_panoptic_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_task_models/dinov3_eomt_instance_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_task_models/dinov2_linear_semantic_segmentation/train_model.py: Replaced the lightly library's function with an internal implementation
  • src/lightly_train/_task_models/dinov2_eomt_semantic_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_commands/train_task.py: Added conditional logging to respect the "log" flag in parameter groups
  • CHANGELOG.md: Documented the new feature

Contributor

@IgorSusmelj IgorSusmelj left a comment


LGTM
I was wondering whether we could put some of the helpers into LightlySSL?

@guarin
Contributor Author

guarin commented Feb 5, 2026

I was wondering whether we could put some of the helpers into LightlySSL?

A weight decay / no weight decay helper function already exists in LightlySSL, although without all of this functionality. We could definitely also move it there.

@guarin
Contributor Author

guarin commented Feb 5, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c62cad255a


@guarin guarin enabled auto-merge (squash) February 5, 2026 14:55
@guarin guarin merged commit 09d15ed into main Feb 5, 2026
19 checks passed
@guarin guarin deleted the guarin-trn-1803-no-weight-decay-for-bias-norms-and-tokens branch February 5, 2026 15:07