
No weight decay for bias/norm/token/etc. #597

Merged
guarin merged 7 commits into main from guarin-trn-1803-no-weight-decay-for-bias-norms-and-tokens on Feb 5, 2026

Conversation

@guarin
Contributor

@guarin guarin commented Feb 3, 2026

What has changed and why?

  • Skip weight decay for bias/norm/token/etc.
  • Reduce number of logged lr/wd values

This PR updates all models except LTDETR for which I'll create a follow-up PR.
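For context, the change maps onto the usual PyTorch two-group pattern sketched below. This is a hypothetical illustration only: the actual helper added in this PR is get_weight_decay_parameters in optimizer_helpers.py, and its signature and the exact exclusion rules may differ.

```python
# Hypothetical sketch of the decay / no-decay split; not the code from this PR.
from torch import nn


def split_decay_params(model: nn.Module) -> tuple[list[nn.Parameter], list[nn.Parameter]]:
    """Return (decay, no_decay) parameter lists for two optimizer groups."""
    norm_types = (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm1d, nn.BatchNorm2d)
    decay: list[nn.Parameter] = []
    no_decay: list[nn.Parameter] = []
    for module in model.modules():
        for name, param in module.named_parameters(recurse=False):
            if not param.requires_grad:
                continue
            # Exclude biases, norm-layer parameters, and token-like embeddings
            # (cls/mask/register tokens) from weight decay.
            if name.endswith("bias") or isinstance(module, norm_types) or "token" in name:
                no_decay.append(param)
            else:
                decay.append(param)
    return decay, no_decay


# Usage: two parameter groups, one with regular weight decay and one with 0.0.
# decay, no_decay = split_decay_params(model)
# optimizer = torch.optim.AdamW(
#     [
#         {"params": decay, "weight_decay": 0.05},
#         {"params": no_decay, "weight_decay": 0.0},
#     ],
#     lr=1e-4,
# )
```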

How has it been tested?

  • Unit tests
  • Manual tests

Did you update CHANGELOG.md?

  • Yes
  • Not needed (internal change)

Did you update the documentation?

  • Yes
  • Not needed (internal change without effects for user)

Copilot AI review requested due to automatic review settings February 3, 2026 13:24

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

```python
    grouped.append(current_group)
elif group["lr"] != current_group["lr"]:
    assert last_group is not None
```

P2: Split groups by weight_decay to avoid merging wd/no-wd params

The new logic assigns weight_decay: 0.0 to no-decay params earlier in this method, but group_param_groups still only splits when the LR changes. When two adjacent groups share the same LR but have different weight_decay values, they get merged here and the merged group keeps the first group's weight_decay. That means either no-decay params get decayed or decayable params lose decay, depending on ordering. This is a regression introduced by adding weight_decay: 0.0 without updating the grouping condition; other EoMT models in this commit updated the condition to check weight_decay too.
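A minimal sketch of the fix this comment suggests, assuming a group_param_groups-style helper that merges adjacent parameter groups; variable names are schematic and the actual EoMT code may differ:

```python
# Schematic fix; not the exact code from this PR. Adjacent groups are merged
# only when BOTH lr and weight_decay match, so no-decay parameters are never
# folded into a decaying group (or vice versa).
from __future__ import annotations


def group_param_groups(param_groups: list[dict]) -> list[dict]:
    grouped: list[dict] = []
    current_group: dict | None = None
    for group in param_groups:
        if current_group is None:
            current_group = {**group, "params": list(group["params"])}
        elif (
            group["lr"] != current_group["lr"]
            or group.get("weight_decay", 0.0) != current_group.get("weight_decay", 0.0)
        ):
            # lr or weight_decay changed: close the current group, start a new one.
            grouped.append(current_group)
            current_group = {**group, "params": list(group["params"])}
        else:
            # Same lr and weight_decay: safe to merge the parameter lists.
            current_group["params"].extend(group["params"])
    if current_group is not None:
        grouped.append(current_group)
    return grouped
```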

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor

Copilot AI left a comment


Pull request overview

This PR implements selective weight decay exclusion for specific parameter types (bias, normalization layers, tokens, etc.) and reduces logging overhead by only logging learning rate and weight decay values for selected parameter groups.

Changes:

  • Added a new get_weight_decay_parameters function to categorize parameters based on whether they should have weight decay applied
  • Updated optimizer creation in multiple task models to use separate parameter groups with different weight decay values
  • Added conditional logging for optimizer parameter groups to reduce logging overhead (a minimal sketch follows this list)
  • Updated the warmup_steps calculation in picodet to prevent it from exceeding total steps
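
The sketch below illustrates the conditional-logging item above. It assumes each optimizer parameter group may carry a "log" flag (as described for train_task.py); the helper name, metric names, and logging callback are invented for illustration.

```python
# Hypothetical logging helper; only the "log" flag on parameter groups comes
# from this PR's description, everything else here is illustrative.
from typing import Callable

from torch.optim import Optimizer


def log_lr_and_weight_decay(optimizer: Optimizer, log_fn: Callable[[str, float], None]) -> None:
    for i, group in enumerate(optimizer.param_groups):
        # Skip groups that are not explicitly marked for logging to keep the
        # number of logged lr/wd values small.
        if not group.get("log", False):
            continue
        name = group.get("name", f"group_{i}")
        log_fn(f"lr/{name}", group["lr"])
        log_fn(f"weight_decay/{name}", group.get("weight_decay", 0.0))
```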

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/lightly_train/_optim/optimizer_helpers.py: Added a new function to categorize parameters for weight decay and updated imports
  • tests/_optim/test_optimizer_helpers.py: Added a comprehensive test for the new weight decay parameter categorization function
  • src/lightly_train/_task_models/picodet_object_detection/train_model.py: Updated the optimizer to use separate parameter groups and fixed the warmup_steps calculation
  • src/lightly_train/_task_models/dinov3_eomt_semantic_segmentation/train_model.py: Updated the optimizer to apply selective weight decay and added conditional logging
  • src/lightly_train/_task_models/dinov3_eomt_panoptic_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_task_models/dinov3_eomt_instance_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_task_models/dinov2_linear_semantic_segmentation/train_model.py: Replaced the lightly library's function with an internal implementation
  • src/lightly_train/_task_models/dinov2_eomt_semantic_segmentation/train_model.py: Updated the optimizer and grouping logic to handle different weight decay values correctly
  • src/lightly_train/_commands/train_task.py: Added conditional logging to respect the "log" flag in parameter groups
  • CHANGELOG.md: Documented the new feature

Contributor

@IgorSusmelj IgorSusmelj left a comment


LGTM
I was wondering whether we could put some of the helpers into LightlySSL?

@guarin
Contributor Author

guarin commented Feb 5, 2026

I was wondering whether we could put some of the helpers into LightlySSL?

A weight decay / no weight decay helper function already exists in LightlySSL, although without all of this functionality. We could definitely also move it there.

@guarin
Contributor Author

guarin commented Feb 5, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c62cad255a


@guarin guarin enabled auto-merge (squash) February 5, 2026 14:55
@guarin guarin merged commit 09d15ed into main Feb 5, 2026
19 checks passed
@guarin guarin deleted the guarin-trn-1803-no-weight-decay-for-bias-norms-and-tokens branch February 5, 2026 15:07